[PATCH] drm/amdkfd: fix TLB flush after unmap for GFX9.4.2
The TLB flush after unmap was accidentally removed on gfx9.4.2. Add it back.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 42d40560cd30..a81ef232fdef 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1473,7 +1473,7 @@ static inline void kfd_flush_tlb(struct kfd_process_device *pdd,
 static inline bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
 {
-	return KFD_GC_VERSION(dev) > IP_VERSION(9, 4, 2) ||
+	return KFD_GC_VERSION(dev) >= IP_VERSION(9, 4, 2) ||
 	       (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) && dev->sdma_fw_version >= 18) ||
 	       KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0);
 }
-- 
2.34.1
[PATCH] amd/amdkfd: remove unused parameter
The adev can be derived from the bo via amdgpu_ttm_adev(bo->tbo.bdev), and adev is not used in amdgpu_amdkfd_map_gtt_bo_to_gart() anyway, so drop the parameter.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +--
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 4fb32d86cd0e..0ef223c2affb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -320,7 +320,7 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem,
 		void **kptr, uint64_t *size);
 void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_mem *mem);
-int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo);
+int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_bo *bo);
 int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
 		struct dma_fence __rcu **ef);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index ef71b12062a1..bf8e6653341f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2189,13 +2189,12 @@ int amdgpu_amdkfd_gpuvm_sync_memory(
 /**
  * amdgpu_amdkfd_map_gtt_bo_to_gart - Map BO to GART and increment reference count
- * @adev: Device to which allocated BO belongs
  * @bo: Buffer object to be mapped
  *
  * Before return, bo reference count is incremented. To release the reference and unpin/
  * unmap the BO, call amdgpu_amdkfd_free_gtt_mem.
  */
-int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
+int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_bo *bo)
 {
 	int ret;

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 824e660283b2..f030cafc5a0a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -371,7 +371,7 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 		goto err_wptr_map_gart;
 	}

-	err = amdgpu_amdkfd_map_gtt_bo_to_gart(dev->adev, wptr_bo);
+	err = amdgpu_amdkfd_map_gtt_bo_to_gart(wptr_bo);
 	if (err) {
 		pr_err("Failed to map wptr bo to GART\n");
 		goto err_wptr_map_gart;
-- 
2.34.1
Re: [PATCH] drm/amdkfd: only flush mes process context if mes support is there
On 2023-12-13 22:19, Jonathan Kim wrote:

Fix up the mes process context flush to prevent non-mes devices from spamming error messages or running into undefined behaviour during process termination.

Fixes: 73204d028eb5 ("drm/amdkfd: fix mes set shader debugger process management")
Signed-off-by: Jonathan Kim

Reviewed-by: Eric Huang

---
 drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 8e55e78fce4e..43eff221eae5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -87,7 +87,8 @@ void kfd_process_dequeue_from_device(struct kfd_process_device *pdd)
 		return;

 	dev->dqm->ops.process_termination(dev->dqm, &pdd->qpd);
-	amdgpu_mes_flush_shader_debugger(dev->adev, pdd->proc_ctx_gpu_addr);
+	if (dev->kfd->shared_resources.enable_mes)
+		amdgpu_mes_flush_shader_debugger(dev->adev, pdd->proc_ctx_gpu_addr);
 	pdd->already_dequeued = true;
 }
[PATCH] drm/amdkfd: fix NULL ptr for debugger mes flush on non-mes asics
The field adev->mes.funcs is NULL in amdgpu_mes_flush_shader_debugger() on non-mes asics. Add an mes-enabled check before calling this function to resolve the error.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 8e55e78fce4e..43eff221eae5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -87,7 +87,8 @@ void kfd_process_dequeue_from_device(struct kfd_process_device *pdd)
 		return;

 	dev->dqm->ops.process_termination(dev->dqm, &pdd->qpd);
-	amdgpu_mes_flush_shader_debugger(dev->adev, pdd->proc_ctx_gpu_addr);
+	if (dev->kfd->shared_resources.enable_mes)
+		amdgpu_mes_flush_shader_debugger(dev->adev, pdd->proc_ctx_gpu_addr);
 	pdd->already_dequeued = true;
 }
-- 
2.34.1
Re: [PATCH] drm/amdkfd: fix mes set shader debugger process management
On 2023-12-11 16:16, Jonathan Kim wrote:

MES provides the driver a call to explicitly flush stale process memory within the MES to avoid a race condition that results in a fatal memory violation.

When SET_SHADER_DEBUGGER is called, the driver passes a memory address that represents a process context address MES uses to keep track of future per-process calls.

Normally, MES will purge its process context list when the last queue has been removed. The driver, however, can call SET_SHADER_DEBUGGER regardless of whether a queue has been added or not. If SET_SHADER_DEBUGGER has been called with no queues as the last call prior to process termination, the passed process context address will still reside within MES.

On a new process call to SET_SHADER_DEBUGGER, the driver may end up passing an identical process context address value (based on the per-process gpu memory address) to MES, but it now points to a newly allocated buffer object from KFD process creation. Since the MES is unaware of this, access of the passed address points to the stale object within MES and triggers a fatal memory violation.

The solution is for KFD to explicitly flush the process context address from MES on process termination.

Note that the flush call and the MES debugger calls use the same MES interface but are separated as KFD calls to avoid conflicting with each other.
Signed-off-by: Jonathan Kim
Tested-by: Alice Wong
Reviewed-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c            | 31 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h            | 10 +++---
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |  1 +
 drivers/gpu/drm/amd/include/mes_v11_api_def.h      |  3 +-
 4 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index e544b823abf6..e98de23250dc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -916,6 +916,11 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev,
 	op_input.op = MES_MISC_OP_SET_SHADER_DEBUGGER;
 	op_input.set_shader_debugger.process_context_addr = process_context_addr;
 	op_input.set_shader_debugger.flags.u32all = flags;
+
+	/* use amdgpu mes_flush_shader_debugger instead */
+	if (op_input.set_shader_debugger.flags.process_ctx_flush)
+		return -EINVAL;
+
 	op_input.set_shader_debugger.spi_gdbg_per_vmid_cntl = spi_gdbg_per_vmid_cntl;
 	memcpy(op_input.set_shader_debugger.tcp_watch_cntl, tcp_watch_cntl,
 			sizeof(op_input.set_shader_debugger.tcp_watch_cntl));
@@ -935,6 +940,32 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev,
 	return r;
 }

+int amdgpu_mes_flush_shader_debugger(struct amdgpu_device *adev,
+				     uint64_t process_context_addr)
+{
+	struct mes_misc_op_input op_input = {0};
+	int r;
+
+	if (!adev->mes.funcs->misc_op) {
+		DRM_ERROR("mes flush shader debugger is not supported!\n");
+		return -EINVAL;
+	}
+
+	op_input.op = MES_MISC_OP_SET_SHADER_DEBUGGER;
+	op_input.set_shader_debugger.process_context_addr = process_context_addr;
+	op_input.set_shader_debugger.flags.process_ctx_flush = true;
+
+	amdgpu_mes_lock(&adev->mes);
+
+	r = adev->mes.funcs->misc_op(&adev->mes, &op_input);
+	if (r)
+		DRM_ERROR("failed to set_shader_debugger\n");
+
+	amdgpu_mes_unlock(&adev->mes);
+
+	return r;
+}
+
 static void amdgpu_mes_ring_to_queue_props(struct amdgpu_device *adev,
 					   struct amdgpu_ring *ring,

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 894b9b133000..7d4f93fea937 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -296,9 +296,10 @@ struct mes_misc_op_input {
 			uint64_t process_context_addr;
 			union {
 				struct {
-					uint64_t single_memop : 1;
-					uint64_t single_alu_op : 1;
-					uint64_t reserved: 30;
+					uint32_t single_memop : 1;
+					uint32_t single_alu_op : 1;
+					uint32_t reserved: 29;
+					uint32_t process_ctx_flush: 1;
 				};
 				uint32_t u32all;
 			} flags;
@@ -374,7 +375,8 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev,
 			const uint32_t *tcp_watch_cntl,
 			uint32_t flags,
 			bool trap_en);
-
+int amdgpu_mes_flush_shader_debugger(struct amdgpu_device *adev,
+				     uint64_t process_context_addr);
Re: [PATCH] drm/amdkfd: Copy HW exception data to user event
On 2023-11-17 00:20, David Yat Sin wrote:

Fixes an issue where user events of type KFD_EVENT_TYPE_HW_EXCEPTION do not have valid data.

Signed-off-by: David Yat Sin
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 0f58be65132f..7d3db017f8d7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -880,6 +880,10 @@ static int copy_signaled_event_data(uint32_t num_events,
 			dst = &data[i].memory_exception_data;
 			src = &event->memory_exception_data;
 			size = sizeof(struct kfd_hsa_memory_exception_data);
+		} else if (event->type == KFD_EVENT_TYPE_HW_EXCEPTION) {
+			dst = &data[i].hw_exception_data;
+			src = &event->hw_exception_data;
+			size = sizeof(struct kfd_hsa_hw_exception_data);
 		} else if (event->type == KFD_EVENT_TYPE_SIGNAL &&
 			   waiter->event_age_enabled) {
 			dst = &data[i].signal_event_data.last_event_age;

Please use tabs for indent instead of white spaces.

Regards,
Eric
Re: [PATCH] drm/amdkfd: Fix a race condition of vram buffer unref in svm code
On 2023-09-26 23:00, Xiaogang.Chen wrote:

From: Xiaogang Chen

The prange->svm_bo unref can happen both in the mmu callback and in a callback after migration to system ram. Both are async calls in different tasks. Serialize the svm_bo unref operation to avoid a random use-after-free.

Signed-off-by: Xiaogang.Chen
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 70aa882636ab..8e246e848018 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -637,6 +637,15 @@ void svm_range_vram_node_free(struct svm_range *prange)
 {
 	svm_range_bo_unref(prange->svm_bo);
 	prange->ttm_res = NULL;

Are the above two lines not removed?

Regards,
Eric

+	/* serialize prange->svm_bo unref */
+	mutex_lock(&prange->lock);
+	/* prange->svm_bo has not been unref */
+	if (prange->ttm_res) {
+		prange->ttm_res = NULL;
+		mutex_unlock(&prange->lock);
+		svm_range_bo_unref(prange->svm_bo);
+	} else
+		mutex_unlock(&prange->lock);
 }

 struct kfd_node *
Re: [PATCH] drm/amdkfd: fix add queue process context clear without runtime enable
On 2023-09-12 21:52, Jonathan Kim wrote:

There are cases where the HSA runtime is not enabled through the AMDKFD_IOC_RUNTIME_ENABLE call when adding queues, and the MES ADD_QUEUE API should then clear the MES process context instead of SET_SHADER_DEBUGGER. Examples are legacy HSA runtime builds that do not support the current exception handling, and running KFD tests.

The only time ADD_QUEUE.skip_process_ctx_clear is required is for debugger use cases, where a debugged process is always runtime enabled when adding a queue.

Tested-by: Shikai Guo
Signed-off-by: Jonathan Kim
Reviewed-by: Eric Huang
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 6d07a5dd2648..77159b03a422 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -227,8 +227,10 @@ static int add_queue_mes(struct device_queue_manager *dqm, struct queue *q,
 	queue_input.tba_addr = qpd->tba_addr;
 	queue_input.tma_addr = qpd->tma_addr;
 	queue_input.trap_en = !kfd_dbg_has_cwsr_workaround(q->device);
-	queue_input.skip_process_ctx_clear = qpd->pqm->process->debug_trap_enabled ||
-					kfd_dbg_has_ttmps_always_setup(q->device);
+	queue_input.skip_process_ctx_clear =
+		qpd->pqm->process->runtime_info.runtime_state == DEBUG_RUNTIME_STATE_ENABLED &&
+		(qpd->pqm->process->debug_trap_enabled ||
+		 kfd_dbg_has_ttmps_always_setup(q->device));

 	queue_type = convert_to_mes_queue_type(q->properties.type);
 	if (queue_type < 0) {
Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
On 2023-08-11 09:26, Felix Kuehling wrote:

Am 2023-08-10 um 18:27 schrieb Eric Huang:

There is no UNMAP_QUEUES command sent for queue preemption, because the queue is suspended and the test is close to its end. Function unmap_queue_cpsch will do nothing after that.

How do you suspend queues without sending an UNMAP_QUEUES command?

Now I understand what you mean; I was only thinking of an UNMAP_QUEUES sent after the clearing call. So MEC FW should clear the control register unconditionally on every UNMAP_QUEUES command. We can request that for gfx v9.4.3 to avoid the awkward workaround in KFD.

Thanks,
Eric

Regards,
Felix

The workaround is new and only for gfx v9.4.2, because the debugger tests have changed to check whether all address watch points are correctly set. For example, test A sets more than one watchpoint and leaves, and the following test B sets only one watchpoint; test A's leftover settings then cause more than one watchpoint event, so test B detects and reports an error for a second or third watchpoint it did not set itself.

Regards,
Eric

On 2023-08-10 17:56, Felix Kuehling wrote:

I think Jon is suggesting that the UNMAP_QUEUES command should clear the address watch registers. Requesting such a change from the HWS team may take a long time.

That said, when was this workaround implemented and reviewed? Did I review it as part of Jon's debugger upstreaming patch series? Or did this come later? This patch only enables the workaround for v9.4.2.

Regards,
Felix

On 2023-08-10 17:52, Eric Huang wrote:

The problem is that the queue is suspended before the clear address watch call in KFD; there is no queue preemption and no queue resume after the clearing call, and the test ends. So there is no chance to send MAP_PROCESS to HWS, and at that point FW has nothing to do. We have several test FWs from Tej, and none of them works, so I recalled the kernel debug log and found out the problem.

GFX11 has a different scheduler: when clear address watch is called, KFD directly sends MES_MISC_OP_SET_SHADER_DEBUGGER to MES, without considering whether the queue is suspended. So GFX11 doesn't have this issue.

Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution, because the MEC should set watch controls as non-valid automatically on queue preemption to avoid this kind of issue in the first place by design. MAP_PROCESS on resume will take whatever the driver requests. GFX11 has no issue with letting the HWS do this. Are we sure we're not working around some HWS bug?

Thanks,
Jon

-----Original Message-----
From: Kuehling, Felix
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric; amd-gfx@lists.freedesktop.org
Cc: Kim, Jonathan
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit different because it needs to support multiple XCCs. That said, this patch is

Reviewed-by: Felix Kuehling

On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear the tcp watch control register by sending a MAP_PROCESS packet with a tcp_watch_cntl field of 0 to HWS, but if the queue is suspended, the packet will not be sent and the previous value is left in the register, which affects the following apps. The solution is to clear the register in KFD, as gfx v9 does.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }

-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
-					uint32_t watch_id)
-{
-	return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override,
 	.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 	.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-	.clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+	.clear_address_watch = kgd_gfx_v9_clear_address_watch,
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
I will change the title to "drm/amdkfd: workaround address watch clearing bug for gfx v9.4.2". Is that OK?

Regards,
Eric

On 2023-08-10 18:25, Kim, Jonathan wrote:

[Public]

Yeah, this is a recent bug, so this workaround is new. More rigorous tests revealed this is probably a miss on the FW side. We explicitly requested that UNMAP_QUEUES unconditionally invalidate watch controls at the beginning of design, to prevent any watch point racing.

Note GFX11 MES calls are different on the surface, but under the hood it's the same (registers get invalidated on unmap and updated on map; the only difference is that it happens at the queue level).

I'm fine with this solution, but I think it'd be good to describe this as a workaround somewhere (as opposed to a driver issue), so that folks aren't scratching their heads later on when looking at the code for GFX11 and up and wondering why we don't nuke the control setting with the KFD for those devices.

Thanks,
Jon

-----Original Message-----
From: Kuehling, Felix
Sent: Thursday, August 10, 2023 5:56 PM
To: Huang, JinHuiEric; Kim, Jonathan; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

I think Jon is suggesting that the UNMAP_QUEUES command should clear the address watch registers. Requesting such a change from the HWS team may take a long time.

That said, when was this workaround implemented and reviewed? Did I review it as part of Jon's debugger upstreaming patch series? Or did this come later? This patch only enables the workaround for v9.4.2.

Regards,
Felix

On 2023-08-10 17:52, Eric Huang wrote:

The problem is that the queue is suspended before the clear address watch call in KFD; there is no queue preemption and no queue resume after the clearing call, and the test ends. So there is no chance to send MAP_PROCESS to HWS, and at that point FW has nothing to do. We have several test FWs from Tej, and none of them works, so I recalled the kernel debug log and found out the problem.

GFX11 has a different scheduler: when clear address watch is called, KFD directly sends MES_MISC_OP_SET_SHADER_DEBUGGER to MES, without considering whether the queue is suspended. So GFX11 doesn't have this issue.

Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution, because the MEC should set watch controls as non-valid automatically on queue preemption to avoid this kind of issue in the first place by design. MAP_PROCESS on resume will take whatever the driver requests. GFX11 has no issue with letting the HWS do this. Are we sure we're not working around some HWS bug?

Thanks,
Jon

-----Original Message-----
From: Kuehling, Felix
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric; amd-gfx@lists.freedesktop.org
Cc: Kim, Jonathan
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit different because it needs to support multiple XCCs. That said, this patch is

Reviewed-by: Felix Kuehling

On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear the tcp watch control register by sending a MAP_PROCESS packet with a tcp_watch_cntl field of 0 to HWS, but if the queue is suspended, the packet will not be sent and the previous value is left in the register, which affects the following apps. The solution is to clear the register in KFD, as gfx v9 does.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }

-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
-					uint32_t watch_id)
-{
-	return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override,
 	.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 	.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-	.clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+	.clear_address_watch = kgd_gfx_v9_clear_address_watch,
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
There is no UNMAP_QUEUES command sent for queue preemption, because the queue is suspended and the test is close to its end. Function unmap_queue_cpsch will do nothing after that.

The workaround is new and only for gfx v9.4.2, because the debugger tests have changed to check whether all address watch points are correctly set. For example, test A sets more than one watchpoint and leaves, and the following test B sets only one watchpoint; test A's leftover settings then cause more than one watchpoint event, so test B detects and reports an error for a second or third watchpoint it did not set itself.

Regards,
Eric

On 2023-08-10 17:56, Felix Kuehling wrote:

I think Jon is suggesting that the UNMAP_QUEUES command should clear the address watch registers. Requesting such a change from the HWS team may take a long time.

That said, when was this workaround implemented and reviewed? Did I review it as part of Jon's debugger upstreaming patch series? Or did this come later? This patch only enables the workaround for v9.4.2.

Regards,
Felix

On 2023-08-10 17:52, Eric Huang wrote:

The problem is that the queue is suspended before the clear address watch call in KFD; there is no queue preemption and no queue resume after the clearing call, and the test ends. So there is no chance to send MAP_PROCESS to HWS, and at that point FW has nothing to do. We have several test FWs from Tej, and none of them works, so I recalled the kernel debug log and found out the problem.

GFX11 has a different scheduler: when clear address watch is called, KFD directly sends MES_MISC_OP_SET_SHADER_DEBUGGER to MES, without considering whether the queue is suspended. So GFX11 doesn't have this issue.

Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution, because the MEC should set watch controls as non-valid automatically on queue preemption to avoid this kind of issue in the first place by design. MAP_PROCESS on resume will take whatever the driver requests. GFX11 has no issue with letting the HWS do this. Are we sure we're not working around some HWS bug?

Thanks,
Jon

-----Original Message-----
From: Kuehling, Felix
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric; amd-gfx@lists.freedesktop.org
Cc: Kim, Jonathan
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit different because it needs to support multiple XCCs. That said, this patch is

Reviewed-by: Felix Kuehling

On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear the tcp watch control register by sending a MAP_PROCESS packet with a tcp_watch_cntl field of 0 to HWS, but if the queue is suspended, the packet will not be sent and the previous value is left in the register, which affects the following apps. The solution is to clear the register in KFD, as gfx v9 does.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }

-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
-					uint32_t watch_id)
-{
-	return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override,
 	.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 	.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-	.clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+	.clear_address_watch = kgd_gfx_v9_clear_address_watch,
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
The problem is that the queue is suspended before the clear address watch call in KFD; there is no queue preemption and no queue resume after the clearing call, and the test ends. So there is no chance to send MAP_PROCESS to HWS, and at that point FW has nothing to do. We have several test FWs from Tej, and none of them works, so I recalled the kernel debug log and found out the problem.

GFX11 has a different scheduler: when clear address watch is called, KFD directly sends MES_MISC_OP_SET_SHADER_DEBUGGER to MES, without considering whether the queue is suspended. So GFX11 doesn't have this issue.

Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution, because the MEC should set watch controls as non-valid automatically on queue preemption to avoid this kind of issue in the first place by design. MAP_PROCESS on resume will take whatever the driver requests. GFX11 has no issue with letting the HWS do this. Are we sure we're not working around some HWS bug?

Thanks,
Jon

-----Original Message-----
From: Kuehling, Felix
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric; amd-gfx@lists.freedesktop.org
Cc: Kim, Jonathan
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit different because it needs to support multiple XCCs. That said, this patch is

Reviewed-by: Felix Kuehling

On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear the tcp watch control register by sending a MAP_PROCESS packet with a tcp_watch_cntl field of 0 to HWS, but if the queue is suspended, the packet will not be sent and the previous value is left in the register, which affects the following apps. The solution is to clear the register in KFD, as gfx v9 does.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }

-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
-					uint32_t watch_id)
-{
-	return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override,
 	.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 	.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-	.clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+	.clear_address_watch = kgd_gfx_v9_clear_address_watch,
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
Yes. I will send out the fix for gc v9.4.3 later. Thanks for your review.

Eric

On 2023-08-10 17:02, Felix Kuehling wrote:

I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit different because it needs to support multiple XCCs. That said, this patch is

Reviewed-by: Felix Kuehling

On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear the tcp watch control register by sending a MAP_PROCESS packet with a tcp_watch_cntl field of 0 to HWS, but if the queue is suspended, the packet will not be sent and the previous value is left in the register, which affects the following apps. The solution is to clear the register in KFD, as gfx v9 does.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }

-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
-					uint32_t watch_id)
-{
-	return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override,
 	.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 	.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-	.clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+	.clear_address_watch = kgd_gfx_v9_clear_address_watch,
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
[PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
KFD currently relies on the MEC FW to clear the TCP watch control register, by sending a MAP_PROCESS packet with the tcp_watch_cntl field set to 0 to the HWS. But if the queue is suspended, the packet is not sent and the previous value is left in the register, which affects subsequent applications. The solution is to clear the register from KFD, as is done for gfx v9.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }
 
-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
-					uint32_t watch_id)
-{
-	return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override,
 	.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 	.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-	.clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+	.clear_address_watch = kgd_gfx_v9_clear_address_watch,
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
-- 
2.34.1
Re: [PATCH] drm/amdkfd: fix and enable ttmp setup for gfx11
On 2023-07-24 15:01, Jonathan Kim wrote: The MES cached process context must be cleared on adding any queue for the first time. For proper debug support, the MES will clear it's cached process context on the first call to SET_SHADER_DEBUGGER. This allows TTMPs to be pesistently enabled in a safe manner. Signed-off-by: Jonathan Kim Reviewed-by: Eric Huang Regards, Eric --- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 2 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 13 - drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 19 +-- drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 11 ++- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 12 +--- 6 files changed, 39 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c index 77ca5cbfb601..d67d003bada2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c @@ -637,7 +637,7 @@ static uint32_t kgd_gfx_v11_disable_debug_trap(struct amdgpu_device *adev, { uint32_t data = 0; - data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, keep_trap_enabled); + data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1); data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 0); data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 0); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index e0f9cf6dd8fd..42df972357e9 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -2755,6 +2755,16 @@ static int runtime_enable(struct kfd_process *p, uint64_t r_debug, if (pdd->qpd.queue_count) return -EEXIST; + + /* +* Setup TTMPs by default. +* Note that this call must remain here for MES ADD QUEUE to +* skip_process_ctx_clear unconditionally as the first call to +* SET_SHADER_DEBUGGER clears any stale process context data +* saved in MES. 
+*/ + if (pdd->dev->kfd->shared_resources.enable_mes) + kfd_dbg_set_mes_debug_mode(pdd, !kfd_dbg_has_cwsr_workaround(pdd->dev)); } p->runtime_info.runtime_state = DEBUG_RUNTIME_STATE_ENABLED; @@ -2848,7 +2858,8 @@ static int runtime_disable(struct kfd_process *p) if (!pdd->dev->kfd->shared_resources.enable_mes) debug_refresh_runlist(pdd->dev->dqm); else - kfd_dbg_set_mes_debug_mode(pdd); + kfd_dbg_set_mes_debug_mode(pdd, + !kfd_dbg_has_cwsr_workaround(pdd->dev)); } } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index 1f82caea59ba..9ec750666382 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -344,11 +344,10 @@ static int kfd_dbg_set_workaround(struct kfd_process *target, bool enable) return r; } -int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd) +int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd, bool sq_trap_en) { uint32_t spi_dbg_cntl = pdd->spi_dbg_override | pdd->spi_dbg_launch_mode; uint32_t flags = pdd->process->dbg_flags; - bool sq_trap_en = !!spi_dbg_cntl || !kfd_dbg_has_cwsr_workaround(pdd->dev); if (!kfd_dbg_is_per_vmid_supported(pdd->dev)) return 0; @@ -432,7 +431,7 @@ int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd, if (!pdd->dev->kfd->shared_resources.enable_mes) r = debug_map_and_unlock(pdd->dev->dqm); else - r = kfd_dbg_set_mes_debug_mode(pdd); + r = kfd_dbg_set_mes_debug_mode(pdd, true); kfd_dbg_clear_dev_watch_id(pdd, watch_id); @@ -474,7 +473,7 @@ int kfd_dbg_trap_set_dev_address_watch(struct kfd_process_device *pdd, if (!pdd->dev->kfd->shared_resources.enable_mes) r = debug_map_and_unlock(pdd->dev->dqm); else - r = kfd_dbg_set_mes_debug_mode(pdd); + r = kfd_dbg_set_mes_debug_mode(pdd, true); /* HWS is broken so no point in HW rollback but release the watchpoint anyways */ if (r) @@ -516,7 +515,7 @@ int kfd_dbg_trap_set_flags(struct kfd_process *target, uint32_t *flags) if 
(!pdd->dev->kfd->shared_resources.enable_mes) r = debug_refresh_runlist(pdd->dev->dqm);
[PATCH] drm/amdgpu: enable trap of each kfd vmid for gfx v9.4.3
Set up the TTMPs as enabled by default for gfx v9.4.3 during IP HW init.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 86a84a0970f0..9a90fd187909 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -898,6 +898,7 @@ static void gfx_v9_4_3_xcc_init_compute_vmid(struct amdgpu_device *adev,
 	int i;
 	uint32_t sh_mem_config;
 	uint32_t sh_mem_bases;
+	uint32_t data;
 
 	/*
 	 * Configure apertures:
@@ -917,6 +918,11 @@ static void gfx_v9_4_3_xcc_init_compute_vmid(struct amdgpu_device *adev,
 		/* CP and shaders */
 		WREG32_SOC15_RLC(GC, GET_INST(GC, xcc_id), regSH_MEM_CONFIG, sh_mem_config);
 		WREG32_SOC15_RLC(GC, GET_INST(GC, xcc_id), regSH_MEM_BASES, sh_mem_bases);
+
+		/* Enable trap for each kfd vmid. */
+		data = RREG32_SOC15(GC, GET_INST(GC, xcc_id), regSPI_GDBG_PER_VMID_CNTL);
+		data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
+		WREG32_SOC15_RLC(GC, GET_INST(GC, xcc_id), regSPI_GDBG_PER_VMID_CNTL, data);
 	}
 	soc15_grbm_select(adev, 0, 0, 0, 0, GET_INST(GC, xcc_id));
 	mutex_unlock(&adev->srbm_mutex);
-- 
2.34.1
Re: [PATCH] drm/amdkfd: enable grace period for xcp instance
On 2023-07-11 14:38, Felix Kuehling wrote: On 2023-07-11 10:28, Eric Huang wrote: Read/write grace period from/to first xcc instance of xcp in kfd node. Signed-off-by: Eric Huang --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 21 --- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- .../drm/amd/amdkfd/kfd_packet_manager_v9.c | 8 --- 3 files changed, 20 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 31cac1fd0d58..9000c4b778fd 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1619,10 +1619,14 @@ static int initialize_cpsch(struct device_queue_manager *dqm) init_sdma_bitmaps(dqm); - if (dqm->dev->kfd2kgd->get_iq_wait_times) + if (dqm->dev->kfd2kgd->get_iq_wait_times) { + u32 first_inst = dqm->dev->xcp->id * + dqm->dev->adev->gfx.num_xcc_per_xcp; dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev, - >wait_times, - ffs(dqm->dev->xcc_mask) - 1); + >wait_times[first_inst], + first_inst); + } + return 0; } @@ -1675,13 +1679,16 @@ static int start_cpsch(struct device_queue_manager *dqm) grace_period); if (retval) pr_err("Setting grace timeout failed\n"); - else if (dqm->dev->kfd2kgd->build_grace_period_packet_info) + else if (dqm->dev->kfd2kgd->build_grace_period_packet_info) { + u32 first_inst = dqm->dev->xcp->id * + dqm->dev->adev->gfx.num_xcc_per_xcp; /* Update dqm->wait_times maintained in software */ dqm->dev->kfd2kgd->build_grace_period_packet_info( - dqm->dev->adev, dqm->wait_times, + dqm->dev->adev, dqm->wait_times[first_inst], grace_period, _offset, - >wait_times, - ffs(dqm->dev->xcc_mask) - 1); + >wait_times[first_inst], + first_inst); + } } dqm_unlock(dqm); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index 7dd4b177219d..45959c33b944 100644 --- 
a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -262,7 +262,7 @@ struct device_queue_manager { /* used for GFX 9.4.3 only */ uint32_t current_logical_xcc_start; - uint32_t wait_times; + uint32_t wait_times[MAX_XCP]; Why do you need an array here, if it only saves the wait times in one of the array entries [first_inst]? That is my misunderstanding for XCP. Each DPM should be associated to 1 XCP. I thought DPM has multiple XCPs. Thanks, Eric Regards, Felix wait_queue_head_t destroy_wait; }; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c index 8fda16e6fee6..960404a6379b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c @@ -292,17 +292,19 @@ static int pm_set_grace_period_v9(struct packet_manager *pm, struct pm4_mec_write_data_mmio *packet; uint32_t reg_offset = 0; uint32_t reg_data = 0; + uint32_t first_inst = pm->dqm->dev->xcp->id * + pm->dqm->dev->adev->gfx.num_xcc_per_xcp; pm->dqm->dev->kfd2kgd->build_grace_period_packet_info( pm->dqm->dev->adev, - pm->dqm->wait_times, + pm->dqm->wait_times[first_inst], grace_period, _offset, _data, - 0); + first_inst); if (grace_period == USE_DEFAULT_GRACE_PERIOD) - reg_data = pm->dqm->wait_times; + reg_data = pm->dqm->wait_times[first_inst]; packet = (struct pm4_mec_write_data_mmio *)buffer; memset(buffer, 0, sizeof(struct pm4_mec_write_data_mmio));
[PATCH] drm/amdkfd: enable grace period for xcp instance
Read/write grace period from/to first xcc instance of xcp in kfd node. Signed-off-by: Eric Huang --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 21 --- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 8 --- 3 files changed, 20 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 31cac1fd0d58..9000c4b778fd 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1619,10 +1619,14 @@ static int initialize_cpsch(struct device_queue_manager *dqm) init_sdma_bitmaps(dqm); - if (dqm->dev->kfd2kgd->get_iq_wait_times) + if (dqm->dev->kfd2kgd->get_iq_wait_times) { + u32 first_inst = dqm->dev->xcp->id * +dqm->dev->adev->gfx.num_xcc_per_xcp; dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev, - >wait_times, - ffs(dqm->dev->xcc_mask) - 1); + >wait_times[first_inst], + first_inst); + } + return 0; } @@ -1675,13 +1679,16 @@ static int start_cpsch(struct device_queue_manager *dqm) grace_period); if (retval) pr_err("Setting grace timeout failed\n"); - else if (dqm->dev->kfd2kgd->build_grace_period_packet_info) + else if (dqm->dev->kfd2kgd->build_grace_period_packet_info) { + u32 first_inst = dqm->dev->xcp->id * +dqm->dev->adev->gfx.num_xcc_per_xcp; /* Update dqm->wait_times maintained in software */ dqm->dev->kfd2kgd->build_grace_period_packet_info( - dqm->dev->adev, dqm->wait_times, + dqm->dev->adev, dqm->wait_times[first_inst], grace_period, _offset, - >wait_times, - ffs(dqm->dev->xcc_mask) - 1); + >wait_times[first_inst], + first_inst); + } } dqm_unlock(dqm); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index 7dd4b177219d..45959c33b944 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -262,7 
+262,7 @@ struct device_queue_manager { /* used for GFX 9.4.3 only */ uint32_tcurrent_logical_xcc_start; - uint32_twait_times; + uint32_twait_times[MAX_XCP]; wait_queue_head_t destroy_wait; }; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c index 8fda16e6fee6..960404a6379b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c @@ -292,17 +292,19 @@ static int pm_set_grace_period_v9(struct packet_manager *pm, struct pm4_mec_write_data_mmio *packet; uint32_t reg_offset = 0; uint32_t reg_data = 0; + uint32_t first_inst = pm->dqm->dev->xcp->id * + pm->dqm->dev->adev->gfx.num_xcc_per_xcp; pm->dqm->dev->kfd2kgd->build_grace_period_packet_info( pm->dqm->dev->adev, - pm->dqm->wait_times, + pm->dqm->wait_times[first_inst], grace_period, _offset, _data, - 0); + first_inst); if (grace_period == USE_DEFAULT_GRACE_PERIOD) - reg_data = pm->dqm->wait_times; + reg_data = pm->dqm->wait_times[first_inst]; packet = (struct pm4_mec_write_data_mmio *)buffer; memset(buffer, 0, sizeof(struct pm4_mec_write_data_mmio)); -- 2.34.1
Re: [PATCH] drm/amdkfd: enable grace period for xcp instance
OK. Mukul, I will resend this patch based on top of yours. Regards, Eric On 2023-07-10 18:24, Joshi, Mukul wrote: [AMD Official Use Only - General] -Original Message- From: amd-gfx On Behalf Of Eric Huang Sent: Monday, July 10, 2023 3:46 PM To: amd-gfx@lists.freedesktop.org Cc: Huang, JinHuiEric ; Kim, Jonathan Subject: [PATCH] drm/amdkfd: enable grace period for xcp instance Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. Read/write grace period from/to first xcc instance of xcp in kfd node. Hi Eric, My patch, "drm/amdkfd: Update CWSR grace period for GFX9.4.3", which got missed during the merge should handle most of what you are trying to do. I will push that patch. Please add on top if there is anything missing. Hope that works for you. Thanks, Mukul Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c| 10 +++--- 3 files changed, 16 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index de83eccdd9de..a95bcb91dc09 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1619,10 +1619,15 @@ static int initialize_cpsch(struct device_queue_manager *dqm) init_sdma_bitmaps(dqm); - if (dqm->dev->kfd2kgd->get_iq_wait_times) + if (dqm->dev->kfd2kgd->get_iq_wait_times) { + u32 inst = ffs(dqm->dev->xcc_mask & + (1UL << + dqm->dev->xcp->id * + dqm->dev->adev->gfx.num_xcc_per_xcp)) - + 1; dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev, - >wait_times, - 0); + >wait_times[inst], + inst); + } return 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index 7dd4b177219d..45959c33b944 100644 
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -262,7 +262,7 @@ struct device_queue_manager { /* used for GFX 9.4.3 only */ uint32_tcurrent_logical_xcc_start; - uint32_twait_times; + uint32_twait_times[MAX_XCP]; wait_queue_head_t destroy_wait; }; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c index 8fda16e6fee6..dd50164c16cd 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c @@ -292,17 +292,21 @@ static int pm_set_grace_period_v9(struct packet_manager *pm, struct pm4_mec_write_data_mmio *packet; uint32_t reg_offset = 0; uint32_t reg_data = 0; + uint32_t inst = ffs(pm->dqm->dev->xcc_mask & + (1UL << + pm->dqm->dev->xcp->id * + pm->dqm->dev->adev->gfx.num_xcc_per_xcp)) - + 1; pm->dqm->dev->kfd2kgd->build_grace_period_packet_info( pm->dqm->dev->adev, - pm->dqm->wait_times, + pm->dqm->wait_times[inst], grace_period, _offset, _data, - 0); + inst); if (grace_period == USE_DEFAULT_GRACE_PERIOD) - reg_data = pm->dqm->wait_times; + reg_data = pm->dqm->wait_times[inst]; packet = (struct pm4_mec_write_data_mmio *)buffer; memset(buffer, 0, sizeof(struct pm4_mec_write_data_mmio)); -- 2.34.1
[PATCH] drm/amdkfd: enable grace period for xcp instance
Read/write grace period from/to first xcc instance of xcp in kfd node. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c| 10 +++--- 3 files changed, 16 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index de83eccdd9de..a95bcb91dc09 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1619,10 +1619,15 @@ static int initialize_cpsch(struct device_queue_manager *dqm) init_sdma_bitmaps(dqm); - if (dqm->dev->kfd2kgd->get_iq_wait_times) + if (dqm->dev->kfd2kgd->get_iq_wait_times) { + u32 inst = ffs(dqm->dev->xcc_mask & + (1UL << + dqm->dev->xcp->id * + dqm->dev->adev->gfx.num_xcc_per_xcp)) - 1; dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev, - >wait_times, - 0); + >wait_times[inst], + inst); + } return 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index 7dd4b177219d..45959c33b944 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -262,7 +262,7 @@ struct device_queue_manager { /* used for GFX 9.4.3 only */ uint32_tcurrent_logical_xcc_start; - uint32_twait_times; + uint32_twait_times[MAX_XCP]; wait_queue_head_t destroy_wait; }; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c index 8fda16e6fee6..dd50164c16cd 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c @@ -292,17 +292,21 @@ static int pm_set_grace_period_v9(struct packet_manager *pm, struct pm4_mec_write_data_mmio *packet; uint32_t reg_offset = 0; uint32_t reg_data = 0; + uint32_t inst 
= ffs(pm->dqm->dev->xcc_mask & + (1UL << + pm->dqm->dev->xcp->id * + pm->dqm->dev->adev->gfx.num_xcc_per_xcp)) - 1; pm->dqm->dev->kfd2kgd->build_grace_period_packet_info( pm->dqm->dev->adev, - pm->dqm->wait_times, + pm->dqm->wait_times[inst], grace_period, _offset, _data, - 0); + inst); if (grace_period == USE_DEFAULT_GRACE_PERIOD) - reg_data = pm->dqm->wait_times; + reg_data = pm->dqm->wait_times[inst]; packet = (struct pm4_mec_write_data_mmio *)buffer; memset(buffer, 0, sizeof(struct pm4_mec_write_data_mmio)); -- 2.34.1
Re: [PATCH 1/4] drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3
Thanks for your review. The prefix name change will be contradictory that new functions prefix name is different with existing functions prefix name. Are you sure it doesn't matter? Regards, Eric On 2023-07-07 19:52, Kim, Jonathan wrote: I would change the static prefix names from kgd_gfx_ to kgd_gc_ to match file name and specify it as the target GC version. With that fixed and assuming grace period instance fix ups will follow after, this patch and series is: Reviewed-by: Jonathan Kim *From:* Huang, JinHuiEric *Sent:* Friday, July 7, 2023 1:46 PM *To:* amd-gfx@lists.freedesktop.org *Cc:* Kim, Jonathan ; Kim, Jonathan ; Huang, JinHuiEric *Subject:* [PATCH 1/4] drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3 From: Jonathan Kim Implement the similarities as GC v9.4.2, and the difference for GC v9.4.3 HW spec, i.e. xcc instance. Signed-off-by: Jonathan Kim Signed-off-by: Eric Huang --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 27 +++ .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 166 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 3 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h | 6 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c | 3 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 3 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 3 +- .../gpu/drm/amd/include/kgd_kfd_interface.h | 3 +- 10 files changed, 213 insertions(+), 12 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index 60f9e027fb66..a06a99c5d311 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -23,6 +23,7 @@ #include "amdgpu_amdkfd.h" #include "amdgpu_amdkfd_arcturus.h" #include "amdgpu_amdkfd_gfx_v9.h" +#include "amdgpu_amdkfd_aldebaran.h" #include 
"gc/gc_9_4_2_offset.h" #include "gc/gc_9_4_2_sh_mask.h" #include @@ -36,7 +37,7 @@ * initialize the debug mode registers after it has disabled GFX off during the * debug session. */ -static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, bool restore_dbg_registers, uint32_t vmid) { @@ -107,7 +108,7 @@ static uint32_t kgd_aldebaran_set_wave_launch_trap_override(struct amdgpu_device return data; } -static uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, uint8_t wave_launch_mode, uint32_t vmid) { @@ -125,7 +126,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( uint32_t watch_address_mask, uint32_t watch_id, uint32_t watch_mode, - uint32_t debug_vmid) + uint32_t debug_vmid, + uint32_t inst ) { uint32_t watch_address_high; uint32_t watch_address_low; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h new file mode 100644 index ..a7bdaf8d82dd --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h @@ -0,0 +1,27 @@ +/* + * Copyright 2023 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR
[PATCH 3/4] drm/amdkfd: enable watch points globally for gfx943
From: Jonathan Kim

Set watch points for all xcc instances on GFX943.

Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
Signed-off-by: Eric Huang
Reviewed-by: Jonathan Kim
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 24083db44724..190b03efe5ff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -446,7 +446,8 @@ int kfd_dbg_trap_set_dev_address_watch(struct kfd_process_device *pdd,
 					uint32_t *watch_id,
 					uint32_t watch_mode)
 {
-	int r = kfd_dbg_get_dev_watch_id(pdd, watch_id);
+	int xcc_id, r = kfd_dbg_get_dev_watch_id(pdd, watch_id);
+	uint32_t xcc_mask = pdd->dev->xcc_mask;
 
 	if (r)
 		return r;
@@ -460,14 +461,15 @@ int kfd_dbg_trap_set_dev_address_watch(struct kfd_process_device *pdd,
 	}
 
 	amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-	pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch(
+	for_each_inst(xcc_id, xcc_mask)
+		pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch(
 				pdd->dev->adev,
 				watch_address,
 				watch_address_mask,
 				*watch_id,
 				watch_mode,
 				pdd->dev->vm_info.last_vmid_kfd,
-				0);
+				xcc_id);
 	amdgpu_gfx_off_ctrl(pdd->dev->adev, true);
 
 	if (!pdd->dev->kfd->shared_resources.enable_mes)
-- 
2.34.1
[PATCH 4/4] drm/amdkfd: add multi-process debugging support for GC v9.4.3
From: Jonathan Kim

Similar to GC v9.4.2, GC v9.4.3 should use the 5-Dword extended MAP_PROCESS packet to support multi-process debugging. Update the multi-process debug support list so that the KFD updates the runlist on debug mode setting and allocates enough GTT memory during KFD device initialization.

Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
Signed-off-by: Eric Huang
Reviewed-by: Jonathan Kim
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index a289e59ceb79..a0afc6a7b6c4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -76,8 +76,9 @@ int kfd_dbg_send_exception_to_runtime(struct kfd_process *p,
 
 static inline bool kfd_dbg_is_per_vmid_supported(struct kfd_node *dev)
 {
-	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
-	       KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0);
+	return (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+		KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) ||
+		KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0));
}

 void debug_event_write_work_handler(struct work_struct *work);
-- 
2.34.1
[PATCH 1/4] drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3
From: Jonathan Kim Implement the similarities as GC v9.4.2, and the difference for GC v9.4.3 HW spec, i.e. xcc instance. Signed-off-by: Jonathan Kim Signed-off-by: Eric Huang --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 27 +++ .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 166 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 3 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 6 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 3 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 3 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 3 +- .../gpu/drm/amd/include/kgd_kfd_interface.h | 3 +- 10 files changed, 213 insertions(+), 12 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index 60f9e027fb66..a06a99c5d311 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -23,6 +23,7 @@ #include "amdgpu_amdkfd.h" #include "amdgpu_amdkfd_arcturus.h" #include "amdgpu_amdkfd_gfx_v9.h" +#include "amdgpu_amdkfd_aldebaran.h" #include "gc/gc_9_4_2_offset.h" #include "gc/gc_9_4_2_sh_mask.h" #include @@ -36,7 +37,7 @@ * initialize the debug mode registers after it has disabled GFX off during the * debug session. 
*/ -static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, bool restore_dbg_registers, uint32_t vmid) { @@ -107,7 +108,7 @@ static uint32_t kgd_aldebaran_set_wave_launch_trap_override(struct amdgpu_device return data; } -static uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, uint8_t wave_launch_mode, uint32_t vmid) { @@ -125,7 +126,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( uint32_t watch_address_mask, uint32_t watch_id, uint32_t watch_mode, - uint32_t debug_vmid) + uint32_t debug_vmid, + uint32_t inst ) { uint32_t watch_address_high; uint32_t watch_address_low; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h new file mode 100644 index ..a7bdaf8d82dd --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h @@ -0,0 +1,27 @@ +/* + * Copyright 2023 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ +uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, + bool restore_dbg_registers, + uint32_t vmid); +uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, + uint8_t wave_launch_mode, + uint32_t vmid); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c index 5b4b7f8b92a5..543405a28b19 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c @@ -22,6 +22,7 @@ #include "amdgpu.h" #include "amdgpu_amdkfd.h" #include "
[PATCH 2/4] drm/amdkfd: restore debugger additional info for gfx v9_4_3
From: Jonathan Kim

The additional information that the KFD reports to the debugger was destroyed when the following commit was merged: "drm/amdkfd: convert switches to IP version checking"

Signed-off-by: Jonathan Kim
Reviewed-by: Harish Kasiviswanathan
Signed-off-by: Jonathan Kim
Acked-by: Amber Lin
Signed-off-by: Eric Huang
Reviewed-by: Jonathan Kim
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 --
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 +++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 61fc62f3e003..1a4cdee86759 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1932,8 +1932,14 @@ static void kfd_topology_set_capabilities(struct kfd_topology_device *dev)
 			HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_MODE_SUPPORTED;
 
 	if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(10, 0, 0)) {
-		dev->node_props.debug_prop |= HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
-				HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
+		if (KFD_GC_VERSION(dev->gpu) == IP_VERSION(9, 4, 3))
+			dev->node_props.debug_prop |=
+				HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 |
+				HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3;
+		else
+			dev->node_props.debug_prop |=
+				HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
+				HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
 
 		if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(9, 4, 2))
 			dev->node_props.debug_prop |=
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index cba2cd5ed9d1..dea32a9e5506 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -32,9 +32,12 @@
 #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 32
 
 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9	6
+#define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3	7
 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX10	7
 #define HSA_DBG_WATCH_ADDR_MASK_HI_BIT \
 	(29 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT)
+#define HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3 \
+	(30 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT)
 
 struct kfd_node_properties {
 	uint64_t hive_id;
-- 
2.34.1
[PATCH 0/4] Upstream debugger feature for GFX v9.4.3
Jonathan Kim (4): drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3 drm/amdkfd: restore debugger additional info for gfx v9_4_3 drm/amdkfd: enable watch points globally for gfx943 drm/amdkfd: add multi-process debugging support for GC v9.4.3 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 27 +++ .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 166 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 3 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 6 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 3 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 3 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 9 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 5 +- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 + .../gpu/drm/amd/include/kgd_kfd_interface.h | 3 +- 13 files changed, 231 insertions(+), 18 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h -- 2.34.1
Re: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance
On 2023-07-07 11:56, Kim, Jonathan wrote: [Public] -Original Message- From: Huang, JinHuiEric Sent: Friday, July 7, 2023 11:46 AM To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance On 2023-07-07 10:59, Kim, Jonathan wrote: [Public] -Original Message- From: Huang, JinHuiEric Sent: Thursday, July 6, 2023 2:19 PM To: amd-gfx@lists.freedesktop.org Cc: Kim, Jonathan ; Huang, JinHuiEric Subject: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance each xcc instance needs to get iq wait time and set grace period accordingly. Signed-off-by: Eric Huang --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 9 -- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- .../gpu/drm/amd/amdkfd/kfd_packet_manager.c | 32 +++--- - .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 9 +++--- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 5 files changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index a2bff3f01359..0f12c1989e14 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1606,6 +1606,8 @@ static int set_sched_resources(struct device_queue_manager *dqm) static int initialize_cpsch(struct device_queue_manager *dqm) { + uint32_t xcc_id, xcc_mask = dqm->dev->xcc_mask; + pr_debug("num of pipes: %d\n", get_pipes_per_mec(dqm)); mutex_init(>lock_hidden); @@ -1620,8 +1622,11 @@ static int initialize_cpsch(struct device_queue_manager *dqm) init_sdma_bitmaps(dqm); if (dqm->dev->kfd2kgd->get_iq_wait_times) - dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev, - >wait_times, 0); + for_each_inst(xcc_id, xcc_mask) + dqm->dev->kfd2kgd->get_iq_wait_times( + dqm->dev->adev, + >wait_times[xcc_id], + xcc_id); return 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index 7dd4b177219d..62a6dc8d3032 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -262,7 +262,7 @@ struct device_queue_manager { /* used for GFX 9.4.3 only */ uint32_tcurrent_logical_xcc_start; - uint32_twait_times; + uint32_twait_times[32]; I think wait_times[16] should be sufficient. We only get the hamming weight of 16 bits for NUM_XCC and I believe the xcc_mask is declared as a uint16_t in the KGD portion anyway. We may as well align to that. wait_queue_head_t destroy_wait; }; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c index 401096c103b2..f37ab4b6d88c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c @@ -374,27 +374,31 @@ int pm_update_grace_period(struct packet_manager *pm, uint32_t grace_period) { int retval = 0; uint32_t *buffer, size; + uint32_t xcc_id, xcc_mask = pm->dqm->dev->xcc_mask; size = pm->pmf->set_grace_period_size; mutex_lock(>lock); if (size) { - kq_acquire_packet_buffer(pm->priv_queue, - size / sizeof(uint32_t), - (unsigned int **)); - - if (!buffer) { - pr_err("Failed to allocate buffer on kernel queue\n"); - retval = -ENOMEM; - goto out; - } + for_each_inst(xcc_id, xcc_mask) { + kq_acquire_packet_buffer(pm->priv_queue, + size / sizeof(uint32_t), + (unsigned int **)); - retval = pm->pmf->set_grace_period(pm, buffer, grace_period); - if (!retval) - kq_submit_packet(pm->priv_queue); - else - kq_rollback_packet(pm->priv_queue); + if (!buffer) { + pr_err("Failed to allocate buffer on kernel queue\n"); + retval = -ENOMEM; + goto out; + } + + retval = pm->pmf->set_grace_period(pm, buffer, + grace_period, xcc_id); + if (!retval) + kq_submit_packet(pm->priv_queue); +
Re: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance
On 2023-07-07 10:59, Kim, Jonathan wrote: [Public] -Original Message- From: Huang, JinHuiEric Sent: Thursday, July 6, 2023 2:19 PM To: amd-gfx@lists.freedesktop.org Cc: Kim, Jonathan ; Huang, JinHuiEric Subject: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance each xcc instance needs to get iq wait time and set grace period accordingly. Signed-off-by: Eric Huang --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 9 -- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- .../gpu/drm/amd/amdkfd/kfd_packet_manager.c | 32 +++ .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 9 +++--- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 5 files changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index a2bff3f01359..0f12c1989e14 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1606,6 +1606,8 @@ static int set_sched_resources(struct device_queue_manager *dqm) static int initialize_cpsch(struct device_queue_manager *dqm) { + uint32_t xcc_id, xcc_mask = dqm->dev->xcc_mask; + pr_debug("num of pipes: %d\n", get_pipes_per_mec(dqm)); mutex_init(>lock_hidden); @@ -1620,8 +1622,11 @@ static int initialize_cpsch(struct device_queue_manager *dqm) init_sdma_bitmaps(dqm); if (dqm->dev->kfd2kgd->get_iq_wait_times) - dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev, - >wait_times, 0); + for_each_inst(xcc_id, xcc_mask) + dqm->dev->kfd2kgd->get_iq_wait_times( + dqm->dev->adev, + >wait_times[xcc_id], + xcc_id); return 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index 7dd4b177219d..62a6dc8d3032 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -262,7 +262,7 @@ struct device_queue_manager { /* used for GFX 9.4.3 only */ 
uint32_tcurrent_logical_xcc_start; - uint32_twait_times; + uint32_twait_times[32]; I think wait_times[16] should be sufficient. We only get the hamming weight of 16 bits for NUM_XCC and I believe the xcc_mask is declared as a uint16_t in the KGD portion anyway. We may as well align to that. wait_queue_head_t destroy_wait; }; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c index 401096c103b2..f37ab4b6d88c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c @@ -374,27 +374,31 @@ int pm_update_grace_period(struct packet_manager *pm, uint32_t grace_period) { int retval = 0; uint32_t *buffer, size; + uint32_t xcc_id, xcc_mask = pm->dqm->dev->xcc_mask; size = pm->pmf->set_grace_period_size; mutex_lock(>lock); if (size) { - kq_acquire_packet_buffer(pm->priv_queue, - size / sizeof(uint32_t), - (unsigned int **)); - - if (!buffer) { - pr_err("Failed to allocate buffer on kernel queue\n"); - retval = -ENOMEM; - goto out; - } + for_each_inst(xcc_id, xcc_mask) { + kq_acquire_packet_buffer(pm->priv_queue, + size / sizeof(uint32_t), + (unsigned int **)); - retval = pm->pmf->set_grace_period(pm, buffer, grace_period); - if (!retval) - kq_submit_packet(pm->priv_queue); - else - kq_rollback_packet(pm->priv_queue); + if (!buffer) { + pr_err("Failed to allocate buffer on kernel queue\n"); + retval = -ENOMEM; + goto out; + } + + retval = pm->pmf->set_grace_period(pm, buffer, + grace_period, xcc_id); + if (!retval) + kq_submit_packet(pm->priv_queue); + else + kq_rollback_packet(pm->priv_queue); In the event of partial success do we need to roll back (i.e. resubmit default grace period) on failure? The function pm_set_grace_period_v9 always return 0, and it is not complicate operation, it should be always
[PATCH 4/6] drm/amdkfd: enable grace period for xcc instance
each xcc instance needs to get iq wait time and set grace period accordingly. Signed-off-by: Eric Huang --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 9 -- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- .../gpu/drm/amd/amdkfd/kfd_packet_manager.c | 32 +++ .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 9 +++--- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 5 files changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index a2bff3f01359..0f12c1989e14 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1606,6 +1606,8 @@ static int set_sched_resources(struct device_queue_manager *dqm) static int initialize_cpsch(struct device_queue_manager *dqm) { + uint32_t xcc_id, xcc_mask = dqm->dev->xcc_mask; + pr_debug("num of pipes: %d\n", get_pipes_per_mec(dqm)); mutex_init(>lock_hidden); @@ -1620,8 +1622,11 @@ static int initialize_cpsch(struct device_queue_manager *dqm) init_sdma_bitmaps(dqm); if (dqm->dev->kfd2kgd->get_iq_wait_times) - dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev, - >wait_times, 0); + for_each_inst(xcc_id, xcc_mask) + dqm->dev->kfd2kgd->get_iq_wait_times( + dqm->dev->adev, + >wait_times[xcc_id], + xcc_id); return 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index 7dd4b177219d..62a6dc8d3032 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -262,7 +262,7 @@ struct device_queue_manager { /* used for GFX 9.4.3 only */ uint32_tcurrent_logical_xcc_start; - uint32_twait_times; + uint32_twait_times[32]; wait_queue_head_t destroy_wait; }; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c index 401096c103b2..f37ab4b6d88c 100644 --- 
a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c @@ -374,27 +374,31 @@ int pm_update_grace_period(struct packet_manager *pm, uint32_t grace_period) { int retval = 0; uint32_t *buffer, size; + uint32_t xcc_id, xcc_mask = pm->dqm->dev->xcc_mask; size = pm->pmf->set_grace_period_size; mutex_lock(>lock); if (size) { - kq_acquire_packet_buffer(pm->priv_queue, - size / sizeof(uint32_t), - (unsigned int **)); - - if (!buffer) { - pr_err("Failed to allocate buffer on kernel queue\n"); - retval = -ENOMEM; - goto out; - } + for_each_inst(xcc_id, xcc_mask) { + kq_acquire_packet_buffer(pm->priv_queue, + size / sizeof(uint32_t), + (unsigned int **)); - retval = pm->pmf->set_grace_period(pm, buffer, grace_period); - if (!retval) - kq_submit_packet(pm->priv_queue); - else - kq_rollback_packet(pm->priv_queue); + if (!buffer) { + pr_err("Failed to allocate buffer on kernel queue\n"); + retval = -ENOMEM; + goto out; + } + + retval = pm->pmf->set_grace_period(pm, buffer, + grace_period, xcc_id); + if (!retval) + kq_submit_packet(pm->priv_queue); + else + kq_rollback_packet(pm->priv_queue); + } } out: diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c index 8fda16e6fee6..a9443d661957 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c @@ -287,7 +287,8 @@ static int pm_map_queues_v9(struct packet_manager *pm, uint32_t *buffer, static int pm_set_grace_period_v9(struct packet_manager *pm, uint32_t *buffer, - uint32_t grace_period) + uint32_t grace_period, + uint32_t inst) { str
[PATCH 3/6] drm/amdkfd: enable watch points globally for gfx943
From: Jonathan Kim Set watch points for all xcc instances on GFX943. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index 24083db44724..190b03efe5ff 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -446,7 +446,8 @@ int kfd_dbg_trap_set_dev_address_watch(struct kfd_process_device *pdd, uint32_t *watch_id, uint32_t watch_mode) { - int r = kfd_dbg_get_dev_watch_id(pdd, watch_id); + int xcc_id, r = kfd_dbg_get_dev_watch_id(pdd, watch_id); + uint32_t xcc_mask = pdd->dev->xcc_mask; if (r) return r; @@ -460,14 +461,15 @@ int kfd_dbg_trap_set_dev_address_watch(struct kfd_process_device *pdd, } amdgpu_gfx_off_ctrl(pdd->dev->adev, false); - pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch( + for_each_inst(xcc_id, xcc_mask) + pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch( pdd->dev->adev, watch_address, watch_address_mask, *watch_id, watch_mode, pdd->dev->vm_info.last_vmid_kfd, - 0); + xcc_id); amdgpu_gfx_off_ctrl(pdd->dev->adev, true); if (!pdd->dev->kfd->shared_resources.enable_mes) -- 2.34.1
[PATCH 2/6] drm/amdkfd: restore debugger additional info for gfx v9_4_3
From: Jonathan Kim The additional information that the KFD reports to the debugger was destroyed when the following commit was merged: "drm/amdkfd: convert switches to IP version checking" Signed-off-by: Jonathan Kim Reviewed-by: Harish Kasiviswanathan Signed-off-by: Jonathan Kim Acked-by: Amber Lin Signed-off-by: Eric Huang Reviewed-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 -- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 +++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 61fc62f3e003..1a4cdee86759 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -1932,8 +1932,14 @@ static void kfd_topology_set_capabilities(struct kfd_topology_device *dev) HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_MODE_SUPPORTED; if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(10, 0, 0)) { - dev->node_props.debug_prop |= HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 | - HSA_DBG_WATCH_ADDR_MASK_HI_BIT; + if (KFD_GC_VERSION(dev->gpu) == IP_VERSION(9, 4, 3)) + dev->node_props.debug_prop |= + HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 | + HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3; + else + dev->node_props.debug_prop |= + HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 | + HSA_DBG_WATCH_ADDR_MASK_HI_BIT; if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(9, 4, 2)) dev->node_props.debug_prop |= diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h index cba2cd5ed9d1..dea32a9e5506 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h @@ -32,9 +32,12 @@ #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 32 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX96 +#define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 7 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX10 7 #define HSA_DBG_WATCH_ADDR_MASK_HI_BIT \ (29 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT) +#define HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3 \ + (30 << 
HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT) struct kfd_node_properties { uint64_t hive_id; -- 2.34.1
[PATCH 5/6] drm/amdkfd: always keep trap enabled for GC v9.4.3
To set TTMP setup on by default. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 +++--- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index cf1db0ab3471..47c5d16677d6 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -2842,7 +2842,7 @@ static int runtime_disable(struct kfd_process *p) pdd->spi_dbg_override = pdd->dev->kfd2kgd->disable_debug_trap( pdd->dev->adev, - false, + KFD_GC_VERSION(pdd->dev) == IP_VERSION(9, 4, 3), pdd->dev->vm_info.last_vmid_kfd); if (!pdd->dev->kfd->shared_resources.enable_mes) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index 190b03efe5ff..4cb9b3b18065 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -591,7 +591,8 @@ void kfd_dbg_trap_deactivate(struct kfd_process *target, bool unwind, int unwind pdd->spi_dbg_override = pdd->dev->kfd2kgd->disable_debug_trap( pdd->dev->adev, - target->runtime_info.ttmp_setup, + KFD_GC_VERSION(pdd->dev) == IP_VERSION(9, 4, 3) ? 
+ true : target->runtime_info.ttmp_setup, pdd->dev->vm_info.last_vmid_kfd); amdgpu_gfx_off_ctrl(pdd->dev->adev, true); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index ba04a4baecf2..91ae9121e2bf 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -1644,9 +1644,9 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_node *dev, p->pdds[p->n_pdds++] = pdd; if (kfd_dbg_is_per_vmid_supported(pdd->dev)) pdd->spi_dbg_override = pdd->dev->kfd2kgd->disable_debug_trap( - pdd->dev->adev, - false, - 0); + pdd->dev->adev, + KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3), + 0); /* Init idr used for memory handle translation */ idr_init(>alloc_idr); -- 2.34.1
[PATCH 6/6] drm/amdkfd: add multi-process debugging support for GC v9.4.3
From: Jonathan Kim Similar to GC v9.4.2, GC v9.4.3 should use the 5-Dword extended MAP_PROCESS packet to support multi-process debugging. Update the multi-process debug support list so that the KFD updates the runlist on debug mode setting and that it allocates enough GTT memory during KFD device initialization. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h index a289e59ceb79..a0afc6a7b6c4 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h @@ -76,8 +76,9 @@ int kfd_dbg_send_exception_to_runtime(struct kfd_process *p, static inline bool kfd_dbg_is_per_vmid_supported(struct kfd_node *dev) { - return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || - KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0); + return (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || + KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) || + KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0)); } void debug_event_write_work_handler(struct work_struct *work); -- 2.34.1
[PATCH 1/6] drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3
From: Jonathan Kim Implement the similarities as GC v9.4.2, and the difference for GC v9.4.3 HW spec, i.e. xcc instance. Signed-off-by: Jonathan Kim Signed-off-by: Eric Huang --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 10 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 30 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 152 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 9 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 10 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 3 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 15 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 3 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 3 +- .../gpu/drm/amd/include/kgd_kfd_interface.h | 9 +- 12 files changed, 230 insertions(+), 26 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index 60f9e027fb66..7d7eaed68531 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -23,6 +23,7 @@ #include "amdgpu_amdkfd.h" #include "amdgpu_amdkfd_arcturus.h" #include "amdgpu_amdkfd_gfx_v9.h" +#include "amdgpu_amdkfd_aldebaran.h" #include "gc/gc_9_4_2_offset.h" #include "gc/gc_9_4_2_sh_mask.h" #include @@ -36,7 +37,7 @@ * initialize the debug mode registers after it has disabled GFX off during the * debug session. */ -static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, bool restore_dbg_registers, uint32_t vmid) { @@ -50,7 +51,7 @@ static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, } /* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. 
*/ -static uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev, bool keep_trap_enabled, uint32_t vmid) { @@ -107,7 +108,7 @@ static uint32_t kgd_aldebaran_set_wave_launch_trap_override(struct amdgpu_device return data; } -static uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, uint8_t wave_launch_mode, uint32_t vmid) { @@ -125,7 +126,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( uint32_t watch_address_mask, uint32_t watch_id, uint32_t watch_mode, - uint32_t debug_vmid) + uint32_t debug_vmid, + uint32_t inst ) { uint32_t watch_address_high; uint32_t watch_address_low; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h new file mode 100644 index ..ed349ff397bd --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h @@ -0,0 +1,30 @@ +/* + * Copyright 2023 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ +uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, + bool restore_dbg_registers, + uint3
[PATCH 0/6] Upstream debugger feature for GFX v9.4.3
Eric Huang (2): drm/amdkfd: enable grace period for xcc instance drm/amdkfd: always keep trap enabled for GC v9.4.3 Jonathan Kim (4): drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3 drm/amdkfd: restore debugger additional info for gfx v9_4_3 drm/amdkfd: enable watch points globally for gfx943 drm/amdkfd: add multi-process debugging support for GC v9.4.3 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 10 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 30 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 152 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 9 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 10 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 3 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 15 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 12 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 5 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 9 +- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- .../gpu/drm/amd/amdkfd/kfd_packet_manager.c | 32 ++-- .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 10 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 + .../gpu/drm/amd/include/kgd_kfd_interface.h | 9 +- 20 files changed, 284 insertions(+), 57 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h -- 2.34.1
[PATCH 3/5] drm/amdkfd: add xcc instance for debugger APIs
Since GFX9 GPU has multiple xcc instances, this is to implement this change in KFD for debugger APIs. Signed-off-by: Eric Huang --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c| 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 12 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h | 13 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 12 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 13 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 -- .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c | 3 ++- drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 12 11 files changed, 61 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index f3f7e0437447..c7f88bfa1976 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -126,7 +126,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( uint32_t watch_address_mask, uint32_t watch_id, uint32_t watch_mode, - uint32_t debug_vmid) + uint32_t debug_vmid, + uint32_t inst ) { uint32_t watch_address_high; uint32_t watch_address_low; @@ -163,7 +164,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( } static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev, - uint32_t watch_id) + uint32_t watch_id, + uint32_t inst) { return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c index 3299e268f234..c0546db91579 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c @@ -454,7 +454,8 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch( uint32_t watch_address_mask, uint32_t watch_id, uint32_t watch_mode, - uint32_t 
debug_vmid) + uint32_t debug_vmid, + uint32_t inst) { uint32_t watch_address_high; uint32_t watch_address_low; @@ -491,7 +492,8 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch( } static uint32_t kgd_gfx_v9_4_3_clear_address_watch(struct amdgpu_device *adev, - uint32_t watch_id) + uint32_t watch_id, + uint32_t inst) { return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c index 8ad7a7779e14..04daa8f9456b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c @@ -886,7 +886,8 @@ uint32_t kgd_gfx_v10_set_address_watch(struct amdgpu_device *adev, uint32_t watch_address_mask, uint32_t watch_id, uint32_t watch_mode, - uint32_t debug_vmid) + uint32_t debug_vmid, + uint32_t inst) { uint32_t watch_address_high; uint32_t watch_address_low; @@ -942,7 +943,8 @@ uint32_t kgd_gfx_v10_set_address_watch(struct amdgpu_device *adev, } uint32_t kgd_gfx_v10_clear_address_watch(struct amdgpu_device *adev, - uint32_t watch_id) + uint32_t watch_id, + uint32_t inst) { uint32_t watch_address_cntl; @@ -968,7 +970,8 @@ uint32_t kgd_gfx_v10_clear_address_watch(struct amdgpu_device *adev, * deq_retry_wait_time -- Wait Count for Global Wave Syncs. */ void kgd_gfx_v10_get_iq_wait_times(struct amdgpu_device *adev, - uint32_t *wait_times) + uint32_t *wait_times, + uint32_t inst) { *wait_times = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_IQ_WAIT_TIME2)); @@ -978,7 +981,8 @@ void kgd_gfx_v10_build_grace_period_packet_info(struct amdgpu_device *adev
[PATCH 0/5] Upstream debugger feature for GFX v9.4.3
Eric Huang (1): drm/amdkfd: add xcc instance for debugger APIs Jonathan Kim (4): drm/amdgpu: add kfd2kgd debugger callbacks for GC v9.4.3 drm/amdkfd: restore debugger additional info for gfx v9_4_3 drm/amdkfd: enable watch points globally for gfx943 drm/amdkfd: add multi-process debugging support for GC v9.4.3 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 13 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 30 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 153 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 12 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 13 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 6 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 12 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 13 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 18 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 5 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 3 +- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 + .../gpu/drm/amd/include/kgd_kfd_interface.h | 12 +- 15 files changed, 265 insertions(+), 40 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h -- 2.34.1
[PATCH 5/5] drm/amdkfd: add multi-process debugging support for GC v9.4.3
From: Jonathan Kim Similar to GC v9.4.2, GC v9.4.3 should use the 5-Dword extended MAP_PROCESS packet to support multi-process debugging. Update the multi-process debug support list so that the KFD updates the runlist on debug mode setting and that it allocates enough GTT memory during KFD device initialization. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h index a289e59ceb79..a0afc6a7b6c4 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h @@ -76,8 +76,9 @@ int kfd_dbg_send_exception_to_runtime(struct kfd_process *p, static inline bool kfd_dbg_is_per_vmid_supported(struct kfd_node *dev) { - return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || - KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0); + return (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || + KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) || + KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0)); } void debug_event_write_work_handler(struct work_struct *work); -- 2.34.1
[PATCH 4/5] drm/amdkfd: enable watch points globally for gfx943
From: Jonathan Kim Set watch points for all xcc instances on GFX943. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 6 -- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 16 ++-- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c index c0546db91579..d9357a61bf31 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c @@ -480,11 +480,13 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch( VALID, 1); - WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_H) + + WREG32_RLC((SOC15_REG_OFFSET(GC, GET_INST(GC, inst), + regTCP_WATCH0_ADDR_H) + (watch_id * TCP_WATCH_STRIDE)), watch_address_high); - WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_L) + + WREG32_RLC((SOC15_REG_OFFSET(GC, GET_INST(GC, inst), + regTCP_WATCH0_ADDR_L) + (watch_id * TCP_WATCH_STRIDE)), watch_address_low); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index dcc49183364b..b4ec809c8892 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -413,7 +413,8 @@ static bool kfd_dbg_owns_dev_watch_id(struct kfd_process_device *pdd, int watch_ int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd, uint32_t watch_id) { - int r; + int xcc_id, r; + uint32_t xcc_mask = pdd->dev->xcc_mask; if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id)) return -EINVAL; @@ -425,10 +426,11 @@ int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd, } amdgpu_gfx_off_ctrl(pdd->dev->adev, false); - pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch( + for_each_inst(xcc_id, xcc_mask) + pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch( pdd->dev->adev, watch_id, - 0); + xcc_id); 
amdgpu_gfx_off_ctrl(pdd->dev->adev, true); if (!pdd->dev->kfd->shared_resources.enable_mes) @@ -447,7 +449,8 @@ int kfd_dbg_trap_set_dev_address_watch(struct kfd_process_device *pdd, uint32_t *watch_id, uint32_t watch_mode) { - int r = kfd_dbg_get_dev_watch_id(pdd, watch_id); + int xcc_id, r = kfd_dbg_get_dev_watch_id(pdd, watch_id); + uint32_t xcc_mask = pdd->dev->xcc_mask; if (r) return r; @@ -461,14 +464,15 @@ int kfd_dbg_trap_set_dev_address_watch(struct kfd_process_device *pdd, } amdgpu_gfx_off_ctrl(pdd->dev->adev, false); - pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch( + for_each_inst(xcc_id, xcc_mask) + pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch( pdd->dev->adev, watch_address, watch_address_mask, *watch_id, watch_mode, pdd->dev->vm_info.last_vmid_kfd, - 0); + xcc_id); amdgpu_gfx_off_ctrl(pdd->dev->adev, true); if (!pdd->dev->kfd->shared_resources.enable_mes) -- 2.34.1
[PATCH 2/5] drm/amdkfd: restore debugger additional info for gfx v9_4_3
From: Jonathan Kim The additional information that the KFD reports to the debugger was destroyed when the following commit was merged: "drm/amdkfd: convert switches to IP version checking" Signed-off-by: Jonathan Kim Reviewed-by: Harish Kasiviswanathan Signed-off-by: Jonathan Kim Acked-by: Amber Lin Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 -- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 +++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 61fc62f3e003..1a4cdee86759 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -1932,8 +1932,14 @@ static void kfd_topology_set_capabilities(struct kfd_topology_device *dev) HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_MODE_SUPPORTED; if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(10, 0, 0)) { - dev->node_props.debug_prop |= HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 | - HSA_DBG_WATCH_ADDR_MASK_HI_BIT; + if (KFD_GC_VERSION(dev->gpu) == IP_VERSION(9, 4, 3)) + dev->node_props.debug_prop |= + HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 | + HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3; + else + dev->node_props.debug_prop |= + HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 | + HSA_DBG_WATCH_ADDR_MASK_HI_BIT; if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(9, 4, 2)) dev->node_props.debug_prop |= diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h index cba2cd5ed9d1..dea32a9e5506 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h @@ -32,9 +32,12 @@ #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 32 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 6 +#define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 7 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX10 7 #define HSA_DBG_WATCH_ADDR_MASK_HI_BIT \ (29 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT) +#define HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3 \ + (30 << 
HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT) struct kfd_node_properties { uint64_t hive_id; -- 2.34.1
[PATCH 1/5] drm/amdgpu: add kfd2kgd debugger callbacks for GC v9.4.3
From: Jonathan Kim Implement the callbacks that GC v9.4.3 shares with GC v9.4.2, and add what differs for the GC v9.4.3 HW spec. Signed-off-by: Jonathan Kim Signed-off-by: Eric Huang --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 7 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 30 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 149 +- 3 files changed, 182 insertions(+), 4 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index 60f9e027fb66..f3f7e0437447 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -23,6 +23,7 @@ #include "amdgpu_amdkfd.h" #include "amdgpu_amdkfd_arcturus.h" #include "amdgpu_amdkfd_gfx_v9.h" +#include "amdgpu_amdkfd_aldebaran.h" #include "gc/gc_9_4_2_offset.h" #include "gc/gc_9_4_2_sh_mask.h" #include @@ -36,7 +37,7 @@ * initialize the debug mode registers after it has disabled GFX off during the * debug session. */ -static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, bool restore_dbg_registers, uint32_t vmid) { @@ -50,7 +51,7 @@ static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, } /* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. 
*/ -static uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev, bool keep_trap_enabled, uint32_t vmid) { @@ -107,7 +108,7 @@ static uint32_t kgd_aldebaran_set_wave_launch_trap_override(struct amdgpu_device return data; } -static uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, uint8_t wave_launch_mode, uint32_t vmid) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h new file mode 100644 index ..ed349ff397bd --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h @@ -0,0 +1,30 @@ +/* + * Copyright 2023 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ */ +uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, + bool restore_dbg_registers, + uint32_t vmid); +uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev, + bool keep_trap_enabled, + uint32_t vmid); +uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, + uint8_t wave_launch_mode, + uint32_t vmid); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c index 5b4b7f8b92a5..3299e268f234 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c @@ -22,6 +22,7 @@ #include "amdgpu.h" #include "amdgpu_amdkfd.h" #include "amdgpu_amdkfd_gfx_v9.h" +#include "amdgpu_amdkfd_aldebaran.h" #include "gc/gc_9_4_3_offset.h" #include "gc/gc_9_4_3_sh_mask.h" #include "athub/athub_1_8_0_offset.h" @@ -32,6 +33,7 @@ #include "soc15.h" #include "sdma/s
[PATCH 5/5] drm/amdkfd: enable watch points globally for gfx943
From: Jonathan Kim Set watch points for all xcc instances on GFX943. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 6 -- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 16 ++-- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c index 17fe4e90f203..9c32b9fbd866 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c @@ -480,11 +480,13 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch( VALID, 1); - WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_H) + + WREG32_RLC((SOC15_REG_OFFSET(GC, GET_INST(GC, inst), + regTCP_WATCH0_ADDR_H) + (watch_id * TCP_WATCH_STRIDE)), watch_address_high); - WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_L) + + WREG32_RLC((SOC15_REG_OFFSET(GC, GET_INST(GC, inst), + regTCP_WATCH0_ADDR_L) + (watch_id * TCP_WATCH_STRIDE)), watch_address_low); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index dcc49183364b..b4ec809c8892 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -413,7 +413,8 @@ static bool kfd_dbg_owns_dev_watch_id(struct kfd_process_device *pdd, int watch_ int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd, uint32_t watch_id) { - int r; + int xcc_id, r; + uint32_t xcc_mask = pdd->dev->xcc_mask; if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id)) return -EINVAL; @@ -425,10 +426,11 @@ int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd, } amdgpu_gfx_off_ctrl(pdd->dev->adev, false); - pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch( + for_each_inst(xcc_id, xcc_mask) + pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch( pdd->dev->adev, watch_id, - 0); + xcc_id); 
amdgpu_gfx_off_ctrl(pdd->dev->adev, true); if (!pdd->dev->kfd->shared_resources.enable_mes) @@ -447,7 +449,8 @@ int kfd_dbg_trap_set_dev_address_watch(struct kfd_process_device *pdd, uint32_t *watch_id, uint32_t watch_mode) { - int r = kfd_dbg_get_dev_watch_id(pdd, watch_id); + int xcc_id, r = kfd_dbg_get_dev_watch_id(pdd, watch_id); + uint32_t xcc_mask = pdd->dev->xcc_mask; if (r) return r; @@ -461,14 +464,15 @@ int kfd_dbg_trap_set_dev_address_watch(struct kfd_process_device *pdd, } amdgpu_gfx_off_ctrl(pdd->dev->adev, false); - pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch( + for_each_inst(xcc_id, xcc_mask) + pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch( pdd->dev->adev, watch_address, watch_address_mask, *watch_id, watch_mode, pdd->dev->vm_info.last_vmid_kfd, - 0); + xcc_id); amdgpu_gfx_off_ctrl(pdd->dev->adev, true); if (!pdd->dev->kfd->shared_resources.enable_mes) -- 2.34.1
[PATCH 1/5] drm/amdgpu: add debugger support for GC v9.4.3
From: Jonathan Kim Implement the callbacks that GC v9.4.3 shares with GC v9.4.2, and add what differs for the GC v9.4.3 HW spec. Signed-off-by: Jonathan Kim Signed-off-by: Eric Huang --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 7 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 30 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 146 +- 3 files changed, 179 insertions(+), 4 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index 60f9e027fb66..f3f7e0437447 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -23,6 +23,7 @@ #include "amdgpu_amdkfd.h" #include "amdgpu_amdkfd_arcturus.h" #include "amdgpu_amdkfd_gfx_v9.h" +#include "amdgpu_amdkfd_aldebaran.h" #include "gc/gc_9_4_2_offset.h" #include "gc/gc_9_4_2_sh_mask.h" #include @@ -36,7 +37,7 @@ * initialize the debug mode registers after it has disabled GFX off during the * debug session. */ -static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, bool restore_dbg_registers, uint32_t vmid) { @@ -50,7 +51,7 @@ static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, } /* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. 
*/ -static uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev, bool keep_trap_enabled, uint32_t vmid) { @@ -107,7 +108,7 @@ static uint32_t kgd_aldebaran_set_wave_launch_trap_override(struct amdgpu_device return data; } -static uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, +uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, uint8_t wave_launch_mode, uint32_t vmid) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h new file mode 100644 index ..5f776ede295e --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h @@ -0,0 +1,30 @@ +/* + * Copyright 2021 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ */ +uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev, + bool restore_dbg_registers, + uint32_t vmid); +uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev, + bool keep_trap_enabled, + uint32_t vmid); +uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev, + uint8_t wave_launch_mode, + uint32_t vmid); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c index 5b4b7f8b92a5..7aab8dcf46e1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c @@ -22,6 +22,7 @@ #include "amdgpu.h" #include "amdgpu_amdkfd.h" #include "amdgpu_amdkfd_gfx_v9.h" +#include "amdgpu_amdkfd_aldebaran.h" #include "gc/gc_9_4_3_offset.h" #include "gc/gc_9_4_3_sh_mask.h" #include "athub/athub_1_8_0_offset.h" @@ -32,6 +33,7 @@ #include "soc15.h" #include "sdma/s
[PATCH 3/5] drm/amdkfd: restore debugger additional info for gfx v9_4_3
From: Jonathan Kim The additional information that the KFD reports to the debugger was destroyed when the following commit was merged: "drm/amdkfd: convert switches to IP version checking" Signed-off-by: Jonathan Kim Reviewed-by: Harish Kasiviswanathan Signed-off-by: Jonathan Kim Acked-by: Amber Lin Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 -- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 +++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 61fc62f3e003..1a4cdee86759 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -1932,8 +1932,14 @@ static void kfd_topology_set_capabilities(struct kfd_topology_device *dev) HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_MODE_SUPPORTED; if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(10, 0, 0)) { - dev->node_props.debug_prop |= HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 | - HSA_DBG_WATCH_ADDR_MASK_HI_BIT; + if (KFD_GC_VERSION(dev->gpu) == IP_VERSION(9, 4, 3)) + dev->node_props.debug_prop |= + HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 | + HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3; + else + dev->node_props.debug_prop |= + HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 | + HSA_DBG_WATCH_ADDR_MASK_HI_BIT; if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(9, 4, 2)) dev->node_props.debug_prop |= diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h index cba2cd5ed9d1..dea32a9e5506 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h @@ -32,9 +32,12 @@ #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 32 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 6 +#define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 7 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX10 7 #define HSA_DBG_WATCH_ADDR_MASK_HI_BIT \ (29 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT) +#define HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3 \ + (30 << 
HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT) struct kfd_node_properties { uint64_t hive_id; -- 2.34.1
[PATCH 2/5] drm/amdkfd: add multi-process debugging support for GC v9.4.3
From: Jonathan Kim Similar to GC v9.4.2, GC v9.4.3 should use the 5-Dword extended MAP_PROCESS packet to support multi-process debugging. Update the multi-process debug support list so that the KFD updates the runlist on debug mode setting and that it allocates enough GTT memory during KFD device initialization. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h index a289e59ceb79..a0afc6a7b6c4 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h @@ -76,8 +76,9 @@ int kfd_dbg_send_exception_to_runtime(struct kfd_process *p, static inline bool kfd_dbg_is_per_vmid_supported(struct kfd_node *dev) { - return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || - KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0); + return (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || + KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) || + KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0)); } void debug_event_write_work_handler(struct work_struct *work); -- 2.34.1
[PATCH 4/5] drm/amdkfd: add xcc instance for debugger APIs
Since GFX9 GPUs can have multiple xcc instances, implement the corresponding change in the KFD debugger APIs. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 6 -- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 -- drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 6 -- 9 files changed, 36 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index f3f7e0437447..c7f88bfa1976 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -126,7 +126,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( uint32_t watch_address_mask, uint32_t watch_id, uint32_t watch_mode, - uint32_t debug_vmid) + uint32_t debug_vmid, + uint32_t inst) { uint32_t watch_address_high; uint32_t watch_address_low; @@ -163,7 +164,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( } static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev, - uint32_t watch_id) + uint32_t watch_id, + uint32_t inst) { return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c index 7aab8dcf46e1..17fe4e90f203 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c @@ -454,7 +454,8 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch( uint32_t watch_address_mask, uint32_t watch_id, uint32_t watch_mode, - uint32_t debug_vmid) + uint32_t debug_vmid, + uint32_t inst) { uint32_t watch_address_high; uint32_t watch_address_low; 
@@ -491,7 +492,8 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch( } static uint32_t kgd_gfx_v9_4_3_clear_address_watch(struct amdgpu_device *adev, - uint32_t watch_id) + uint32_t watch_id, + uint32_t inst) { return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c index 8ad7a7779e14..225b8929a878 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c @@ -886,7 +886,8 @@ uint32_t kgd_gfx_v10_set_address_watch(struct amdgpu_device *adev, uint32_t watch_address_mask, uint32_t watch_id, uint32_t watch_mode, - uint32_t debug_vmid) + uint32_t debug_vmid, + uint32_t inst) { uint32_t watch_address_high; uint32_t watch_address_low; @@ -942,7 +943,8 @@ uint32_t kgd_gfx_v10_set_address_watch(struct amdgpu_device *adev, } uint32_t kgd_gfx_v10_clear_address_watch(struct amdgpu_device *adev, - uint32_t watch_id) + uint32_t watch_id, + uint32_t inst) { uint32_t watch_address_cntl; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h index e6b70196071a..c904a08b022b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h @@ -44,9 +44,11 @@ uint32_t kgd_gfx_v10_set_address_watch(struct amdgpu_device *adev, uint32_t watch_address_mask, uint32_t watch_id, uint32_t watch_mode, - uint32_t debug_vmid); + uint32_t debug_vmid, + uint32_t inst); uint32_t kgd_gfx_v10_clear_address_watch(struct
[PATCH 0/5] Upstream debugger feature for GFX v9.4.3
Eric Huang (1): drm/amdkfd: add xcc instance for debugger APIs Jonathan Kim (4): drm/amdgpu: add debugger support for GC v9.4.3 drm/amdkfd: add multi-process debugging support for GC v9.4.3 drm/amdkfd: restore debugger additional info for gfx v9_4_3 drm/amdkfd: enable watch points globally for gfx943 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 13 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 30 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 150 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 6 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h | 6 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c | 6 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 18 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 + .../gpu/drm/amd/include/kgd_kfd_interface.h | 6 +- 13 files changed, 237 insertions(+), 28 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h -- 2.34.1
Re: [PATCH] drm/amdkfd: Don't trigger evictions unmapping dmabuf attachments
Reviewed-by: Eric Huang Regards, Eric On 2023-05-01 16:52, Felix Kuehling wrote: Don't move DMABuf attachments for PCIe P2P mappings to the SYSTEM domain when unmapping. This avoids triggering eviction fences unnecessarily. Instead do the move to SYSTEM and back to GTT when mapping these attachments to ensure the SG table gets updated after evictions. This may still trigger unnecessary evictions if user mode unmaps and remaps the same BO. However, this is unlikely in real applications. Cc: Eric Huang Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 1002c7834386..bb8e6f6793c0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -530,6 +530,12 @@ kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment *attachment) { struct ttm_operation_ctx ctx = {.interruptible = true}; struct amdgpu_bo *bo = attachment->bo_va->base.bo; + int ret; + + amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU); + ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); + if (ret) + return ret; amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT); return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); @@ -662,11 +668,10 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem, static void kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment) { - struct ttm_operation_ctx ctx = {.interruptible = true}; - struct amdgpu_bo *bo = attachment->bo_va->base.bo; - - amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU); - ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); + /* This is a no-op. We don't want to trigger eviction fences when +* unmapping DMABufs. Therefore the invalidation (moving to system +* domain) is done in kfd_mem_dmamap_dmabuf. +*/ } /**
Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports
On 2023-04-28 15:42, Felix Kuehling wrote: On 2023-04-28 14:09, Eric Huang wrote: On 2023-04-28 12:41, Felix Kuehling wrote: On 2023-04-28 10:17, Eric Huang wrote: On 2023-04-27 23:46, Kuehling, Felix wrote: [AMD Official Use Only - General] Re-mapping typically happens after evictions, before a new eviction fence gets attached. At that time the old eviction fence should be in the signaled state already, so it can't be signaled again. Therefore I would expect my patch to help with unmapping the DMABuf import, without breaking the eviction case. Are you talking about remapping with a map-to-gpu call from user mode? I think that would only be a problem if the KFD BO was unmapped and remapped multiple times. The first time it's mapped, the fresh dmabuf import should be in the SYSTEM domain, so the validation in the SYSTEM domain before GTT would be a no-op. Yes. The case scenario I am talking about is from user mode, mapping->unmapping->re-mapping to the KFD GTT BO will trigger the eviction. I sort of agree that we don't really rely on the eviction fence on the DMABuf import. The reservation object is shared with the original BO. Moving the original BO triggers the eviction fence, so we don't need to trigger it again on the dmabuf import. Other than moving the original BO, I don't think we can do anything to the DMABuf import that would require an eviction for KFD use case. It is a special use case because we control both the import and the export in the same context. I am thinking about no adding KFD eviction fence in first place of mapping original GTT BO, because I don't see it can be evicted in any cases. That's not an option. We're not adding an eviction fence. The reservation object with the eviction fence is shared between the exported BO and the imported one. That's just how DMABuf works. If you wait for the fences on the imported BO, you are effectively waiting for the fences on the exported BOs. And you can't remove the eviction fence from the exported BO. 
What if the exported BO will be never evicted in reality? I understand how DMABuf works, and imported BO doesn't have eviction fence, it shares with exported BO's one if eviction happens, but I don't see the exported BO can be evicted. The exported BO can be evicted like any other BO. For example KFDEvictTest is there to cause and test evictions of KFD VRAM BOs. Exporting the BO does not pin it (if DMABUF_MOVE_NOTIFIER is enabled, which it in the upstream kernel), so the exported BO can still be evicted. Yes. KFD VRAM BO can be evicted, but DMABuf 's original exported BO is non-paged/GTT BO. Can GTT BO be evicted? It should be like paged/userptr that doesn't have KFD eviction fence. Regards, Eric Regards, Felix Regards, Eric Regards, Felix In theory GTT BO is mapped by user calling mmap() in system memory like userptr, unlike VRAM it will be not evicted by amdgpu vram manager. The only thing is CPU invalidation, but GTT BO doesn't register mmu notifier, that will be a potential problem when switching paged/userptr to non-paged/GTT for mes scheduler. Regards, Eric In the general case dmabuf imports need their eviction fences. For example when we're importing a DMABuf from somewhere else, so the eviction fence is not shared with a BO that we already control. Even then, unmapping a dmabuf from our KFD VM does not need to wait for any fences on the DMABuf. Regards, Felix -Original Message- From: Huang, JinHuiEric Sent: Thursday, April 27, 2023 14:58 To: Kuehling, Felix ; Koenig, Christian ; Christian König ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports Hi Felix, I tested your patch on mGPU systems. It doesn't break any KFD eviction tests, because tests don't allocate DMABuf import, that doesn't trigger it's eviction fence. The only thing the patch affects is in re-mapping DMABuf imports that the eviction will still be triggered. 
I have an idea that we probably can remove eviction fence for GTT bo, because currently the only way to trigger the eviction fence is by calling ttm_bo_validate for CPU domain in kfd_mem_dmaunmap_dmabuf. Do you know there is other case to trigger GTT bo's eviction? Regards, Eric On 2023-04-26 22:21, Felix Kuehling wrote: Hi Eric, Can you try if the attached patch fixes the problem without breaking the eviction tests on a multi-GPU PCIe P2P system? Thanks, Felix On 2023-04-26 13:02, Christian König wrote: Am 26.04.23 um 18:58 schrieb Felix Kuehling: On 2023-04-26 9:03, Christian König wrote: Am 25.04.23 um 16:11 schrieb Eric Huang: Hi Christian, What do you think about Felix's explanation? That's unfortunately not something we can do here. Regards, Eric On 2023-04-13 09:28, Felix Kuehling wrote: Am 2023-04-13 um 07:35 schrieb Christian König: Am 13.04.23 um 03:01 schrieb Felix Kuehling: Am 2023-04
Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports
On 2023-04-28 12:41, Felix Kuehling wrote: On 2023-04-28 10:17, Eric Huang wrote: On 2023-04-27 23:46, Kuehling, Felix wrote: [AMD Official Use Only - General] Re-mapping typically happens after evictions, before a new eviction fence gets attached. At that time the old eviction fence should be in the signaled state already, so it can't be signaled again. Therefore I would expect my patch to help with unmapping the DMABuf import, without breaking the eviction case. Are you talking about remapping with a map-to-gpu call from user mode? I think that would only be a problem if the KFD BO was unmapped and remapped multiple times. The first time it's mapped, the fresh dmabuf import should be in the SYSTEM domain, so the validation in the SYSTEM domain before GTT would be a no-op. Yes. The case scenario I am talking about is from user mode, mapping->unmapping->re-mapping to the KFD GTT BO will trigger the eviction. I sort of agree that we don't really rely on the eviction fence on the DMABuf import. The reservation object is shared with the original BO. Moving the original BO triggers the eviction fence, so we don't need to trigger it again on the dmabuf import. Other than moving the original BO, I don't think we can do anything to the DMABuf import that would require an eviction for KFD use case. It is a special use case because we control both the import and the export in the same context. I am thinking about no adding KFD eviction fence in first place of mapping original GTT BO, because I don't see it can be evicted in any cases. That's not an option. We're not adding an eviction fence. The reservation object with the eviction fence is shared between the exported BO and the imported one. That's just how DMABuf works. If you wait for the fences on the imported BO, you are effectively waiting for the fences on the exported BOs. And you can't remove the eviction fence from the exported BO. What if the exported BO will be never evicted in reality? 
I understand how DMABuf works, and imported BO doesn't have eviction fence, it shares with exported BO's one if eviction happens, but I don't see the exported BO can be evicted. Regards, Eric Regards, Felix In theory GTT BO is mapped by user calling mmap() in system memory like userptr, unlike VRAM it will be not evicted by amdgpu vram manager. The only thing is CPU invalidation, but GTT BO doesn't register mmu notifier, that will be a potential problem when switching paged/userptr to non-paged/GTT for mes scheduler. Regards, Eric In the general case dmabuf imports need their eviction fences. For example when we're importing a DMABuf from somewhere else, so the eviction fence is not shared with a BO that we already control. Even then, unmapping a dmabuf from our KFD VM does not need to wait for any fences on the DMABuf. Regards, Felix -Original Message- From: Huang, JinHuiEric Sent: Thursday, April 27, 2023 14:58 To: Kuehling, Felix ; Koenig, Christian ; Christian König ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports Hi Felix, I tested your patch on mGPU systems. It doesn't break any KFD eviction tests, because tests don't allocate DMABuf import, that doesn't trigger it's eviction fence. The only thing the patch affects is in re-mapping DMABuf imports that the eviction will still be triggered. I have an idea that we probably can remove eviction fence for GTT bo, because currently the only way to trigger the eviction fence is by calling ttm_bo_validate for CPU domain in kfd_mem_dmaunmap_dmabuf. Do you know there is other case to trigger GTT bo's eviction? Regards, Eric On 2023-04-26 22:21, Felix Kuehling wrote: Hi Eric, Can you try if the attached patch fixes the problem without breaking the eviction tests on a multi-GPU PCIe P2P system? 
Thanks, Felix On 2023-04-26 13:02, Christian König wrote: Am 26.04.23 um 18:58 schrieb Felix Kuehling: On 2023-04-26 9:03, Christian König wrote: Am 25.04.23 um 16:11 schrieb Eric Huang: Hi Christian, What do you think about Felix's explanation? That's unfortunately not something we can do here. Regards, Eric On 2023-04-13 09:28, Felix Kuehling wrote: Am 2023-04-13 um 07:35 schrieb Christian König: Am 13.04.23 um 03:01 schrieb Felix Kuehling: Am 2023-04-12 um 18:25 schrieb Eric Huang: It is to avoid redundant eviction for KFD's DMAbuf import bo when dmaunmapping DMAbuf. The DMAbuf import bo has been set as AMDGPU_PL_PREEMPT in KFD when mapping. Signed-off-by: Eric Huang Reviewed-by: Felix Kuehling I'd like to get an Acked-by from Christian as well before submitting this. I have to admit that I only partially followed the internal discussion, but in general you need a *really* good explanation for this. E.g. add code comment and explain in the commit message extensively why this is needed and why there are no alternatives.
Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports
On 2023-04-27 23:46, Kuehling, Felix wrote: [AMD Official Use Only - General] Re-mapping typically happens after evictions, before a new eviction fence gets attached. At that time the old eviction fence should be in the signaled state already, so it can't be signaled again. Therefore I would expect my patch to help with unmapping the DMABuf import, without breaking the eviction case. Are you talking about remapping with a map-to-gpu call from user mode? I think that would only be a problem if the KFD BO was unmapped and remapped multiple times. The first time it's mapped, the fresh dmabuf import should be in the SYSTEM domain, so the validation in the SYSTEM domain before GTT would be a no-op. Yes. The case scenario I am talking about is from user mode, mapping->unmapping->re-mapping to the KFD GTT BO will trigger the eviction. I sort of agree that we don't really rely on the eviction fence on the DMABuf import. The reservation object is shared with the original BO. Moving the original BO triggers the eviction fence, so we don't need to trigger it again on the dmabuf import. Other than moving the original BO, I don't think we can do anything to the DMABuf import that would require an eviction for KFD use case. It is a special use case because we control both the import and the export in the same context. I am thinking about no adding KFD eviction fence in first place of mapping original GTT BO, because I don't see it can be evicted in any cases. In theory GTT BO is mapped by user calling mmap() in system memory like userptr, unlike VRAM it will be not evicted by amdgpu vram manager. The only thing is CPU invalidation, but GTT BO doesn't register mmu notifier, that will be a potential problem when switching paged/userptr to non-paged/GTT for mes scheduler. Regards, Eric In the general case dmabuf imports need their eviction fences. For example when we're importing a DMABuf from somewhere else, so the eviction fence is not shared with a BO that we already control. 
Even then, unmapping a dmabuf from our KFD VM does not need to wait for any fences on the DMABuf. Regards, Felix -Original Message- From: Huang, JinHuiEric Sent: Thursday, April 27, 2023 14:58 To: Kuehling, Felix ; Koenig, Christian ; Christian König ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports Hi Felix, I tested your patch on mGPU systems. It doesn't break any KFD eviction tests, because tests don't allocate DMABuf import, that doesn't trigger it's eviction fence. The only thing the patch affects is in re-mapping DMABuf imports that the eviction will still be triggered. I have an idea that we probably can remove eviction fence for GTT bo, because currently the only way to trigger the eviction fence is by calling ttm_bo_validate for CPU domain in kfd_mem_dmaunmap_dmabuf. Do you know there is other case to trigger GTT bo's eviction? Regards, Eric On 2023-04-26 22:21, Felix Kuehling wrote: Hi Eric, Can you try if the attached patch fixes the problem without breaking the eviction tests on a multi-GPU PCIe P2P system? Thanks, Felix On 2023-04-26 13:02, Christian König wrote: Am 26.04.23 um 18:58 schrieb Felix Kuehling: On 2023-04-26 9:03, Christian König wrote: Am 25.04.23 um 16:11 schrieb Eric Huang: Hi Christian, What do you think about Felix's explanation? That's unfortunately not something we can do here. Regards, Eric On 2023-04-13 09:28, Felix Kuehling wrote: Am 2023-04-13 um 07:35 schrieb Christian König: Am 13.04.23 um 03:01 schrieb Felix Kuehling: Am 2023-04-12 um 18:25 schrieb Eric Huang: It is to avoid redundant eviction for KFD's DMAbuf import bo when dmaunmapping DMAbuf. The DMAbuf import bo has been set as AMDGPU_PL_PREEMPT in KFD when mapping. Signed-off-by: Eric Huang Reviewed-by: Felix Kuehling I'd like to get an Acked-by from Christian as well before submitting this. 
I have to admit that I only partially followed the internal discussion, but in general you need a *really* good explanation for this. E.g. add code comment and explain in the commit message extensively why this is needed and why there are no alternatives. OK. I'll give it a shot: This code path is used among other things when invalidating DMABuf imports. These imports share a reservation object with the exported BO. Waiting on all the fences in this reservation will trigger KFD eviction fences unnecessarily, for example when a DMABuf import for a DMA mapping on a secondary GPU is being unmapped explicitly. Only moving the original exported BO requires stopping KFD user mode queues. If the invalidation is triggered through a move notifier from the exported BO, then moving the original BO already triggered the eviction fence and we don't need to wait for it again on the import. We can identify DMABuf imports in KFD for secondary GPU DMA mappings by the mem_type AMDGPU_PL_PREEMPT. In this case, use a wait operation that ignores KFD eviction fences.
Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports
Hi Felix,

I tested your patch on mGPU systems. It doesn't break any KFD eviction tests, because the tests don't allocate a DMABuf import, which doesn't trigger its eviction fence. The only thing the patch affects is re-mapping DMABuf imports, where the eviction will still be triggered. I have an idea that we probably can remove the eviction fence for GTT bo, because currently the only way to trigger the eviction fence is by calling ttm_bo_validate for the CPU domain in kfd_mem_dmaunmap_dmabuf. Do you know of another case that triggers a GTT bo's eviction?

Regards, Eric

On 2023-04-26 22:21, Felix Kuehling wrote: Hi Eric, Can you try if the attached patch fixes the problem without breaking the eviction tests on a multi-GPU PCIe P2P system? Thanks, Felix On 2023-04-26 13:02, Christian König wrote: Am 26.04.23 um 18:58 schrieb Felix Kuehling: On 2023-04-26 9:03, Christian König wrote: Am 25.04.23 um 16:11 schrieb Eric Huang: Hi Christian, What do you think about Felix's explanation? That's unfortunately not something we can do here. Regards, Eric On 2023-04-13 09:28, Felix Kuehling wrote: Am 2023-04-13 um 07:35 schrieb Christian König: Am 13.04.23 um 03:01 schrieb Felix Kuehling: Am 2023-04-12 um 18:25 schrieb Eric Huang: It is to avoid redundant eviction for KFD's DMAbuf import bo when dmaunmapping DMAbuf. The DMAbuf import bo has been set as AMDGPU_PL_PREEMPT in KFD when mapping. Signed-off-by: Eric Huang Reviewed-by: Felix Kuehling I'd like to get an Acked-by from Christian as well before submitting this. I have to admit that I only partially followed the internal discussion, but in general you need a *really* good explanation for this. E.g. add code comment and explain in the commit message extensively why this is needed and why there are no alternatives. OK. I'll give it a shot: This code path is used among other things when invalidating DMABuf imports. These imports share a reservation object with the exported BO.
Waiting on all the fences in this reservation will trigger KFD eviction fences unnecessarily, for example when a DMABuf import for a DMA mapping on a secondary GPU is being unmapped explicitly. Only moving the original exported BO requires stopping KFD user mode queues. If the invalidation is triggered through a move notifier from the exported BO, then moving the original BO already triggered the eviction fence and we don't need to wait for it again on the import. We can identify DMABuf imports in KFD for secondary GPU DMA mappings by the mem_type AMDGPU_PL_PREEMPT. In this case, use a wait operation that ignores KFD eviction fences. How does this sound? To be honest, that sounds like quite a bad idea. Why in the world are imported BOs moved from GTT to SYSTEM in the first place? As I understand it, the way to update SG tables in SG BOs (e.g. userptr and dmabuf imports) is to move them back and forth between the system and GTT domains. If we left the import in the GTT domain all the time, we would have no way to update it, e.g. after an eviction. Currently the move to the system domain is done in the unmap code path. Before memory is freed, we also need to unmap it from GPUVM, including the DMABuf imports on remote GPUs. For the above reason that currently includes moving the import to the system domain. If we removed that from the unmap code path, we'd need to do the move to system somewhere else, maybe in the mapping/validation path. The only reason for this I can think of is that the DMA mappings become invalid for some reason, and in this case waiting for the KFD fence is actually the absolutely right thing to do. In this case the only reason for unmapping the memory is that we're about to free the memory and its DMABuf imports on other GPUs. This is coming from the application with a promise "I'm no longer accessing the memory". We don't need to wait for fences here.
We only need to invalidate the PTEs to make sure that any further buggy access by the application will fault. Well in this case just free the BO and its bo_va structure. The core handling should take care of clearing all the freed up regions. As for updating the SG of a BO you indeed need to move it from GTT to SYSTEM and back, but in this case we should either indeed wait for the KFD fence, since page tables in between the operation still have the old entries, or we should destroy the BO and create a new one. The latter would overwrite the PTEs with invalid entries first and then fill in new valid ones. Regards, Christian. Regards, Felix Regards, Christian. Regards, Felix Regards, Christian. Thanks, Felix --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 2430f3e9f3a7..64795fe9eecb 100644 --- a/drivers/gpu/drm/
Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports
Hi Christian, What do you think about Felix's explanation? Regards, Eric On 2023-04-13 09:28, Felix Kuehling wrote: Am 2023-04-13 um 07:35 schrieb Christian König: Am 13.04.23 um 03:01 schrieb Felix Kuehling: Am 2023-04-12 um 18:25 schrieb Eric Huang: It is to avoid redundant eviction for KFD's DMAbuf import bo when dmaunmapping DMAbuf. The DMAbuf import bo has been set as AMDGPU_PL_PREEMPT in KFD when mapping. Signed-off-by: Eric Huang Reviewed-by: Felix Kuehling I'd like to get an Acked-by from Christian as well before submitting this. I have to admit that I only partially followed the internal discussion, but in general you need a *really* good explanation for this. E.g. add code comment and explain in the commit message extensively why this is needed and why there are no alternatives. OK. I'll give it a shot: This code path is used among other things when invalidating DMABuf imports. These imports share a reservation object with the exported BO. Waiting on all the fences in this reservation will trigger KFD eviction fences unnecessarily, for example when a DMABuf import for a DMA mapping on a secondary GPU is being unmapped explicitly. Only moving the original exported BO requires stopping KFD user mode queues. If the invalidation is triggered through a move notifier from the exported BO, then moving the original BO already triggered the eviction fence and we don't need to wait for it again on the import. We can identify DMABuf imports in KFD for secondary GPU DMA mappings by the mem_type AMDGPU_PL_PREEMPT. In this case, use a wait operation that ignores KFD eviction fences. How does this sound? Regards, Felix Regards, Christian. 
Thanks, Felix

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 2430f3e9f3a7..64795fe9eecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -526,7 +526,12 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
 	if ((old_mem->mem_type == TTM_PL_TT ||
 	     old_mem->mem_type == AMDGPU_PL_PREEMPT) &&
 	    new_mem->mem_type == TTM_PL_SYSTEM) {
-		r = ttm_bo_wait_ctx(bo, ctx);
+		if (old_mem->mem_type == AMDGPU_PL_PREEMPT)
+			r = amdgpu_bo_sync_wait(abo,
+						AMDGPU_FENCE_OWNER_KFD,
+						ctx->interruptible);
+		else
+			r = ttm_bo_wait_ctx(bo, ctx);
 		if (r)
 			return r;
[PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports
It is to avoid redundant eviction for KFD's DMAbuf import bo when dmaunmapping DMAbuf. The DMAbuf import bo has been set as AMDGPU_PL_PREEMPT in KFD when mapping.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 2430f3e9f3a7..64795fe9eecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -526,7 +526,12 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
 	if ((old_mem->mem_type == TTM_PL_TT ||
 	     old_mem->mem_type == AMDGPU_PL_PREEMPT) &&
 	    new_mem->mem_type == TTM_PL_SYSTEM) {
-		r = ttm_bo_wait_ctx(bo, ctx);
+		if (old_mem->mem_type == AMDGPU_PL_PREEMPT)
+			r = amdgpu_bo_sync_wait(abo,
+						AMDGPU_FENCE_OWNER_KFD,
+						ctx->interruptible);
+		else
+			r = ttm_bo_wait_ctx(bo, ctx);
 		if (r)
 			return r;
-- 
2.34.1
[PATCH] drm/amdgpu: only wait GTT bo's fence in amdgpu_bo_move
It is to avoid redundant eviction for KFD's DMAbuf import bo when dmaunmapping DMAbuf. The DMAbuf import bo has been set as AMDGPU_PL_PREEMPT in KFD when mapping.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 2430f3e9f3a7..a0828f6d9fbe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -526,7 +526,10 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
 	if ((old_mem->mem_type == TTM_PL_TT ||
 	     old_mem->mem_type == AMDGPU_PL_PREEMPT) &&
 	    new_mem->mem_type == TTM_PL_SYSTEM) {
-		r = ttm_bo_wait_ctx(bo, ctx);
+		if (old_mem->mem_type == AMDGPU_PL_PREEMPT)
+			r = amdgpu_bo_sync_wait(abo, AMDGPU_FENCE_OWNER_KFD, false);
+		else
+			r = ttm_bo_wait_ctx(bo, ctx);
 		if (r)
 			return r;
-- 
2.34.1
Re: [PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping
Hi Felix,

What do you think of my proposal in my previous email: setting the domain to CPU in kfd_mem_dmamap_dmabuf and to GTT in kfd_mem_dmaunmap_dmabuf, doing it in a similar way to userptr?

Thanks, Eric

On 2023-04-10 14:50, Felix Kuehling wrote: Sorry, you're right, there is no AMDGPU_GEM_DOMAIN_PREEMPTIBLE. I remembered this wrong. There is a flag called AMDGPU_GEM_CREATE_PREEMPTIBLE, which changes what happens when it is placed in the AMDGPU_GEM_DOMAIN_GTT domain. So my proposal would need to be modified to set the flag AMDGPU_GEM_CREATE_PREEMPTIBLE in the imported DMABuf BO. On 2023-04-10 14:28, Eric Huang wrote: Hi Felix, Thanks for your review and suggestion, but unfortunately AMDGPU_GEM_DOMAIN_PREEMPTIBLE is not defined in amdgpu_drm.h. I understand we need the memory eviction on either kfd_mem_dmamap_dmabuf() or kfd_mem_dmaunmap_dmabuf() to update DMA addresses, so I am thinking to do it as simply as userptr memory does. The purpose of this change is that for the non-MES HW scheduler we are using userptr/paged memory, but since GFX11 we will be using the MES scheduler and it needs the memory to be allocated as GTT/non-paged memory, so we want all GPUs using GTT/non-paged memory, but there is a performance drop because of the eviction in kfd_mem_dmaunmap_dmabuf. Currently userptr memory is evicted in kfd_mem_dmamap_userptr by changing the domain to GTT before calling ttm_bo_validate, and not evicted in kfd_mem_dmaunmap_userptr, so I think we can do the similar way for GTT/non-paged memory: setting the domain to CPU in kfd_mem_dmamap_dmabuf, which will evict memory to update DMA addresses, and setting the domain to GTT in kfd_mem_dmaunmap_dmabuf, which will not evict memory. The performance should be the same as userptr/paged memory. This sounds backwards to me. dmaunmap should move objects to the CPU domain because the GPU mapping is potentially invalid.
And dmamap must move it to the GTT domain because that updates the GPU mapping and allows the GPU virtual address mapping to be updated. The problem is the eviction in dmaunmap. Userptrs don't see these evictions because the SG BOs we use to map them on other GPUs do set the AMDGPU_GEM_CREATE_PREEMPTIBLE flag. My idea is to do the same thing for DMABufs that map GTT (and VRAM) BOs to other GPUs. Now that I look at it in more detail, I see we're already doing that in kfd_mem_attach_dmabuf:

	*bo = gem_to_amdgpu_bo(gobj);
	(*bo)->flags |= AMDGPU_GEM_CREATE_PREEMPTIBLE;

So then the question is, why is this not working? I think that's the second part of my proposal, which is still needed: 2. Add a special case in the above if-block for old_mem->mem_type == AMDGPU_PL_PREEMPT: use amdgpu_bo_sync_wait with owner=AMDGPU_FENCE_OWNER_KFD so that it doesn't wait for eviction fences

Regards, Felix

Regards, Eric

On 2023-04-04 16:40, Felix Kuehling wrote: [+Christian] OK, this comes from the ttm_bo_wait_ctx call in this section of amdgpu_bo_move:

	if ((old_mem->mem_type == TTM_PL_TT ||
	     old_mem->mem_type == AMDGPU_PL_PREEMPT) &&
	    new_mem->mem_type == TTM_PL_SYSTEM) {
		r = ttm_bo_wait_ctx(bo, ctx);
		if (r)
			return r;

		amdgpu_ttm_backend_unbind(bo->bdev, bo->ttm);
		ttm_resource_free(bo, &bo->resource);
		ttm_bo_assign_mem(bo, new_mem);
		goto out;
	}

We can't just remove this wait. It's not even specific to KFD or DMABuf imports. We also can't just change it to avoid waiting for eviction fences because it's also used for GTT BOs (e.g. before a BO gets swapped under extreme memory pressure). So we also need to trigger the eviction fence in the general case. In the specific case of DMABuf imports, they share the reservation object with the original BO. So waiting on the reservation triggers the eviction fence on the original BO.
I think we want to avoid the waiting on eviction fences for all BOs where the underlying memory is managed by some other BO, and at the same time also avoid ever evicting the DMABuf import BO. That's what AMDGPU_PL_PREEMPT is for. So I think a combination of two changes should do the trick:

1. Change kfd_mem_dmamap_dmabuf to use AMDGPU_GEM_DOMAIN_PREEMPTIBLE
2. Add a special case in the above if-block for old_mem->mem_type == AMDGPU_PL_PREEMPT: use amdgpu_bo_sync_wait with owner=AMDGPU_FENCE_OWNER_KFD so that it doesn't wait for eviction fences

Regards, Felix

Am 2023-04-04 um 10:36 schrieb Eric Huang: Here is the backtrace from Jira:

[Thu Nov 10 13:10:23 2022] Scheduling eviction of pid 97784 in 0 jiffies
[Thu Nov 10 13:10:23 2022] WARNING: CPU: 173 PID: 97784 at /var/lib/dkms/amdgpu/5.16.9.22.20-1438746~20.04/build/amd/amdgpu/../amdkfd/kfd_device.c:878 kgd2kfd_schedule_evict_and_restore_process+0x104/0x120 [amdgpu]
[Thu Nov 10 13:10:23 2022
Re: [PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping
Hi Felix,

Thanks for your review and suggestion, but unfortunately AMDGPU_GEM_DOMAIN_PREEMPTIBLE is not defined in amdgpu_drm.h. I understand we need the memory eviction on either kfd_mem_dmamap_dmabuf() or kfd_mem_dmaunmap_dmabuf() to update DMA addresses, so I am thinking to do it as simply as userptr memory does. The purpose of this change is that for the non-MES HW scheduler we are using userptr/paged memory, but since GFX11 we will be using the MES scheduler and it needs the memory to be allocated as GTT/non-paged memory, so we want all GPUs using GTT/non-paged memory, but there is a performance drop because of the eviction in kfd_mem_dmaunmap_dmabuf. Currently userptr memory is evicted in kfd_mem_dmamap_userptr by changing the domain to GTT before calling ttm_bo_validate, and not evicted in kfd_mem_dmaunmap_userptr, so I think we can do the similar way for GTT/non-paged memory: setting the domain to CPU in kfd_mem_dmamap_dmabuf, which will evict memory to update DMA addresses, and setting the domain to GTT in kfd_mem_dmaunmap_dmabuf, which will not evict memory. The performance should be the same as userptr/paged memory.

Regards, Eric

On 2023-04-04 16:40, Felix Kuehling wrote: [+Christian] OK, this comes from the ttm_bo_wait_ctx call in this section of amdgpu_bo_move:

	if ((old_mem->mem_type == TTM_PL_TT ||
	     old_mem->mem_type == AMDGPU_PL_PREEMPT) &&
	    new_mem->mem_type == TTM_PL_SYSTEM) {
		r = ttm_bo_wait_ctx(bo, ctx);
		if (r)
			return r;

		amdgpu_ttm_backend_unbind(bo->bdev, bo->ttm);
		ttm_resource_free(bo, &bo->resource);
		ttm_bo_assign_mem(bo, new_mem);
		goto out;
	}

We can't just remove this wait. It's not even specific to KFD or DMABuf imports. We also can't just change it to avoid waiting for eviction fences because it's also used for GTT BOs (e.g. before a BO gets swapped under extreme memory pressure). So we also need to trigger the eviction fence in the general case. In the specific case of DMABuf imports, they share the reservation object with the original BO.
So waiting on the reservation triggers the eviction fence on the original BO. I think we want to avoid the waiting on eviction fences for all BOs where the underlying memory is managed by some other BO, and at the same time also avoid ever evicting the DMABuf import BO. That's what AMDGPU_PL_PREEMPT is for. So I think a combination of two changes should to the trick: 1. Change kfd_mem_dmamap_dmabuf to use AMDGPU_GEM_DOMAIN_PREEMPTIBLE 2. Add a special case in the above if-block for old_mem->mem_type == AMDGPU_PL_PREEMPT: use amdgpu_bo_sync_wait with owner=AMDGPU_FENCE_OWNER_KFD so that it doesn't wait for eviction fences Regards, Felix Am 2023-04-04 um 10:36 schrieb Eric Huang: Here is the backtrace from Jira: Thu Nov 10 13:10:23 2022] Scheduling eviction of pid 97784 in 0 jiffies [Thu Nov 10 13:10:23 2022] WARNING: CPU: 173 PID: 97784 at /var/lib/dkms/amdgpu/5.16.9.22.20-1438746~20.04/build/amd/amdgpu/../amdkfd/kfd_device.c:878 kgd2kfd_schedule_evict_and_restore_process+0x104/0x120 [amdgpu] [Thu Nov 10 13:10:23 2022] Modules linked in: veth amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) iommu_v2 amd_sched(OE) amdkcl(OE) xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc aufs overlay binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm efi_pstore rapl ipmi_ssif ccp acpi_ipmi k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ast drm_vram_helper drm_ttm_helper ttm mlx5_core drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops [Thu Nov 10 13:10:23 2022] pci_hyperv_intf cec psample igb mlxfw rc_core dca ahci xhci_pci tls drm i2c_algo_bit libahci xhci_pci_renesas i2c_piix4 [Thu Nov 10 13:10:23 2022] CPU: 173 PID: 97784 Comm: onnxruntime_tes Tainted: G W OE 5.13.0-30-generic #33~20.04.1-Ubuntu [Thu Nov 10 13:10:23 2022] Hardware name: GIGABYTE G482-Z53-YF/MZ52-G40-00, BIOS R12 05/13/2020 [Thu Nov 10 13:10:23 2022] RIP: 0010:kgd2kfd_schedule_evict_and_restore_process+0x104/0x120 [amdgpu] [Thu Nov 10 13:10:23 2022] Code: 5e 5d c3 4c 89 e7 e8 cb c6 44 df eb e7 49 8b 45 60 48 89 ca 48 c7 c7 38 8b d7 c1 48 89 4d e0 8b b0 20 09 00 00 e8 87 ee 7e df <0f> 0b 48 8b 4d e0 eb 9f 41 be ea ff ff ff eb ba 41 be ed ff ff ff [Thu Nov 10 13:10:23 2022
Re: [PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping
<48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 7a 0d 00 f7 d8 64 89 01 48
[Thu Nov 10 13:10:23 2022] RSP: 002b:7fffe41e0098 EFLAGS: 0206 ORIG_RAX: 0010
[Thu Nov 10 13:10:23 2022] RAX: ffda RBX: 7fcacc7f7f80 RCX: 7fcaff57b3ab
[Thu Nov 10 13:10:23 2022] RDX: 7fffe41e0120 RSI: c0184b19 RDI: 0003
[Thu Nov 10 13:10:23 2022] RBP: 7fffe41e00d0 R08: 562e2d5730d0 R09:
[Thu Nov 10 13:10:23 2022] R10: 562e2c928ec0 R11: 0206 R12: 0001
[Thu Nov 10 13:10:23 2022] R13: 7fffe41e04b0 R14: R15: 562e2d3f5b20
[Thu Nov 10 13:10:23 2022]
[Thu Nov 10 13:10:23 2022] ---[ end trace 1464f08f6be60b30 ]---

Regards, Eric

On 2023-04-04 10:11, Felix Kuehling wrote: If we keep the BO in the GTT domain, it means it will not be updated if we validate it again later in kfd_mem_dmamap_dmabuf. This means we'll use stale DMA addresses when we update the page tables after evictions. I think we'll need to find a different way to avoid triggering the eviction fence on the original BO when changing the placement of the DMABuf import here. If you need help brainstorming here, please share a backtrace from the eviction generated with the debug_evictions module param. Regards, Felix Am 2023-04-03 um 13:59 schrieb Eric Huang: dmabuf is allocated/mapped as GTT domain, when dma-unmapping dmabuf changing placement to CPU will trigger memory eviction after calling ttm_bo_validate, and the eviction will cause performance drop. Keeping the correct domain will solve the issue.
Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a3b09edfd1bf..17b708acb447 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -642,7 +642,7 @@ kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
 	struct ttm_operation_ctx ctx = {.interruptible = true};
 	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
 
-	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
 	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 }
[PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping
dmabuf is allocated/mapped in the GTT domain. When dma-unmapping dmabuf, changing the placement to CPU will trigger memory eviction after calling ttm_bo_validate, and the eviction will cause a performance drop. Keeping the correct domain solves the issue.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a3b09edfd1bf..17b708acb447 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -642,7 +642,7 @@ kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
 	struct ttm_operation_ctx ctx = {.interruptible = true};
 	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
 
-	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
 	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 }
-- 
2.34.1
Re: [PATCH] drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU
Ping.

On 2023-01-05 14:28, Eric Huang wrote:
The pointer bo->kfd_bo is NULL for the queue's write pointer BO when creating a queue on mGPU. Avoiding use of that pointer fixes the error.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c      | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 9885735f1a30..d4c29e9edf34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2179,7 +2179,7 @@ int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
 	}
 
 	amdgpu_amdkfd_remove_eviction_fence(
-		bo, bo->kfd_bo->process_info->eviction_fence);
+		bo, bo->vm_bo->vm->process_info->eviction_fence);
 
 	amdgpu_bo_unreserve(bo);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 6013f498ea1e..55c2dc48e567 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -231,7 +231,7 @@ static int add_queue_mes(struct device_queue_manager *dqm, struct queue *q,
 	queue_input.wptr_addr = (uint64_t)q->properties.write_ptr;
 
 	if (q->wptr_bo) {
-		wptr_addr_off = (uint64_t)q->properties.write_ptr - (uint64_t)q->wptr_bo->kfd_bo->va;
+		wptr_addr_off = (uint64_t)q->properties.write_ptr & (PAGE_SIZE - 1);
 		queue_input.wptr_mc_addr = ((uint64_t)q->wptr_bo->tbo.resource->start << PAGE_SHIFT) + wptr_addr_off;
 	}
[PATCH] drm/amdkfd: Add sync after creating vram bo
There will be data corruption on vram allocated by svm if initialization is not done. Adding a sync wait resolves this issue.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index b8c9753a4818..344e20306635 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -574,6 +574,13 @@ svm_range_vram_node_new(struct amdgpu_device *adev, struct svm_range *prange,
 		goto reserve_bo_failed;
 	}
 
+	r = amdgpu_bo_sync_wait(bo, AMDGPU_FENCE_OWNER_KFD, false);
+	if (r) {
+		pr_debug("failed %d to sync bo\n", r);
+		amdgpu_bo_unreserve(bo);
+		goto reserve_bo_failed;
+	}
+
 	r = dma_resv_reserve_fences(amdkcl_ttm_resvp(&bo->tbo), 1);
 	if (r) {
 		pr_debug("failed %d to reserve bo\n", r);
-- 
2.34.1
[PATCH] drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU
The pointer bo->kfd_bo is NULL for the queue's write pointer BO when creating a queue on mGPU. Avoiding use of that pointer fixes the error.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c      | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 9885735f1a30..d4c29e9edf34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2179,7 +2179,7 @@ int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
 	}
 
 	amdgpu_amdkfd_remove_eviction_fence(
-		bo, bo->kfd_bo->process_info->eviction_fence);
+		bo, bo->vm_bo->vm->process_info->eviction_fence);
 
 	amdgpu_bo_unreserve(bo);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 6013f498ea1e..55c2dc48e567 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -231,7 +231,7 @@ static int add_queue_mes(struct device_queue_manager *dqm, struct queue *q,
 	queue_input.wptr_addr = (uint64_t)q->properties.write_ptr;
 
 	if (q->wptr_bo) {
-		wptr_addr_off = (uint64_t)q->properties.write_ptr - (uint64_t)q->wptr_bo->kfd_bo->va;
+		wptr_addr_off = (uint64_t)q->properties.write_ptr & (PAGE_SIZE - 1);
 		queue_input.wptr_mc_addr = ((uint64_t)q->wptr_bo->tbo.resource->start << PAGE_SHIFT) + wptr_addr_off;
 	}
-- 
2.34.1
[PATCH] amd/amdkfd: Fix a memory limit issue
This resolves a regression in which an application fails to allocate VRAM because no memory is reported free. The cause is the vram_pin_size check added to the memory limit: the application pins memory for PeerDirect, and KFD should not count that memory against the limit. Removing vram_pin_size from the check resolves it. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index db772942f7a6..fb1bb593312e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -172,9 +172,7 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev, (kfd_mem_limit.ttm_mem_used + ttm_mem_needed > kfd_mem_limit.max_ttm_mem_limit) || (adev && adev->kfd.vram_used + vram_needed > -adev->gmc.real_vram_size - -atomic64_read(&adev->vram_pin_size) - -reserved_for_pt)) { +adev->gmc.real_vram_size - reserved_for_pt)) { ret = -ENOMEM; goto release; } -- 2.34.1
Re: [PATCH] drm/amdkfd: bump KFD version for unified ctx save/restore memory
The patch has been pushed. I will do that for future patches. Thanks, Eric On 2022-07-12 09:57, Deucher, Alexander wrote: [AMD Official Use Only - General] Can you please include a link to the proposed userspace in the commit message when you commit this? Alex *From:* amd-gfx on behalf of Eric Huang *Sent:* Monday, July 11, 2022 2:41 PM *To:* amd-gfx@lists.freedesktop.org *Cc:* Huang, JinHuiEric ; Kuehling, Felix *Subject:* [PATCH] drm/amdkfd: bump KFD version for unified ctx save/restore memory To expose availability of the unified memory for ctx save/restore area feature to libhsakmt. Signed-off-by: Eric Huang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 7a423855a86e..afd8ff29c74f 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -36,9 +36,10 @@ * - 1.8 - CRIU - Support for SDMA transfers with GTT BOs * - 1.9 - Add available memory ioctl * - 1.10 - Add SMI profiler event log + * - 1.11 - Add unified memory for ctx save/restore area */ #define KFD_IOCTL_MAJOR_VERSION 1 -#define KFD_IOCTL_MINOR_VERSION 10 +#define KFD_IOCTL_MINOR_VERSION 11 struct kfd_ioctl_get_version_args { __u32 major_version; /* from KFD */ -- 2.25.1
[PATCH 3/3] libhsakmt: allocate unified memory for ctx save restore area
To improve performance on queue preemption, allocate the ctx s/r area in VRAM instead of system memory, and migrate it back to system memory when VRAM is full. Signed-off-by: Eric Huang Change-Id: If775782027188dbe84b6868260e429373675434c --- include/hsakmttypes.h | 1 + src/queues.c | 109 -- 2 files changed, 95 insertions(+), 15 deletions(-) diff --git a/include/hsakmttypes.h b/include/hsakmttypes.h index 690e001..65f23de 100644 --- a/include/hsakmttypes.h +++ b/include/hsakmttypes.h @@ -1331,6 +1331,7 @@ typedef enum _HSA_SVM_FLAGS { HSA_SVM_FLAG_GPU_RO = 0x0008, // GPUs only read, allows replication HSA_SVM_FLAG_GPU_EXEC = 0x0010, // Allow execution on GPU HSA_SVM_FLAG_GPU_READ_MOSTLY = 0x0020, // GPUs mostly read, may allow similar optimizations as RO, but writes fault + HSA_SVM_FLAG_GPU_ALWAYS_MAPPED = 0x0040, // Keep GPU memory mapping always valid as if XNACK is disabled } HSA_SVM_FLAGS; typedef enum _HSA_SVM_ATTR_TYPE { diff --git a/src/queues.c b/src/queues.c index d38ea0c..5702c95 100644 --- a/src/queues.c +++ b/src/queues.c @@ -68,6 +68,7 @@ struct queue { uint32_t eop_buffer_size; uint32_t gfxv; bool use_ats; + bool unified_ctx_save_restore; /* This queue structure is allocated from GPU with page aligned size * but only small bytes are used. We use the extra space in the end for * cu_mask bits array.
@@ -384,13 +385,49 @@ static void free_exec_aligned_memory(void *addr, uint32_t size, uint32_t align, munmap(addr, size); } +static HSAKMT_STATUS register_svm_range(void *mem, uint32_t size, + uint32_t gpuNode, uint32_t prefetchNode, + uint32_t preferredNode, bool alwaysMapped) +{ + HSA_SVM_ATTRIBUTE *attrs; + HSAuint64 s_attr; + HSAuint32 nattr; + HSAuint32 flags; + + flags = HSA_SVM_FLAG_HOST_ACCESS; + + if (alwaysMapped) { + CHECK_KFD_MINOR_VERSION(11); + flags |= HSA_SVM_FLAG_GPU_ALWAYS_MAPPED; + } + + nattr = 5; + s_attr = sizeof(*attrs) * nattr; + attrs = (HSA_SVM_ATTRIBUTE *)alloca(s_attr); + + attrs[0].type = HSA_SVM_ATTR_PREFETCH_LOC; + attrs[0].value = prefetchNode; + attrs[1].type = HSA_SVM_ATTR_PREFERRED_LOC; + attrs[1].value = preferredNode; + attrs[2].type = HSA_SVM_ATTR_CLR_FLAGS; + attrs[2].value = ~flags; + attrs[3].type = HSA_SVM_ATTR_SET_FLAGS; + attrs[3].value = flags; + attrs[4].type = HSA_SVM_ATTR_ACCESS; + attrs[4].value = gpuNode; + + return hsaKmtSVMSetAttr(mem, size, nattr, attrs); +} + static void free_queue(struct queue *q) { if (q->eop_buffer) free_exec_aligned_memory(q->eop_buffer, q->eop_buffer_size, PAGE_SIZE, q->use_ats); - if (q->ctx_save_restore) + if (q->unified_ctx_save_restore) + free(q->ctx_save_restore); + else if (q->ctx_save_restore) free_exec_aligned_memory(q->ctx_save_restore, q->ctx_save_restore_size, PAGE_SIZE, q->use_ats); @@ -398,6 +435,20 @@ static void free_queue(struct queue *q) free_exec_aligned_memory((void *)q, sizeof(*q), PAGE_SIZE, q->use_ats); } +static inline void fill_cwsr_header(struct queue *q, void *addr, + HsaEvent *Event, volatile HSAint64 *ErrPayload) +{ + HsaUserContextSaveAreaHeader *header = + (HsaUserContextSaveAreaHeader *)addr; + + header->ErrorEventId = 0; + if (Event) + header->ErrorEventId = Event->EventId; + header->ErrorReason = ErrPayload; + header->DebugOffset = q->ctx_save_restore_size; + header->DebugSize = q->debug_memory_size; +} + static int handle_concrete_asic(struct queue *q, 
struct kfd_ioctl_create_queue_args *args, uint32_t NodeId, @@ -425,7 +476,8 @@ static int handle_concrete_asic(struct queue *q, if (ret) { uint32_t total_mem_alloc_size = 0; - HsaUserContextSaveAreaHeader *header; + HsaNodeProperties node; + bool svm_api; args->ctx_save_restore_size = q->ctx_save_restore_size; args->ctl_stack_size = q->ctl_stack_size; @@ -435,22 +487,49 @@ static int handle_concrete_asic(struct queue *q, */ total_mem_alloc_size = q->ctx_save_restore_size + q->debug_memory_size; - q->ctx_save_restore = - allocate_exec_aligned_memory(total_mem_allo
[PATCH 2/3] libhsakmt: add new flag for svm
Add a new option for always keeping the GPU mapping, and bump the KFD version for the unified save/restore memory feature. Signed-off-by: Eric Huang Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492 --- include/linux/kfd_ioctl.h | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h index ba8de4b..4898451 100644 --- a/include/linux/kfd_ioctl.h +++ b/include/linux/kfd_ioctl.h @@ -35,9 +35,11 @@ * - 1.7 - Checkpoint Restore (CRIU) API * - 1.8 - CRIU - Support for SDMA transfers with GTT BOs * - 1.9 - Add available_memory ioctl + * - 1.10 - Add SMI profiler event log + * - 1.11 - Add unified memory for ctx save/restore area */ #define KFD_IOCTL_MAJOR_VERSION 1 -#define KFD_IOCTL_MINOR_VERSION 9 +#define KFD_IOCTL_MINOR_VERSION 11 /* * Debug revision change log @@ -1080,6 +1082,8 @@ struct kfd_ioctl_cross_memory_copy_args { #define KFD_IOCTL_SVM_FLAG_GPU_EXEC 0x0010 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */ #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020 +/* Keep GPU memory mapping always valid as if XNACK is disabled */ +#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040 /** * kfd_ioctl_svm_op - SVM ioctl operations -- 2.25.1
[PATCH] drm/amdkfd: bump KFD version for unified ctx save/restore memory
To expose availability of the unified memory for ctx save/restore area feature to libhsakmt. Signed-off-by: Eric Huang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 7a423855a86e..afd8ff29c74f 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -36,9 +36,10 @@ * - 1.8 - CRIU - Support for SDMA transfers with GTT BOs * - 1.9 - Add available memory ioctl * - 1.10 - Add SMI profiler event log + * - 1.11 - Add unified memory for ctx save/restore area */ #define KFD_IOCTL_MAJOR_VERSION 1 -#define KFD_IOCTL_MINOR_VERSION 10 +#define KFD_IOCTL_MINOR_VERSION 11 struct kfd_ioctl_get_version_args { __u32 major_version; /* from KFD */ -- 2.25.1
[PATCH 5/5] libhsakmt: allocate unified memory for ctx save restore area
To improve performance on queue preemption, allocate the ctx s/r area in VRAM instead of system memory, and migrate it back to system memory when VRAM is full. Signed-off-by: Eric Huang Change-Id: If775782027188dbe84b6868260e429373675434c --- include/hsakmttypes.h | 1 + src/queues.c | 103 -- 2 files changed, 90 insertions(+), 14 deletions(-) diff --git a/include/hsakmttypes.h b/include/hsakmttypes.h index 9063f85..2c1c7cc 100644 --- a/include/hsakmttypes.h +++ b/include/hsakmttypes.h @@ -1329,6 +1329,7 @@ typedef enum _HSA_SVM_FLAGS { HSA_SVM_FLAG_GPU_RO = 0x0008, // GPUs only read, allows replication HSA_SVM_FLAG_GPU_EXEC = 0x0010, // Allow execution on GPU HSA_SVM_FLAG_GPU_READ_MOSTLY = 0x0020, // GPUs mostly read, may allow similar optimizations as RO, but writes fault + HSA_SVM_FLAG_GPU_ALWAYS_MAPPED = 0x0040, // Keep GPU memory mapping always valid as if XNACK is disabled } HSA_SVM_FLAGS; typedef enum _HSA_SVM_ATTR_TYPE { diff --git a/src/queues.c b/src/queues.c index c83dd93..d5109f9 100644 --- a/src/queues.c +++ b/src/queues.c @@ -68,6 +68,7 @@ struct queue { uint32_t eop_buffer_size; uint32_t gfxv; bool use_ats; + bool unified_ctx_save_restore; /* This queue structure is allocated from GPU with page aligned size * but only small bytes are used. We use the extra space in the end for * cu_mask bits array.
@@ -383,13 +384,47 @@ static void free_exec_aligned_memory(void *addr, uint32_t size, uint32_t align, munmap(addr, size); } +static HSAKMT_STATUS register_svm_range(void *mem, uint32_t size, + uint32_t gpuNode, uint32_t prefetchNode, + uint32_t preferredNode, bool alwaysMapped) +{ + HSA_SVM_ATTRIBUTE *attrs; + HSAuint64 s_attr; + HSAuint32 nattr; + HSAuint32 flags; + + flags = HSA_SVM_FLAG_HOST_ACCESS; + + if (alwaysMapped) + flags |= HSA_SVM_FLAG_GPU_ALWAYS_MAPPED; + + nattr = 5; + s_attr = sizeof(*attrs) * nattr; + attrs = (HSA_SVM_ATTRIBUTE *)alloca(s_attr); + + attrs[0].type = HSA_SVM_ATTR_PREFETCH_LOC; + attrs[0].value = prefetchNode; + attrs[1].type = HSA_SVM_ATTR_PREFERRED_LOC; + attrs[1].value = preferredNode; + attrs[2].type = HSA_SVM_ATTR_CLR_FLAGS; + attrs[2].value = ~flags; + attrs[3].type = HSA_SVM_ATTR_SET_FLAGS; + attrs[3].value = flags; + attrs[4].type = HSA_SVM_ATTR_ACCESS; + attrs[4].value = gpuNode; + + return hsaKmtSVMSetAttr(mem, size, nattr, attrs); +} + static void free_queue(struct queue *q) { if (q->eop_buffer) free_exec_aligned_memory(q->eop_buffer, q->eop_buffer_size, PAGE_SIZE, q->use_ats); - if (q->ctx_save_restore) + if (q->unified_ctx_save_restore) + free(q->ctx_save_restore); + else if (q->ctx_save_restore) free_exec_aligned_memory(q->ctx_save_restore, q->ctx_save_restore_size, PAGE_SIZE, q->use_ats); @@ -425,6 +460,8 @@ static int handle_concrete_asic(struct queue *q, if (ret) { uint32_t total_mem_alloc_size = 0; HsaUserContextSaveAreaHeader *header; + HsaNodeProperties node; + bool svm_api; args->ctx_save_restore_size = q->ctx_save_restore_size; args->ctl_stack_size = q->ctl_stack_size; @@ -434,22 +471,60 @@ static int handle_concrete_asic(struct queue *q, */ total_mem_alloc_size = q->ctx_save_restore_size + q->debug_memory_size; - q->ctx_save_restore = - allocate_exec_aligned_memory(total_mem_alloc_size, -q->use_ats, NodeId, false, false); - if (!q->ctx_save_restore) - return HSAKMT_STATUS_NO_MEMORY; + if 
(hsaKmtGetNodeProperties(NodeId, &node)) + svm_api = false; + else + svm_api = node.Capability.ui32.SVMAPISupported; - args->ctx_save_restore_address = (uintptr_t)q->ctx_save_restore; + /* Allocate unified memory for context save restore +* area on dGPU. +*/ + if (!q->use_ats && svm_api) { + uint32_t size = PAGE_ALIGN_UP(total_mem_alloc_size); + void *addr; + HSAKMT_STATUS r = HSAKMT_STATUS_ERROR; + + if (posix_memalign(&addr, GPU_HUGE_PAGE_SIZE, size)) +
[PATCH 4/5] libhsakmt: add new flags for svm
Add a new option for always keeping the GPU mapping. Signed-off-by: Eric Huang Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492 --- include/linux/kfd_ioctl.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h index 8a0ed49..5c45f58 100644 --- a/include/linux/kfd_ioctl.h +++ b/include/linux/kfd_ioctl.h @@ -1069,6 +1069,8 @@ struct kfd_ioctl_cross_memory_copy_args { #define KFD_IOCTL_SVM_FLAG_GPU_EXEC 0x0010 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */ #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020 +/* Keep GPU memory mapping always valid as if XNACK is disabled */ +#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040 /** * kfd_ioctl_svm_op - SVM ioctl operations -- 2.25.1
[PATCH 1/5] drm/amdkfd: add new flag for svm
Add a new option for always keeping the GPU mapping. Signed-off-by: Eric Huang --- include/uapi/linux/kfd_ioctl.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index fd49dde4d5f4..eba04ebfd9a8 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1076,6 +1076,8 @@ struct kfd_ioctl_cross_memory_copy_args { #define KFD_IOCTL_SVM_FLAG_GPU_EXEC 0x0010 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */ #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020 +/* Keep GPU memory mapping always valid as if XNACK is disabled */ +#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040 /** * kfd_ioctl_svm_op - SVM ioctl operations -- 2.25.1
[PATCH 3/5] drm/amdkfd: optimize svm range evict
Avoid unnecessary queue eviction when the range is not mapped to the GPU. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 586bef4fcc8a..1f1f8f2dfa28 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1775,8 +1775,12 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm, if (!p->xnack_enabled || (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) { int evicted_ranges; + bool mapped = prange->mapped_to_gpu; list_for_each_entry(pchild, &prange->child_list, child_list) { + if (!pchild->mapped_to_gpu) + continue; + mapped = true; mutex_lock_nested(&pchild->lock, 1); if (pchild->start <= last && pchild->last >= start) { pr_debug("increment pchild invalid [0x%lx 0x%lx]\n", @@ -1786,6 +1790,9 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm, mutex_unlock(&pchild->lock); } + if (!mapped) + return r; + if (prange->start <= last && prange->last >= start) atomic_inc(&prange->invalid); -- 2.25.1
[PATCH 2/5] drm/amdkfd: change svm range evict
Always evict queues when the KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag is set, as if XNACK were off. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 4bf2f75f853b..586bef4fcc8a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1772,7 +1772,8 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm, pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 0x%lx]\n", svms, prange->start, prange->last, start, last); - if (!p->xnack_enabled) { + if (!p->xnack_enabled || + (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) { int evicted_ranges; list_for_each_entry(pchild, &prange->child_list, child_list) { @@ -3321,7 +3322,8 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, if (r) goto out_unlock_range; - if (migrated && !p->xnack_enabled) { + if (migrated && (!p->xnack_enabled || + (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED))) { pr_debug("restore_work will update mappings of GPUs\n"); mutex_unlock(&prange->migrate_mutex); continue; -- 2.25.1
[PATCH 0/5] Unified memory for CWSR save restore area
amdkfd changes: Eric Huang (3): drm/amdkfd: add new flag for svm drm/amdkfd: change svm range evict drm/amdkfd: optimize svm range evict drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 13 +++-- include/uapi/linux/kfd_ioctl.h | 2 ++ 2 files changed, 13 insertions(+), 2 deletions(-) libhsakmt(thunk) changes: which are based on https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface Eric Huang (2): libhsakmt: add new flags for svm libhsakmt: allocate unified memory for ctx save restore area include/hsakmttypes.h | 1 + include/linux/kfd_ioctl.h | 2 + src/queues.c | 109 +- 3 files changed, 98 insertions(+), 14 deletions(-) -- 2.25.1
Re: [PATCH 2/2] drm/amdkfd: change svm range evict
On 2022-06-29 19:29, Felix Kuehling wrote: On 2022-06-29 18:53, Eric Huang wrote: On 2022-06-29 18:20, Felix Kuehling wrote: On 2022-06-28 17:43, Eric Huang wrote: Two changes: 1. reducing unnecessary evict/unmap when range is not mapped to gpu. 2. adding always evict when flags is set to always_mapped. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 4bf2f75f853b..76e817687ef9 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm, struct kfd_process *p; int r = 0; + if (!prange->mapped_to_gpu) + return 0; This feels like an unrelated optimization that should be in a separate patch. But I'm not sure this is correct, because it doesn't consider child ranges. svm_range_unmap_from_gpus already contains this check, so ranges should not be unmapped unnecessarily either way. Is there any other benefit to this change that I'm missing? I will send another patch separately that considers child ranges. I think this should only be done in the XNACK-off case. For XNACK-on it's already handled in the svm_range_unmap_from_gpus. Yes and It is also done when KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED is set. The benefit is it will reduce unnecessary queue evicts when allocating context save memory, which is unmapped to gpu. That sounds wrong. The context save area should never be unmapped from GPU. That's the whole point of setting the KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag. I guess this is happening while migrating the context save area to VRAM for the first time, even before it's mapped to GPU? Yes. It is for the first time when registering svm range and migrating to VRAM are doing together, at this moment, the range is not mapped to GPU. 
I think there may be another eviction, when the CWSR header is initialized by the CPU. That would also migrate it back to system memory. To avoid that, you should probably register the context save area only after the header has been initialized. Yes. I am using this way. Please look at patch 4/4. I think avoiding an eviction when memory is migrated when it is first registered is worthwhile, not just for CWSR. It is for efficiency reason. On the other hand, without this optimization KFDCWSRTest.InterruptRestore fails with queue preemption error. What do you mean by "queue preemption error"? Does HWS hang? HWS doesn't hang immediately, so there is not error for fence timeout "The cp might be in an unrecoverable state due to an unsuccessful queues preemption". The error is "HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out" after checking mqd manager, which means HWS abandons unmap queue request without returning timeout error to driver. And after this error, the following test will fail at queue creation as HWS hangs I think the reason is the extra queue evicts make HWS too busy to preempt existing queues. There is one unmap_queue packet sent to HWS in current code, and will be three unmap_queue packets with unified memory allocation. When queues of a process are already evicted, they should not get evicted again. That's handled by the qpd->evicted counter. There should never be multiple unmap_queues packets in flight at the same time. If you're seeing three unmap_queues, you should also see queues restored three times. HWS should never be too busy to evict queues. If you're seeing preemptions fail, you may have found a bug. The restore delay worker will do something differently in term of timing. It could restore queues before next unmap_queues, so the situation is too complicate to debug in multiple queues evict/restore environment. 
The error definitely means there is a bug, from driver point of view there is nothing wrong even adding extra queue eviction, so I try to avoid extra queue eviction and keep it as before. The bottom line is to make sure unified svm range for context save area doesn't cause any failure in kfdtest, so I can theoretically assume extra queue eviction/restoring causes HWS failure. Regards, Eric Regards, Felix So this optimization will keep only one unmap_queue as before. Regards, Eric Regards, Felix + p = container_of(svms, struct kfd_process, svms); pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 0x%lx]\n", svms, prange->start, prange->last, start, last); - if (!p->xnack_enabled) { + if (!p->xnack_enabled || + (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) { int evicted_ranges; list_for_each_entry(pchild, >child_list, child_list) { @@ -3321,7 +3325,9 @@ svm_range_set
Re: [PATCH 2/2] drm/amdkfd: change svm range evict
On 2022-06-29 18:20, Felix Kuehling wrote: On 2022-06-28 17:43, Eric Huang wrote: Two changes: 1. reducing unnecessary evict/unmap when range is not mapped to gpu. 2. adding always evict when flags is set to always_mapped. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 4bf2f75f853b..76e817687ef9 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm, struct kfd_process *p; int r = 0; + if (!prange->mapped_to_gpu) + return 0; This feels like an unrelated optimization that should be in a separate patch. But I'm not sure this is correct, because it doesn't consider child ranges. svm_range_unmap_from_gpus already contains this check, so ranges should not be unmapped unnecessarily either way. Is there any other benefit to this change that I'm missing? I will send another patch separately that considers child ranges. The benefit is it will reduce unnecessary queue evicts when allocating context save memory, which is unmapped to gpu. It is for efficiency reason. On the other hand, without this optimization KFDCWSRTest.InterruptRestore fails with queue preemption error. I think the reason is the extra queue evicts make HWS too busy to preempt existing queues. There is one unmap_queue packet sent to HWS in current code, and will be three unmap_queue packets with unified memory allocation. So this optimization will keep only one unmap_queue as before. 
Regards, Eric Regards, Felix + p = container_of(svms, struct kfd_process, svms); pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 0x%lx]\n", svms, prange->start, prange->last, start, last); - if (!p->xnack_enabled) { + if (!p->xnack_enabled || + (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) { int evicted_ranges; list_for_each_entry(pchild, &prange->child_list, child_list) { @@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, if (r) goto out_unlock_range; - if (migrated && !p->xnack_enabled) { + if (migrated && (!p->xnack_enabled || + (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) && + prange->mapped_to_gpu) { pr_debug("restore_work will update mappings of GPUs\n"); mutex_unlock(&prange->migrate_mutex); continue;
[PATCH 4/4] libhsakmt: allocate unified memory for ctx save restore area
To improve performance on queue preemption, allocate the ctx s/r area in VRAM instead of system memory, and migrate it back to system memory when VRAM is full. Signed-off-by: Eric Huang Change-Id: If775782027188dbe84b6868260e429373675434c --- include/hsakmttypes.h | 1 + src/queues.c | 109 -- 2 files changed, 96 insertions(+), 14 deletions(-) diff --git a/include/hsakmttypes.h b/include/hsakmttypes.h index 9063f85..2c1c7cc 100644 --- a/include/hsakmttypes.h +++ b/include/hsakmttypes.h @@ -1329,6 +1329,7 @@ typedef enum _HSA_SVM_FLAGS { HSA_SVM_FLAG_GPU_RO = 0x0008, // GPUs only read, allows replication HSA_SVM_FLAG_GPU_EXEC = 0x0010, // Allow execution on GPU HSA_SVM_FLAG_GPU_READ_MOSTLY = 0x0020, // GPUs mostly read, may allow similar optimizations as RO, but writes fault + HSA_SVM_FLAG_GPU_ALWAYS_MAPPED = 0x0040, // Keep GPU memory mapping always valid as if XNACK is disabled } HSA_SVM_FLAGS; typedef enum _HSA_SVM_ATTR_TYPE { diff --git a/src/queues.c b/src/queues.c index c83dd93..e65103d 100644 --- a/src/queues.c +++ b/src/queues.c @@ -68,6 +68,7 @@ struct queue { uint32_t eop_buffer_size; uint32_t gfxv; bool use_ats; + bool unified_ctx_save_restore; /* This queue structure is allocated from GPU with page aligned size * but only small bytes are used. We use the extra space in the end for * cu_mask bits array.
@@ -383,13 +384,50 @@ static void free_exec_aligned_memory(void *addr, uint32_t size, uint32_t align, munmap(addr, size); } +static HSAKMT_STATUS register_exec_svm_range(void *mem, uint32_t size, + uint32_t gpuNode, uint32_t prefetchNode, + uint32_t preferredNode, bool alwaysMapped) +{ + HSA_SVM_ATTRIBUTE *attrs; + HSAuint64 s_attr; + HSAuint32 nattr; + HSAuint32 flags; + + flags = HSA_SVM_FLAG_HOST_ACCESS | + HSA_SVM_FLAG_GPU_EXEC; + + if (alwaysMapped) + flags |= HSA_SVM_FLAG_GPU_ALWAYS_MAPPED; + + nattr = 5; + s_attr = sizeof(*attrs) * nattr; + attrs = (HSA_SVM_ATTRIBUTE *)alloca(s_attr); + + attrs[0].type = HSA_SVM_ATTR_PREFETCH_LOC; + attrs[0].value = prefetchNode; + attrs[1].type = HSA_SVM_ATTR_PREFERRED_LOC; + attrs[1].value = preferredNode; + attrs[2].type = HSA_SVM_ATTR_CLR_FLAGS; + attrs[2].value = flags; + attrs[3].type = HSA_SVM_ATTR_SET_FLAGS; + attrs[3].value = flags; + attrs[4].type = HSA_SVM_ATTR_ACCESS; + attrs[4].value = gpuNode; + + return hsaKmtSVMSetAttr(mem, size, nattr, attrs); +} + static void free_queue(struct queue *q) { if (q->eop_buffer) free_exec_aligned_memory(q->eop_buffer, q->eop_buffer_size, PAGE_SIZE, q->use_ats); - if (q->ctx_save_restore) + if (q->unified_ctx_save_restore) + munmap(q->ctx_save_restore, + ALIGN_UP(q->ctx_save_restore_size + q->debug_memory_size, + PAGE_SIZE)); + else if (q->ctx_save_restore) free_exec_aligned_memory(q->ctx_save_restore, q->ctx_save_restore_size, PAGE_SIZE, q->use_ats); @@ -425,6 +463,8 @@ static int handle_concrete_asic(struct queue *q, if (ret) { uint32_t total_mem_alloc_size = 0; HsaUserContextSaveAreaHeader *header; + HsaNodeProperties node; + bool svm_api; args->ctx_save_restore_size = q->ctx_save_restore_size; args->ctl_stack_size = q->ctl_stack_size; @@ -434,22 +474,63 @@ static int handle_concrete_asic(struct queue *q, */ total_mem_alloc_size = q->ctx_save_restore_size + q->debug_memory_size; - q->ctx_save_restore = - allocate_exec_aligned_memory(total_mem_alloc_size, -q->use_ats, NodeId, 
false, false); - if (!q->ctx_save_restore) - return HSAKMT_STATUS_NO_MEMORY; + if (hsaKmtGetNodeProperties(NodeId, &node)) + svm_api = false; + else + svm_api = node.Capability.ui32.SVMAPISupported; - args->ctx_save_restore_address = (uintptr_t)q->ctx_save_restore; + /* Allocate unified memory for context save restore +* area on dGPU. +*/ + if (!q->use_ats && svm_api) { + uint32_t size = ALIGN_UP(total_mem_alloc_size, PAGE
[PATCH 3/4] libhsakmt: add new flags for svm
Add a new option for always keeping the GPU mapping. Signed-off-by: Eric Huang Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492 --- include/linux/kfd_ioctl.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h index 8a0ed49..5c45f58 100644 --- a/include/linux/kfd_ioctl.h +++ b/include/linux/kfd_ioctl.h @@ -1069,6 +1069,8 @@ struct kfd_ioctl_cross_memory_copy_args { #define KFD_IOCTL_SVM_FLAG_GPU_EXEC 0x0010 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */ #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020 +/* Keep GPU memory mapping always valid as if XNACK is disabled */ +#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040 /** * kfd_ioctl_svm_op - SVM ioctl operations -- 2.25.1
[PATCH 4/4] libhsakmt: allocate unified memory for ctx save restore area
To improve performance on queue preemption, allocate ctx s/r area in VRAM instead of system memory, and migrate it back to system memory when VRAM is full. Signed-off-by: Eric Huang Change-Id: If775782027188dbe84b6868260e429373675434c --- include/hsakmttypes.h | 1 + src/queues.c | 109 -- 2 files changed, 96 insertions(+), 14 deletions(-) diff --git a/include/hsakmttypes.h b/include/hsakmttypes.h index 9063f85..2c1c7cc 100644 --- a/include/hsakmttypes.h +++ b/include/hsakmttypes.h @@ -1329,6 +1329,7 @@ typedef enum _HSA_SVM_FLAGS { HSA_SVM_FLAG_GPU_RO = 0x0008, // GPUs only read, allows replication HSA_SVM_FLAG_GPU_EXEC= 0x0010, // Allow execution on GPU HSA_SVM_FLAG_GPU_READ_MOSTLY = 0x0020, // GPUs mostly read, may allow similar optimizations as RO, but writes fault + HSA_SVM_FLAG_GPU_ALWAYS_MAPPED = 0x0040, // Keep GPU memory mapping always valid as if XNACK is disable } HSA_SVM_FLAGS; typedef enum _HSA_SVM_ATTR_TYPE { diff --git a/src/queues.c b/src/queues.c index c83dd93..e65103d 100644 --- a/src/queues.c +++ b/src/queues.c @@ -68,6 +68,7 @@ struct queue { uint32_t eop_buffer_size; uint32_t gfxv; bool use_ats; + bool unified_ctx_save_restore; /* This queue structure is allocated from GPU with page aligned size * but only small bytes are used. We use the extra space in the end for * cu_mask bits array. 
@@ -383,13 +384,50 @@ static void free_exec_aligned_memory(void *addr, uint32_t size, uint32_t align, munmap(addr, size); } +static HSAKMT_STATUS register_exec_svm_range(void *mem, uint32_t size, + uint32_t gpuNode, uint32_t prefetchNode, + uint32_t preferredNode, bool alwaysMapped) +{ + HSA_SVM_ATTRIBUTE *attrs; + HSAuint64 s_attr; + HSAuint32 nattr; + HSAuint32 flags; + + flags = HSA_SVM_FLAG_HOST_ACCESS | + HSA_SVM_FLAG_GPU_EXEC; + + if (alwaysMapped) + flags |= HSA_SVM_FLAG_GPU_ALWAYS_MAPPED; + + nattr = 5; + s_attr = sizeof(*attrs) * nattr; + attrs = (HSA_SVM_ATTRIBUTE *)alloca(s_attr); + + attrs[0].type = HSA_SVM_ATTR_PREFETCH_LOC; + attrs[0].value = prefetchNode; + attrs[1].type = HSA_SVM_ATTR_PREFERRED_LOC; + attrs[1].value = preferredNode; + attrs[2].type = HSA_SVM_ATTR_CLR_FLAGS; + attrs[2].value = flags; + attrs[3].type = HSA_SVM_ATTR_SET_FLAGS; + attrs[3].value = flags; + attrs[4].type = HSA_SVM_ATTR_ACCESS; + attrs[4].value = gpuNode; + + return hsaKmtSVMSetAttr(mem, size, nattr, attrs); +} + static void free_queue(struct queue *q) { if (q->eop_buffer) free_exec_aligned_memory(q->eop_buffer, q->eop_buffer_size, PAGE_SIZE, q->use_ats); - if (q->ctx_save_restore) + if (q->unified_ctx_save_restore) + munmap(q->ctx_save_restore, + ALIGN_UP(q->ctx_save_restore_size + q->debug_memory_size, + PAGE_SIZE)); + else if (q->ctx_save_restore) free_exec_aligned_memory(q->ctx_save_restore, q->ctx_save_restore_size, PAGE_SIZE, q->use_ats); @@ -425,6 +463,8 @@ static int handle_concrete_asic(struct queue *q, if (ret) { uint32_t total_mem_alloc_size = 0; HsaUserContextSaveAreaHeader *header; + HsaNodeProperties node; + bool svm_api; args->ctx_save_restore_size = q->ctx_save_restore_size; args->ctl_stack_size = q->ctl_stack_size; @@ -434,22 +474,63 @@ static int handle_concrete_asic(struct queue *q, */ total_mem_alloc_size = q->ctx_save_restore_size + q->debug_memory_size; - q->ctx_save_restore = - allocate_exec_aligned_memory(total_mem_alloc_size, -q->use_ats, NodeId, 
false, false); - if (!q->ctx_save_restore) - return HSAKMT_STATUS_NO_MEMORY; + if (hsaKmtGetNodeProperties(NodeId, &node)) + svm_api = false; + else + svm_api = node.Capability.ui32.SVMAPISupported; - args->ctx_save_restore_address = (uintptr_t)q->ctx_save_restore; + /* Allocate unified memory for context save restore + * area on dGPU. + */ + if (!q->use_ats && svm_api) { + uint32_t size = ALIGN_UP(total_mem_alloc_size, PAGE
[PATCH 0/4] Unified memory for CWSR save restore area
amdkfd changes: Eric Huang (2): drm/amdkfd: add new flag for svm drm/amdkfd: change svm range evict drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 -- include/uapi/linux/kfd_ioctl.h | 2 ++ 2 files changed, 10 insertions(+), 2 deletions(-) libhsakmt(thunk) changes: which are based on https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface Eric Huang (2): libhsakmt: add new flags for svm libhsakmt: allocate unified memory for ctx save restore area include/hsakmttypes.h | 1 + include/linux/kfd_ioctl.h | 2 + src/queues.c | 109 +- 3 files changed, 98 insertions(+), 14 deletions(-) -- 2.25.1
[PATCH 3/4] libhsakmt: add new flags for svm
It adds a new option for always keeping the GPU mapping. Signed-off-by: Eric Huang Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492 --- include/linux/kfd_ioctl.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h index 8a0ed49..5c45f58 100644 --- a/include/linux/kfd_ioctl.h +++ b/include/linux/kfd_ioctl.h @@ -1069,6 +1069,8 @@ struct kfd_ioctl_cross_memory_copy_args { #define KFD_IOCTL_SVM_FLAG_GPU_EXEC 0x0010 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */ #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020 +/* Keep GPU memory mapping always valid as if XNACK is disable */ +#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040 /** * kfd_ioctl_svm_op - SVM ioctl operations -- 2.25.1
[PATCH 2/2] drm/amdkfd: change svm range evict
Two changes: 1. reduce unnecessary evict/unmap when the range is not mapped to any GPU; 2. always evict when the always_mapped flag is set. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 4bf2f75f853b..76e817687ef9 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm, struct kfd_process *p; int r = 0; + if (!prange->mapped_to_gpu) + return 0; + p = container_of(svms, struct kfd_process, svms); pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 0x%lx]\n", svms, prange->start, prange->last, start, last); - if (!p->xnack_enabled) { + if (!p->xnack_enabled || + (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) { int evicted_ranges; list_for_each_entry(pchild, &prange->child_list, child_list) { @@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, if (r) goto out_unlock_range; - if (migrated && !p->xnack_enabled) { + if (migrated && (!p->xnack_enabled || + (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) && + prange->mapped_to_gpu) { pr_debug("restore_work will update mappings of GPUs\n"); mutex_unlock(&prange->migrate_mutex); continue; -- 2.25.1
[PATCH 1/2] drm/amdkfd: add new flag for svm
It adds a new option for always keeping the GPU mapping. Signed-off-by: Eric Huang --- include/uapi/linux/kfd_ioctl.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index fd49dde4d5f4..eba04ebfd9a8 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1076,6 +1076,8 @@ struct kfd_ioctl_cross_memory_copy_args { #define KFD_IOCTL_SVM_FLAG_GPU_EXEC 0x0010 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */ #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020 +/* Keep GPU memory mapping always valid as if XNACK is disable */ +#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040 /** * kfd_ioctl_svm_op - SVM ioctl operations -- 2.25.1
Re: [PATCH 1/3] drm/amdkfd: add new flags for svm
Thank you, Felix. I will send all libhsakmt changes and amdkfd changes to amd-gfx. Regards, Eric On 2022-06-28 16:44, Felix Kuehling wrote: Am 2022-06-27 um 12:01 schrieb Eric Huang: No. There is only internal link for now, because it is under review. Once it is submitted, external link should be in gerritgit for libhsakmt. Hi Eric, For anything that requires ioctl API changes, the user mode and kernel mode changes need to be reviewed together in public. You can either post the libhsakmt change by email to amd-gfx, or you can push your libhsakmt development branch to a personal branch on github and include a link to that in the kernel commit description. Alex, some background about this series: We are looking into using unified memory for CWSR context save space. This allows us to get lower preemption latency when VRAM is available, but migrate it to system memory when more VRAM is needed for application allocations. Because we cannot preempt in the trap handler, and we want to guarantee finite time for preemption and trap handler execution, we need to prevent page faults on any memory accessed by the trap handler. The KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag is meant to guarantee that. I think the KFD_IOCTL_SVM_FLAG_CUSTOM is not necessary. I've responded to Eric with an alternative idea. Regards, Felix Regards, Eric On 2022-06-27 11:58, Alex Deucher wrote: On Mon, Jun 27, 2022 at 11:36 AM Eric Huang wrote: http://gerrit-git.amd.com/c/compute/ec/libhsakmt/+/697296 Got an external link? Alex Regards, Eric On 2022-06-27 11:33, Alex Deucher wrote: On Fri, Jun 24, 2022 at 12:03 PM Eric Huang wrote: It is to add new options for always keeping gpu mapping and custom of coarse grain allocation intead of fine grain as default. Signed-off-by: Eric Huang Can you provide a link to the proposed userspace for this? 
Alex --- include/uapi/linux/kfd_ioctl.h | 4 1 file changed, 4 insertions(+) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index fd49dde4d5f4..9dbf215675a0 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1076,6 +1076,10 @@ struct kfd_ioctl_cross_memory_copy_args { #define KFD_IOCTL_SVM_FLAG_GPU_EXEC 0x0010 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */ #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020 +/* Keep GPU memory mapping always valid as if XNACK is disable */ +#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040 +/* Allow set custom flags instead of defaults */ +#define KFD_IOCTL_SVM_FLAG_CUSTOM 0x8000 /** * kfd_ioctl_svm_op - SVM ioctl operations -- 2.25.1
Re: [PATCH 1/3] drm/amdkfd: add new flags for svm
No. There is only internal link for now, because it is under review. Once it is submitted, external link should be in gerritgit for libhsakmt. Regards, Eric On 2022-06-27 11:58, Alex Deucher wrote: On Mon, Jun 27, 2022 at 11:36 AM Eric Huang wrote: http://gerrit-git.amd.com/c/compute/ec/libhsakmt/+/697296 Got an external link? Alex Regards, Eric On 2022-06-27 11:33, Alex Deucher wrote: On Fri, Jun 24, 2022 at 12:03 PM Eric Huang wrote: It is to add new options for always keeping gpu mapping and custom of coarse grain allocation intead of fine grain as default. Signed-off-by: Eric Huang Can you provide a link to the proposed userspace for this? Alex --- include/uapi/linux/kfd_ioctl.h | 4 1 file changed, 4 insertions(+) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index fd49dde4d5f4..9dbf215675a0 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1076,6 +1076,10 @@ struct kfd_ioctl_cross_memory_copy_args { #define KFD_IOCTL_SVM_FLAG_GPU_EXEC0x0010 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */ #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020 +/* Keep GPU memory mapping always valid as if XNACK is disable */ +#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040 +/* Allow set custom flags instead of defaults */ +#define KFD_IOCTL_SVM_FLAG_CUSTOM 0x8000 /** * kfd_ioctl_svm_op - SVM ioctl operations -- 2.25.1
Re: [PATCH 1/3] drm/amdkfd: add new flags for svm
http://gerrit-git.amd.com/c/compute/ec/libhsakmt/+/697296 Regards, Eric On 2022-06-27 11:33, Alex Deucher wrote: On Fri, Jun 24, 2022 at 12:03 PM Eric Huang wrote: It is to add new options for always keeping gpu mapping and custom of coarse grain allocation intead of fine grain as default. Signed-off-by: Eric Huang Can you provide a link to the proposed userspace for this? Alex --- include/uapi/linux/kfd_ioctl.h | 4 1 file changed, 4 insertions(+) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index fd49dde4d5f4..9dbf215675a0 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1076,6 +1076,10 @@ struct kfd_ioctl_cross_memory_copy_args { #define KFD_IOCTL_SVM_FLAG_GPU_EXEC0x0010 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */ #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020 +/* Keep GPU memory mapping always valid as if XNACK is disable */ +#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040 +/* Allow set custom flags instead of defaults */ +#define KFD_IOCTL_SVM_FLAG_CUSTOM 0x8000 /** * kfd_ioctl_svm_op - SVM ioctl operations -- 2.25.1
[PATCH 3/3] drm/amdkfd: add custom svm range flags setting
It gives the user a chance to change the default flags setting, e.g. from fine grain to coarse grain. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 353306037959..caadd18c447a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -722,7 +722,10 @@ svm_range_apply_attrs(struct kfd_process *p, struct svm_range *prange, break; case KFD_IOCTL_SVM_ATTR_SET_FLAGS: *update_mapping = true; - prange->flags |= attrs[i].value; + if (attrs[i].value & KFD_IOCTL_SVM_FLAG_CUSTOM) + prange->flags = attrs[i].value; + else + prange->flags |= attrs[i].value; break; case KFD_IOCTL_SVM_ATTR_CLR_FLAGS: *update_mapping = true; -- 2.25.1
[PATCH 2/3] drm/amdkfd: change svm range evict
Two changes: 1. reduce unnecessary evict/unmap when the range is not mapped to any GPU; 2. always evict when the always_mapped flag is set. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 4bf2f75f853b..353306037959 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm, struct kfd_process *p; int r = 0; + if (!prange->mapped_to_gpu) + return 0; + p = container_of(svms, struct kfd_process, svms); pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 0x%lx]\n", svms, prange->start, prange->last, start, last); - if (!p->xnack_enabled) { + if (!p->xnack_enabled || + (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) { int evicted_ranges; list_for_each_entry(pchild, &prange->child_list, child_list) { @@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, if (r) goto out_unlock_range; - if (migrated && !p->xnack_enabled) { + if (migrated && (!p->xnack_enabled || + (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) && + prange->mapped_to_gpu) { pr_debug("restore_work will update mappings of GPUs\n"); mutex_unlock(&prange->migrate_mutex); continue; -- 2.25.1
[PATCH 1/3] drm/amdkfd: add new flags for svm
It adds new options for always keeping the GPU mapping and for custom flag setting, such as coarse grain allocation instead of fine grain as default. Signed-off-by: Eric Huang --- include/uapi/linux/kfd_ioctl.h | 4 1 file changed, 4 insertions(+) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index fd49dde4d5f4..9dbf215675a0 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1076,6 +1076,10 @@ struct kfd_ioctl_cross_memory_copy_args { #define KFD_IOCTL_SVM_FLAG_GPU_EXEC 0x0010 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */ #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020 +/* Keep GPU memory mapping always valid as if XNACK is disable */ +#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040 +/* Allow set custom flags instead of defaults */ +#define KFD_IOCTL_SVM_FLAG_CUSTOM 0x8000 /** * kfd_ioctl_svm_op - SVM ioctl operations -- 2.25.1
Re: [PATCH 1/1] Revert "drm/amdkfd: Add queue to MES if it becomes active"
Reviewed-by: Eric Huang On 2022-06-17 15:26, Philip Yang wrote: This reverts commit 8b9aa1fa82baf4e8b6a2daa3aa4d69b728df727e. As it breaks pqm_set_gws. --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 67ae5b6385a2..e1797657b04c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -866,10 +866,8 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q, * dqm->active_queue_count to determine whether a new runlist must be * uploaded. */ - if (q->properties.is_active) { - add_queue = true; - if (!prev_active) - increment_queue_count(dqm, &pdd->qpd, q); + if (q->properties.is_active && !prev_active) { + increment_queue_count(dqm, &pdd->qpd, q); } else if (!q->properties.is_active && prev_active) { decrement_queue_count(dqm, &pdd->qpd, q); } else if (q->gws && !q->properties.is_gws) {
Re: [PATCH 1/2] drm/amdkfd: Add queue to MES if it becomes active
Does it break the case of q->gws with q->properties.is_active == true? Regards, Eric On 2022-06-15 17:56, Philip Yang wrote: We remove the user queue from MES scheduler to update queue properties. If the queue becomes active after updating, add the user queue to MES scheduler, to be able to handle command packet submission. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index e1797657b04c..67ae5b6385a2 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -866,8 +866,10 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q, * dqm->active_queue_count to determine whether a new runlist must be * uploaded. */ - if (q->properties.is_active && !prev_active) { - increment_queue_count(dqm, &pdd->qpd, q); + if (q->properties.is_active) { + add_queue = true; + if (!prev_active) + increment_queue_count(dqm, &pdd->qpd, q); } else if (!q->properties.is_active && prev_active) { decrement_queue_count(dqm, &pdd->qpd, q); } else if (q->gws && !q->properties.is_gws) {
[PATCH 1/2] drm/amdkfd: port cwsr trap handler from dkms branch
Most of changes are for debugger feature, and it is to simplify trap handler support for new asics in the future. Signed-off-by: Eric Huang --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2527 + .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 325 ++- .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 244 +- 3 files changed, 1596 insertions(+), 1500 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index 475f89700c74..8cbdc7f519c6 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -166,7 +166,7 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { 0x807c847c, 0x806eff6e, 0x0400, 0xbf0a757c, 0xbf85ffef, 0xbf9c, - 0xbf8200cd, 0xbef8007e, + 0xbf8200ce, 0xbef8007e, 0x8679ff7f, 0x, 0x8779ff79, 0x0004, 0xbefa0080, 0xbefb00ff, @@ -212,304 +212,310 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { 0x761e, 0xe0524100, 0x761e0100, 0xe0524200, 0x761e0200, 0xe0524300, - 0x761e0300, 0xb8f22a05, - 0x80728172, 0x8e728a72, - 0xb8f61605, 0x80768176, - 0x8e768676, 0x80727672, - 0x80f2c072, 0xb8f31605, - 0x80738173, 0x8e738473, - 0x8e7a8273, 0xbefa00ff, - 0x0100, 0xbefc0073, - 0xc031003c, 0x0072, - 0x80f2c072, 0xbf8c007f, - 0x80fc907c, 0xbe802d00, - 0xbe822d02, 0xbe842d04, - 0xbe862d06, 0xbe882d08, - 0xbe8a2d0a, 0xbe8c2d0c, - 0xbe8e2d0e, 0xbf06807c, - 0xbf84fff1, 0xb8f22a05, - 0x80728172, 0x8e728a72, - 0xb8f61605, 0x80768176, - 0x8e768676, 0x80727672, - 0xbefa0084, 0xbefa00ff, - 0x0100, 0xc0211cfc, + 0x761e0300, 0xbf8c0f70, + 0xb8f22a05, 0x80728172, + 0x8e728a72, 0xb8f61605, + 0x80768176, 0x8e768676, + 0x80727672, 0x80f2c072, + 0xb8f31605, 0x80738173, + 0x8e738473, 0x8e7a8273, + 0xbefa00ff, 0x0100, + 0xbefc0073, 0xc031003c, + 0x0072, 0x80f2c072, + 0xbf8c007f, 0x80fc907c, + 0xbe802d00, 0xbe822d02, + 0xbe842d04, 0xbe862d06, + 0xbe882d08, 0xbe8a2d0a, + 0xbe8c2d0c, 0xbe8e2d0e, + 0xbf06807c, 0xbf84fff1, + 0xb8f22a05, 0x80728172, + 0x8e728a72, 0xb8f61605, + 
0x80768176, 0x8e768676, + 0x80727672, 0xbefa0084, + 0xbefa00ff, 0x0100, + 0xc0211cfc, 0x0072, + 0x80728472, 0xc0211c3c, 0x0072, 0x80728472, - 0xc0211c3c, 0x0072, - 0x80728472, 0xc0211c7c, + 0xc0211c7c, 0x0072, + 0x80728472, 0xc0211bbc, 0x0072, 0x80728472, - 0xc0211bbc, 0x0072, - 0x80728472, 0xc0211bfc, + 0xc0211bfc, 0x0072, + 0x80728472, 0xc0211d3c, 0x0072, 0x80728472, - 0xc0211d3c, 0x0072, - 0x80728472, 0xc0211d7c, + 0xc0211d7c, 0x0072, + 0x80728472, 0xc0211a3c, 0x0072, 0x80728472, - 0xc0211a3c, 0x0072, - 0x80728472, 0xc0211a7c, + 0xc0211a7c, 0x0072, + 0x80728472, 0xc0211dfc, 0x0072, 0x80728472, - 0xc0211dfc, 0x0072, - 0x80728472, 0xc0211b3c, + 0xc0211b3c, 0x0072, + 0x80728472, 0xc0211b7c, 0x0072, 0x80728472, - 0xc0211b7c, 0x0072, - 0x80728472, 0xbf8c007f, - 0xbefc0073, 0xbefe006e, - 0xbeff006f, 0x867375ff, - 0x03ff, 0xb9734803, - 0x867375ff, 0xf800, - 0x8f738b73, 0xb973a2c3, - 0xb977f801, 0x8673ff71, - 0xf000, 0x8f739c73, - 0x8e739073, 0xbef60080, - 0x87767376, 0x8673ff71, - 0x0800, 0x8f739b73, - 0x8e738f73, 0x87767376, - 0x8673ff74, 0x0080, - 0x8f739773, 0xb976f807, - 0x8671ff71, 0x, - 0x86fe7e7e, 0x86ea6a6a, - 0x8f768374, 0xb976e0c2, - 0xbf82, 0xb9740002, - 0xbf8a, 0x95807370, - 0xbf81, 0x, + 0xbf8c007f, 0xbefc0073, + 0xbefe006e, 0xbeff006f, + 0x867375ff, 0x03ff, + 0xb9734803, 0x867375ff, + 0xf800, 0x8f738b73, + 0xb973a2c3, 0xb977f801, + 0x8673ff71, 0xf000, + 0x8f739c73, 0x8e739073, + 0xbef60080, 0x87767376, + 0x8673ff71, 0x0800, + 0x8f739b73, 0x8e738f73, + 0x87767376, 0x8673ff74, + 0x0080, 0x8f739773, + 0xb976f807, 0x8671ff71, + 0x, 0x86fe7e7e, + 0x86ea6a6a, 0x8f768374, + 0xb976e0c2, 0xbf82, + 0xb9740002, 0xbf8a, + 0x95807370, 0xbf81, }; static const uint32_t cwsr_trap_gfx9_hex[] = { - 0xbf820001, 0xbf820248, - 0xb8f8f802, 0x89788678, - 0xb8eef801, 0x866eff6e, - 0x0800, 0xbf840003, + 0xbf820001, 0xbf820254, + 0xb8f8f802, 0x8978ff78, + 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, - 0xbf840016, 0xb8fbf803
[PATCH 1/2] drm/amdkfd: port cwsr trap handler from dkms branch
It is to simplify trap handler support for new asics in the future. Signed-off-by: Eric Huang --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2527 + .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 325 ++- .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 244 +- 3 files changed, 1596 insertions(+), 1500 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index 475f89700c74..8cbdc7f519c6 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -166,7 +166,7 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { 0x807c847c, 0x806eff6e, 0x0400, 0xbf0a757c, 0xbf85ffef, 0xbf9c, - 0xbf8200cd, 0xbef8007e, + 0xbf8200ce, 0xbef8007e, 0x8679ff7f, 0x, 0x8779ff79, 0x0004, 0xbefa0080, 0xbefb00ff, @@ -212,304 +212,310 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { 0x761e, 0xe0524100, 0x761e0100, 0xe0524200, 0x761e0200, 0xe0524300, - 0x761e0300, 0xb8f22a05, - 0x80728172, 0x8e728a72, - 0xb8f61605, 0x80768176, - 0x8e768676, 0x80727672, - 0x80f2c072, 0xb8f31605, - 0x80738173, 0x8e738473, - 0x8e7a8273, 0xbefa00ff, - 0x0100, 0xbefc0073, - 0xc031003c, 0x0072, - 0x80f2c072, 0xbf8c007f, - 0x80fc907c, 0xbe802d00, - 0xbe822d02, 0xbe842d04, - 0xbe862d06, 0xbe882d08, - 0xbe8a2d0a, 0xbe8c2d0c, - 0xbe8e2d0e, 0xbf06807c, - 0xbf84fff1, 0xb8f22a05, - 0x80728172, 0x8e728a72, - 0xb8f61605, 0x80768176, - 0x8e768676, 0x80727672, - 0xbefa0084, 0xbefa00ff, - 0x0100, 0xc0211cfc, + 0x761e0300, 0xbf8c0f70, + 0xb8f22a05, 0x80728172, + 0x8e728a72, 0xb8f61605, + 0x80768176, 0x8e768676, + 0x80727672, 0x80f2c072, + 0xb8f31605, 0x80738173, + 0x8e738473, 0x8e7a8273, + 0xbefa00ff, 0x0100, + 0xbefc0073, 0xc031003c, + 0x0072, 0x80f2c072, + 0xbf8c007f, 0x80fc907c, + 0xbe802d00, 0xbe822d02, + 0xbe842d04, 0xbe862d06, + 0xbe882d08, 0xbe8a2d0a, + 0xbe8c2d0c, 0xbe8e2d0e, + 0xbf06807c, 0xbf84fff1, + 0xb8f22a05, 0x80728172, + 0x8e728a72, 0xb8f61605, + 0x80768176, 0x8e768676, + 0x80727672, 0xbefa0084, + 
0xbefa00ff, 0x0100, + 0xc0211cfc, 0x0072, + 0x80728472, 0xc0211c3c, 0x0072, 0x80728472, - 0xc0211c3c, 0x0072, - 0x80728472, 0xc0211c7c, + 0xc0211c7c, 0x0072, + 0x80728472, 0xc0211bbc, 0x0072, 0x80728472, - 0xc0211bbc, 0x0072, - 0x80728472, 0xc0211bfc, + 0xc0211bfc, 0x0072, + 0x80728472, 0xc0211d3c, 0x0072, 0x80728472, - 0xc0211d3c, 0x0072, - 0x80728472, 0xc0211d7c, + 0xc0211d7c, 0x0072, + 0x80728472, 0xc0211a3c, 0x0072, 0x80728472, - 0xc0211a3c, 0x0072, - 0x80728472, 0xc0211a7c, + 0xc0211a7c, 0x0072, + 0x80728472, 0xc0211dfc, 0x0072, 0x80728472, - 0xc0211dfc, 0x0072, - 0x80728472, 0xc0211b3c, + 0xc0211b3c, 0x0072, + 0x80728472, 0xc0211b7c, 0x0072, 0x80728472, - 0xc0211b7c, 0x0072, - 0x80728472, 0xbf8c007f, - 0xbefc0073, 0xbefe006e, - 0xbeff006f, 0x867375ff, - 0x03ff, 0xb9734803, - 0x867375ff, 0xf800, - 0x8f738b73, 0xb973a2c3, - 0xb977f801, 0x8673ff71, - 0xf000, 0x8f739c73, - 0x8e739073, 0xbef60080, - 0x87767376, 0x8673ff71, - 0x0800, 0x8f739b73, - 0x8e738f73, 0x87767376, - 0x8673ff74, 0x0080, - 0x8f739773, 0xb976f807, - 0x8671ff71, 0x, - 0x86fe7e7e, 0x86ea6a6a, - 0x8f768374, 0xb976e0c2, - 0xbf82, 0xb9740002, - 0xbf8a, 0x95807370, - 0xbf81, 0x, + 0xbf8c007f, 0xbefc0073, + 0xbefe006e, 0xbeff006f, + 0x867375ff, 0x03ff, + 0xb9734803, 0x867375ff, + 0xf800, 0x8f738b73, + 0xb973a2c3, 0xb977f801, + 0x8673ff71, 0xf000, + 0x8f739c73, 0x8e739073, + 0xbef60080, 0x87767376, + 0x8673ff71, 0x0800, + 0x8f739b73, 0x8e738f73, + 0x87767376, 0x8673ff74, + 0x0080, 0x8f739773, + 0xb976f807, 0x8671ff71, + 0x, 0x86fe7e7e, + 0x86ea6a6a, 0x8f768374, + 0xb976e0c2, 0xbf82, + 0xb9740002, 0xbf8a, + 0x95807370, 0xbf81, }; static const uint32_t cwsr_trap_gfx9_hex[] = { - 0xbf820001, 0xbf820248, - 0xb8f8f802, 0x89788678, - 0xb8eef801, 0x866eff6e, - 0x0800, 0xbf840003, + 0xbf820001, 0xbf820254, + 0xb8f8f802, 0x8978ff78, + 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, - 0xbf840016, 0xb8fbf803, + 0xbf840009, 0x866eff6d
[PATCH 2/2] drm/amdkfd: Add gfx11 trap handler
From: Jay Cornwall Based on gfx10 with following changes: - GPR_ALLOC.VGPR_SIZE field moved (and size corrected in gfx10) - s_sendmsg_rtn_b64 replaces some s_sendmsg/s_getreg - Buffer instructions no longer have direct-to-LDS modifier Signed-off-by: Jay Cornwall Reviewed-by: Laurent Morichetti --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 463 +- .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 69 ++- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 6 +- 3 files changed, 507 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index 8cbdc7f519c6..60a81649cf12 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -776,7 +776,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = { 0xe0704100, 0x705d0100, 0xe0704200, 0x705d0200, 0xe0704300, 0x705d0300, - 0xb9702a05, 0x80708170, + 0xb9703a05, 0x80708170, 0xbf0d9973, 0xbf850002, 0x8f708970, 0xbf820001, 0x8f708a70, 0xb97a1e06, @@ -855,7 +855,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = { 0x877aff6d, 0x8000, 0xbf840040, 0x8f7b867b, 0x8f7b827b, 0xbef6037b, - 0xb9702a05, 0x80708170, + 0xb9703a05, 0x80708170, 0xbf0d9973, 0xbf850002, 0x8f708970, 0xbf820001, 0x8f708a70, 0xb97a1e06, @@ -891,7 +891,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = { 0xbef003ff, 0x0200, 0xbeff0380, 0xbf820003, 0xbef003ff, 0x0400, - 0xbeff03c1, 0xb97b2a05, + 0xbeff03c1, 0xb97b3a05, 0x807b817b, 0x8f7b827b, 0x907c9973, 0x877c817c, 0xbf06817c, 0xbf850017, @@ -939,7 +939,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = { 0xb96f4306, 0x876fc16f, 0xbf840029, 0x8f6f866f, 0x8f6f826f, 0xbef6036f, - 0xb9782a05, 0x80788178, + 0xb9783a05, 0x80788178, 0xbf0d9972, 0xbf850002, 0x8f788978, 0xbf820001, 0x8f788a78, 0xb96e1e06, @@ -962,7 +962,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = { 0x907c9972, 0x877c817c, 0xbf06817c, 0xbf850002, 0xbeff0380, 0xbf820001, - 0xbeff03c1, 0xb96f2a05, + 0xbeff03c1, 0xb96f3a05, 0x806f816f, 
0x8f6f826f, 0x907c9972, 0x877c817c, 0xbf06817c, 0xbf850024, @@ -1010,7 +1010,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = { 0x6e5d0100, 0xe0304200, 0x6e5d0200, 0xe0304300, 0x6e5d0300, 0xbf8c3f70, - 0xb9782a05, 0x80788178, + 0xb9783a05, 0x80788178, 0xbf0d9972, 0xbf850002, 0x8f788978, 0xbf820001, 0x8f788a78, 0xb96e1e06, @@ -1037,7 +1037,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = { 0xbe8c310c, 0xbe8e310e, 0xbf06807c, 0xbf84fff0, 0xba80f801, 0x, - 0xbf8a, 0xb9782a05, + 0xbf8a, 0xb9783a05, 0x80788178, 0xbf0d9972, 0xbf850002, 0x8f788978, 0xbf820001, 0x8f788a78, @@ -2261,7 +2261,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = { 0xbf8a, 0x877aff6d, 0x8000, 0xbf840040, 0x8f7b867b, 0x8f7b827b, - 0xbef6037b, 0xb9702a05, + 0xbef6037b, 0xb9703a05, 0x80708170, 0xbf0d9973, 0xbf850002, 0x8f708970, 0xbf820001, 0x8f708a70, @@ -2298,7 +2298,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = { 0x0200, 0xbeff0380, 0xbf820003, 0xbef003ff, 0x0400, 0xbeff03c1, - 0xb97b2a05, 0x807b817b, + 0xb97b3a05, 0x807b817b, 0x8f7b827b, 0x907c9973, 0x877c817c, 0xbf06817c, 0xbf850017, 0xbef603ff, @@ -2345,7 +2345,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = { 0xbeff03c1, 0xb96f4306, 0x876fc16f, 0xbf840029, 0x8f6f866f, 0x8f6f826f, - 0xbef6036f, 0xb9782a05, + 0xbef6036f, 0xb9783a05, 0x80788178, 0xbf0d9972, 0xbf850002, 0x8f788978, 0xbf820001, 0x8f788a78, @@ -2369,7 +2369,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = { 0x877c817c, 0xbf06817c, 0xbf850002, 0xbeff0380, 0xbf820001, 0xbeff03c1, - 0xb96f2a05, 0x806f816f, + 0xb96f3a05, 0x806f816f, 0x8f6f826f, 0x907c9972, 0x877c817c, 0xbf06817c, 0xbf850024, 0xbef603ff, @@ -2416,7 +2416,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = { 0xe0304100, 0x6e5d0100, 0xe0304200, 0x6e5d0200, 0xe0304300, 0x6e5d0300, - 0xbf8c3f70, 0xb9782a05, + 0xbf8c3f70, 0xb9783a05, 0x80788178, 0xbf0d9972, 0xbf850002, 0x8f788978, 0xbf820001, 0x8f788a78, @@ -2444,7 +2444,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = { 0xbe8e310e, 0xbf06807c, 0xbf84fff0, 
0xba80f801, 0x, 0xbf8a, - 0xb9782a05, 0x80788178, + 0xb9783a05, 0x80788178, 0xbf0d9972, 0xbf850002, 0x8f788978, 0xbf820001,
Re: [PATCH] drm/amdkfd: only allow heavy-weight TLB flush on some ASICs for SVM too
On 2022-04-14 04:19, Lang Yu wrote: The idea is from commit a50fe7078035 ("drm/amdkfd: Only apply heavy-weight TLB flush on Aldebaran") and commit f61c40c0757a ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus"). Otherwise, we will run into problems on some ASICs when running SVM applications. Signed-off-by: Lang Yu --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 8 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +++- 3 files changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 91f82a9ccdaf..459f59e3d0ed 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1128,14 +1128,6 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep, return ret; } -static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) -{ - return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || - (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) && - dev->adev->sdma.instance[0].fw_version >= 18) || - KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0); -} - static int kfd_ioctl_map_memory_to_gpu(struct file *filep, struct kfd_process *p, void *data) { diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 8a43def1f638..aff6f598ff2c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1328,6 +1328,14 @@ void kfd_signal_poison_consumed_event(struct kfd_dev *dev, u32 pasid); void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type); +static inline bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) +{ + return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || + (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) && + dev->adev->sdma.instance[0].fw_version >= 18) || + KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0); +} + It is a cosmetic change for function kfd_flush_tlb_after_unmap, and not related to the topic. 
You can separate that into another patch. Regards, Eric bool kfd_is_locked(void); /* Compute profile */ diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 459fa07a3bcc..5afe216cf099 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1229,7 +1229,9 @@ svm_range_unmap_from_gpus(struct svm_range *prange, unsigned long start, if (r) break; } - kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT); + + if (kfd_flush_tlb_after_unmap(pdd->dev)) + kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT); } return r;
Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Vega20
Hi Guchun, SDMA FW team confirms MI50/VG20 doesn't have the same bug as MI100, which causes an ASIC hang issue when running the RVS test. If this change makes KFDMemoryTest fail, please file a Jira and assign it to me. Thanks, Eric On 2022-02-07 08:01, Chen, Guchun wrote: [Public] Hi Eric, Are you sure that there is no FW requirement for this patch on Vega20? KFDMemory test failed by this commit. Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Tuesday, January 25, 2022 4:08 AM To: Huang, JinHuiEric Cc: amd-gfx list Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Vega20 On Fri, Jan 21, 2022 at 11:17 AM Eric Huang wrote: It is to meet the requirement for memory allocation optimization on MI50. Signed-off-by: Eric Huang Assuming there is no firmware version requirement, the patch is: Acked-by: Alex Deucher --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 5b8ae0795c0a..d708f1a502cf 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1582,7 +1582,8 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep, static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) { return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) && - dev->adev->sdma.instance[0].fw_version >= 18); + dev->adev->sdma.instance[0].fw_version >= 18) || + KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0); } static int kfd_ioctl_map_memory_to_gpu(struct file *filep, -- 2.25.1
[PATCH] drm/amdkfd: enable heavy-weight TLB flush on Vega20
It is to meet the requirement for memory allocation optimization on MI50.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 5b8ae0795c0a..d708f1a502cf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1582,7 +1582,8 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
 static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
 {
 	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
 	       (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
-		dev->adev->sdma.instance[0].fw_version >= 18);
+		dev->adev->sdma.instance[0].fw_version >= 18) ||
+	       KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0);
 }
 
 static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
--
2.25.1
Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
On 2022-01-19 09:50, Russell, Kent wrote:

[AMD Official Use Only]

-----Original Message-----
From: Kuehling, Felix
Sent: Tuesday, January 18, 2022 7:16 PM
To: Russell, Kent; Huang, JinHuiEric; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

On 2022-01-18 7:08 p.m., Russell, Kent wrote:

One question inline

KENT RUSSELL
Sr. Software Engineer | Linux Compute Kernel
1 Commerce Valley Drive East
Markham, ON L3T 7X6
O +(1) 289-695-2122 | Ext 72122

From: amd-gfx on behalf of Felix Kuehling
Sent: Tuesday, January 18, 2022 6:36 PM
To: Huang, JinHuiEric; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

On 2022-01-18 5:45 p.m., Eric Huang wrote:

SDMA FW fixes the hang issue caused by adding heavy-weight TLB flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang

Reviewed-by: Felix Kuehling

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 ------
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 10 ++++++++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a64cbbd943ba..acb4fd973e60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 					true);
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
-	/* Only apply no TLB flush on Aldebaran to
-	 * workaround regressions on other Asics.
-	 */
-	if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
-		*table_freed = true;
-
 	goto out;
 
 out_unreserve:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b570c0454ce9..485d4c52c7de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1596,6 +1596,12 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
 	return ret;
 }
 
+static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
+{
+	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||

Do we need to add a check for sdma ver >= 8 here?

What's the significance of version 8 for Aldebaran? This code was working on Aldebaran without a version check before. Did we ever publicly release an SDMA firmware older than version 8 for Aldebaran?

We released v5 for Aldebaran SDMA in ROCm 4.5. If I remember the ticket correctly, the same fix for Arcturus was required for Aldebaran and was part of SDMA v8. But Eric is obviously watching the ticket more closely than I, so I'll defer to him there.

Yes, Aldebaran has the same bug as Arcturus in SDMA, but the bug doesn't cause a GPU hang on Aldebaran. As Felix said, heavy-weight TLB flush has been working on Aldebaran since it was enabled, so we don't need to check the version for it.

Regards,
Eric

Kent

Regards,
Felix

+	       (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
+		dev->adev->sdma.instance[0].fw_version >= 18);
+}
+
 static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 				       struct kfd_process *p, void *data)
 {
@@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 	}
 
 	/* Flush TLBs after waiting for the page table updates to complete */
-	if (table_freed) {
+	if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
 		for (i = 0; i < args->n_devices; i++) {
 			peer = kfd_device_by_id(devices_arr[i]);
 			if (WARN_ON_ONCE(!peer))
@@ -1806,7 +1812,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 	}
 	mutex_unlock(&p->mutex);
 
-	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+	if (kfd_flush_tlb_after_unmap(dev)) {
 		err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
 				(struct kgd_mem *) mem, true);
 		if (err) {
[PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
SDMA FW fixes the hang issue caused by adding heavy-weight TLB flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 ------
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 10 ++++++++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a64cbbd943ba..acb4fd973e60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 					true);
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
-	/* Only apply no TLB flush on Aldebaran to
-	 * workaround regressions on other Asics.
-	 */
-	if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
-		*table_freed = true;
-
 	goto out;
 
 out_unreserve:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b570c0454ce9..485d4c52c7de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1596,6 +1596,12 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
 	return ret;
 }
 
+static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
+{
+	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+	       (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
+		dev->adev->sdma.instance[0].fw_version >= 18);
+}
+
 static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 				       struct kfd_process *p, void *data)
 {
@@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 	}
 
 	/* Flush TLBs after waiting for the page table updates to complete */
-	if (table_freed) {
+	if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
 		for (i = 0; i < args->n_devices; i++) {
 			peer = kfd_device_by_id(devices_arr[i]);
 			if (WARN_ON_ONCE(!peer))
@@ -1806,7 +1812,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 	}
 	mutex_unlock(&p->mutex);
 
-	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+	if (kfd_flush_tlb_after_unmap(dev)) {
 		err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
 				(struct kgd_mem *) mem, true);
 		if (err) {
--
2.25.1
[PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
SDMA FW fixes the hang issue caused by adding heavy-weight TLB flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++++++---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 4 +++-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a64cbbd943ba..f1fed0fc31d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,10 +1892,13 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 					true);
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
-	/* Only apply no TLB flush on Aldebaran to
-	 * workaround regressions on other Asics.
+	/* Only apply no TLB flush on Aldebaran and Arcturus
+	 * to workaround regressions on other Asics.
 	 */
-	if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
+	if (table_freed &&
+	    (adev->asic_type != CHIP_ALDEBARAN) &&
+	    (adev->asic_type != CHIP_ARCTURUS ||
+	     adev->sdma.instance[0].fw_version < 18))
 		*table_freed = true;
 
 	goto out;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b570c0454ce9..0e4a76dca809 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1806,7 +1806,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 	}
 	mutex_unlock(&p->mutex);
 
-	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+	    (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
+	     dev->adev->sdma.instance[0].fw_version >= 18)) {
 		err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
 				(struct kgd_mem *) mem, true);
 		if (err) {
--
2.25.1
Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
I understand Alex's concern. I think we usually only check the version when a feature is available only in a specific version and other or newer versions don't have it. In the case of a FW fix, we assume the driver and FWs have to be in sync. If we had driver backward compatibility for FWs, there would be a lot of redundant code for FW version checks. So this patch and the SDMA fix will be pushed into the ROCm 5.1 release branch at the same time.

Regards,
Eric

On 2022-01-18 14:35, Alex Deucher wrote:

On Tue, Jan 18, 2022 at 2:27 PM Russell, Kent wrote:

[AMD Official Use Only]

I think what he means is that if we are using SDMA v17, this will cause issues, won't it? Should we check that the SDMA version is >= 18 before enabling it? Or am I misunderstanding the fix?

Yes, that was my concern.

Alex

Kent

-----Original Message-----
From: amd-gfx On Behalf Of Eric Huang
Sent: Tuesday, January 18, 2022 2:09 PM
To: Alex Deucher
Cc: amd-gfx list
Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

The SDMA fix is generic and not in a specific version of FW, so we don't have to check.

Thanks,
Eric

On 2022-01-18 11:35, Alex Deucher wrote:

On Tue, Jan 18, 2022 at 11:16 AM Eric Huang wrote:

SDMA FW fixes the hang issue caused by adding heavy-weight TLB flush on Arcturus, so we can enable it.

Do we need to check for a specific fw version?

Alex

Signed-off-by: Eric Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +++++---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 3 ++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a64cbbd943ba..7b24a920c12e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,10 +1892,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 					true);
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
-	/* Only apply no TLB flush on Aldebaran to
-	 * workaround regressions on other Asics.
+	/* Only apply no TLB flush on Aldebaran and Arcturus
+	 * to workaround regressions on other Asics.
 	 */
-	if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
+	if (table_freed &&
+	    (adev->asic_type != CHIP_ALDEBARAN) &&
+	    (adev->asic_type != CHIP_ARCTURUS))
 		*table_freed = true;
 
 	goto out;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b570c0454ce9..ef4d676cc71c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1806,7 +1806,8 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 	}
 	mutex_unlock(&p->mutex);
 
-	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+	    KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1)) {
 		err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
 				(struct kgd_mem *) mem, true);
 		if (err) {
--
2.25.1