Re: [PATCH 21/32] drm/amdkfd: update process interrupt handling for debug events
On 2023-01-25 14:53, Jonathan Kim wrote: The debugger must be notified by any debugger subscribed exception that comes from hardware interrupts. If a debugger session exits, any exceptions it subscribed to may still have interrupts in the interrupt ring buffer or KGD/KFD pipeline. To prevent a new session from inheriting stale interrupts, when a new queue is created, open an interrupt drain and allow the IH ring to drain from a timestamped checkpoint. Then inject a custom IV so that once the custom IV is picked up by the KFD, it's safe to close the drain and proceed with queue creation. The drain must also be on debug disable as SW interrupts may still be processed. Drain at this time and clear all the exception status. The debugger may also not be attached nor subscibed to certain exceptions so forward them directly to the runtime. GFX10 also requires its own IV processing, hence the creation of kfd_int_process_v10.c. This is because the IV from SQ interrupts are packed into a new continguous format unlike GFX9. To make this clear, a separate interrupting handling code file was created. v3: enable gfx11 interrupts v2: fix interrupt drain on debug disable. fix interrupt drain on queue create during -ERESTARTSYS. fix up macros naming for ECODE parsing. Signed-off-by: Jonathan Kim Some indentation nit-picks inline. With those fixed, the patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 16 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 + drivers/gpu/drm/amd/amdkfd/Makefile | 1 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 85 drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 6 + drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 +- .../gpu/drm/amd/amdkfd/kfd_int_process_v10.c | 405 ++ .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c | 21 +- .../gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 98 - drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 12 + drivers/gpu/drm/amd/amdkfd/kfd_process.c | 47 ++ .../amd/amdkfd/kfd_process_queue_manager.c| 4 + 12 files changed, 681 insertions(+), 20 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 8816853e50c0..60c3b0449d86 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -763,6 +763,22 @@ void amdgpu_amdkfd_ras_poison_consumption_handler(struct amdgpu_device *adev, bo amdgpu_umc_poison_handler(adev, reset); } +int amdgpu_amdkfd_send_close_event_drain_irq(struct amdgpu_device *adev, + uint32_t *payload) +{ + int ret; + + /* Device or IH ring is not ready so bail. */ + ret = amdgpu_ih_wait_on_checkpoint_process_ts(adev, >irq.ih); + if (ret) + return ret; + + /* Send payload to fence KFD interrupts */ + amdgpu_amdkfd_interrupt(adev, payload); + + return 0; +} + bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev) { if (adev->gfx.ras && adev->gfx.ras->query_utcl2_poison_status) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 333780491867..df782274a4c8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -241,6 +241,8 @@ int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst, struct amdgpu_device *src, bool is_min); int amdgpu_amdkfd_get_pcie_bandwidth_mbytes(struct amdgpu_device *adev, bool is_min); +int amdgpu_amdkfd_send_close_event_drain_irq(struct amdgpu_device *adev, + uint32_t *payload); /* Read user wptr from a specified user address space with page fault * disabled. The memory must be pinned and mapped to the hardware when diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile index 747754428073..2ec8f27c5366 100644 --- a/drivers/gpu/drm/amd/amdkfd/Makefile +++ b/drivers/gpu/drm/amd/amdkfd/Makefile @@ -53,6 +53,7 @@ AMDKFD_FILES := $(AMDKFD_PATH)/kfd_module.o \ $(AMDKFD_PATH)/kfd_events.o \ $(AMDKFD_PATH)/cik_event_interrupt.o \ $(AMDKFD_PATH)/kfd_int_process_v9.o \ + $(AMDKFD_PATH)/kfd_int_process_v10.o \ $(AMDKFD_PATH)/kfd_int_process_v11.o \ $(AMDKFD_PATH)/kfd_smi_events.o \ $(AMDKFD_PATH)/kfd_crat.o \ diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index 16acf3d416eb..0c876172db4b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -125,6 +125,65 @@ bool kfd_dbg_ev_raise(uint64_t
[PATCH 21/32] drm/amdkfd: update process interrupt handling for debug events
The debugger must be notified by any debugger subscribed exception that comes from hardware interrupts. If a debugger session exits, any exceptions it subscribed to may still have interrupts in the interrupt ring buffer or KGD/KFD pipeline. To prevent a new session from inheriting stale interrupts, when a new queue is created, open an interrupt drain and allow the IH ring to drain from a timestamped checkpoint. Then inject a custom IV so that once the custom IV is picked up by the KFD, it's safe to close the drain and proceed with queue creation. The drain must also be on debug disable as SW interrupts may still be processed. Drain at this time and clear all the exception status. The debugger may also not be attached nor subscibed to certain exceptions so forward them directly to the runtime. GFX10 also requires its own IV processing, hence the creation of kfd_int_process_v10.c. This is because the IV from SQ interrupts are packed into a new continguous format unlike GFX9. To make this clear, a separate interrupting handling code file was created. v3: enable gfx11 interrupts v2: fix interrupt drain on debug disable. fix interrupt drain on queue create during -ERESTARTSYS. fix up macros naming for ECODE parsing. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 16 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 + drivers/gpu/drm/amd/amdkfd/Makefile | 1 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 85 drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 6 + drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 +- .../gpu/drm/amd/amdkfd/kfd_int_process_v10.c | 405 ++ .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c | 21 +- .../gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 98 - drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 12 + drivers/gpu/drm/amd/amdkfd/kfd_process.c | 47 ++ .../amd/amdkfd/kfd_process_queue_manager.c| 4 + 12 files changed, 681 insertions(+), 20 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 8816853e50c0..60c3b0449d86 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -763,6 +763,22 @@ void amdgpu_amdkfd_ras_poison_consumption_handler(struct amdgpu_device *adev, bo amdgpu_umc_poison_handler(adev, reset); } +int amdgpu_amdkfd_send_close_event_drain_irq(struct amdgpu_device *adev, + uint32_t *payload) +{ + int ret; + + /* Device or IH ring is not ready so bail. */ + ret = amdgpu_ih_wait_on_checkpoint_process_ts(adev, >irq.ih); + if (ret) + return ret; + + /* Send payload to fence KFD interrupts */ + amdgpu_amdkfd_interrupt(adev, payload); + + return 0; +} + bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev) { if (adev->gfx.ras && adev->gfx.ras->query_utcl2_poison_status) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 333780491867..df782274a4c8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -241,6 +241,8 @@ int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst, struct amdgpu_device *src, bool is_min); int amdgpu_amdkfd_get_pcie_bandwidth_mbytes(struct amdgpu_device *adev, bool is_min); +int amdgpu_amdkfd_send_close_event_drain_irq(struct amdgpu_device *adev, + uint32_t *payload); /* Read user wptr from a specified user address space with page fault * disabled. The memory must be pinned and mapped to the hardware when diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile index 747754428073..2ec8f27c5366 100644 --- a/drivers/gpu/drm/amd/amdkfd/Makefile +++ b/drivers/gpu/drm/amd/amdkfd/Makefile @@ -53,6 +53,7 @@ AMDKFD_FILES := $(AMDKFD_PATH)/kfd_module.o \ $(AMDKFD_PATH)/kfd_events.o \ $(AMDKFD_PATH)/cik_event_interrupt.o \ $(AMDKFD_PATH)/kfd_int_process_v9.o \ + $(AMDKFD_PATH)/kfd_int_process_v10.o \ $(AMDKFD_PATH)/kfd_int_process_v11.o \ $(AMDKFD_PATH)/kfd_smi_events.o \ $(AMDKFD_PATH)/kfd_crat.o \ diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index 16acf3d416eb..0c876172db4b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -125,6 +125,65 @@ bool kfd_dbg_ev_raise(uint64_t event_mask, return is_subscribed; } +/* set pending event queue entry from ring entry */ +bool kfd_set_dbg_ev_from_interrupt(struct kfd_dev *dev, +