[PATCH 21/33] drm/amdkfd: add debug trap enabled flag to tma

2023-05-25 Thread Jonathan Kim
From: Jay Cornwall Trap handler behavior will differ when a debugger is attached. Make the debug trap flag available in the trap handler TMA. Update it when the debug trap ioctl is invoked. Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling Signed-off-by: Jonathan Kim Reviewed

[PATCH 30/33] drm/amdkfd: add debug query exception info operation

2023-05-25 Thread Jonathan Kim
of clearing the target exception on query. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 120 +++ drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 6 ++ 3 files changed, 133

[PATCH 33/33] drm/amdkfd: bump kfd ioctl minor version for debug api availability

2023-05-25 Thread Jonathan Kim
Bump the minor version to declare debugging capability is now available. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 - include/uapi/linux/kfd_ioctl.h | 3 ++- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git

[PATCH 20/33] drm/amdkfd: add runtime enable operation

2023-05-25 Thread Jonathan Kim
Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 143 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 4 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 1 + 4 files changed, 150 insertions(+), 4

[PATCH 28/33] drm/amdkfd: add debug set flags operation

2023-05-25 Thread Jonathan Kim
cise at the cost of performance. This setting is not permitted on debug devices that support only a global setting of this option. Return the previously set flags to the debugger as well. v2: fixup with new kfd_node struct reference for mes checks Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/am
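A minimal userspace sketch of the set-flags semantics described above, assuming a single precise-memory-operations flag (the only flag a later revision keeps). The names `dbg_target`, `dbg_set_flags` and `DBG_FLAG_PRECISE_MEMOPS`, the bit position and the error codes are all hypothetical and are not the KFD's actual definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; not the real KFD value. */
#define DBG_FLAG_PRECISE_MEMOPS (1u << 0)

struct dbg_target {
	uint32_t flags;           /* debug flags currently applied */
	bool global_precise_only; /* device supports only a global setting */
};

/* Apply new_flags; on success *prev receives the previously set flags. */
int dbg_set_flags(struct dbg_target *t, uint32_t new_flags, uint32_t *prev)
{
	if (new_flags & ~DBG_FLAG_PRECISE_MEMOPS)
		return -EINVAL; /* unknown flag bits (assumed error code) */
	if ((new_flags & DBG_FLAG_PRECISE_MEMOPS) && t->global_precise_only)
		return -EPERM;  /* per-process setting not permitted (assumed) */

	*prev = t->flags;
	t->flags = new_flags;
	return 0;
}
```

Returning the old value from the same call lets a debugger restore the original state on detach without a separate query.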

[PATCH 25/33] drm/amdkfd: add debug wave launch mode operation

2023-05-25 Thread Jonathan Kim
Allow the debugger to set wave behaviour to one of: operate normally, halt at launch, trap on every instruction, terminate immediately, or stall on allocation. v2: fixup with new kfd_node struct reference for mes check Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu

[PATCH 22/33] drm/amdkfd: update process interrupt handling for debug events

2023-05-25 Thread Jonathan Kim
. This is because the IVs from SQ interrupts are packed into a new contiguous format, unlike GFX9. To make this clear, a separate interrupt handling code file was created. v2: use new kfd_node struct in prototypes. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

[PATCH 26/33] drm/amdkfd: add debug suspend and resume process queues operation

2023-05-25 Thread Jonathan Kim
suspend or resume queues). v2: fixup new kfd_node struct reference for mes fw check. also fixup missing EC_QUEUE_NEW flagging on newly created queue. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 5 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 + drivers

[PATCH 23/33] drm/amdkfd: add debug set exceptions enabled operation

2023-05-25 Thread Jonathan Kim
The debugger subscribes to notifications for requested exceptions on attach. Allow the debugger to change its subscription later on. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 36
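As a rough illustration of a subscription that is set on attach and can be changed later, the sketch below keeps an enabled mask and a pending mask per process. The struct and function names are hypothetical, and recording unsubscribed exceptions as pending (without waking the debugger) is an assumption, not something the patch snippet states.

```c
#include <stdbool.h>
#include <stdint.h>

struct dbg_subscription {
	uint64_t enabled; /* exceptions the debugger wants to be notified about */
	uint64_t pending; /* exceptions raised but not yet queried */
};

/* Called with the initial mask on attach, and again on any later change. */
void dbg_set_exceptions_enabled(struct dbg_subscription *s, uint64_t mask)
{
	s->enabled = mask;
}

/* Record a raised exception; return true if the debugger should be woken.
 * Keeping unsubscribed exceptions as pending is an assumption here. */
bool dbg_raise_exception(struct dbg_subscription *s, uint64_t exc)
{
	s->pending |= exc;
	return (exc & s->enabled) != 0;
}
```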

[PATCH 31/33] drm/amdkfd: add debug queue snapshot operation

2023-05-25 Thread Jonathan Kim
. Also allow the debugger to clear exceptions when doing a snapshot. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++ .../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 + .../drm/amd/amdkfd/kfd_device_queue_manager.h

[PATCH 24/33] drm/amdkfd: add debug wave launch override operation

2023-05-25 Thread Jonathan Kim
be overridden or fully replaced. In order for the debugger to know what is permissible, return the supported override mask to the debugger along with the previously enabled overrides. v2: fixup with new kfd_node struct reference for mes check Signed-off-by: Jonathan Kim --- .../drm/amd
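The override operation above can be sketched as an OR-or-replace update on an exception mask that also reports the supported mask and the previously enabled overrides. The mode names, struct and function are hypothetical stand-ins; only the use of EINVAL for an unsupported override follows the series (the v2 note of the 2023-03-27 revision switches it from EPERM).

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical override modes: OR into the current set, or replace it. */
enum override_mode { OVERRIDE_OR = 0, OVERRIDE_REPLACE = 1 };

struct override_state {
	uint32_t enabled;   /* overrides currently in effect */
	uint32_t supported; /* overrides this device permits */
};

int dbg_wave_launch_override(struct override_state *s, enum override_mode mode,
			     uint32_t mask, uint32_t *prev_enabled,
			     uint32_t *supported)
{
	/* Always report what is permissible so the debugger can adapt. */
	*supported = s->supported;

	if (mode != OVERRIDE_OR && mode != OVERRIDE_REPLACE)
		return -EINVAL;
	if (mask & ~s->supported)
		return -EINVAL; /* unsupported override requested */

	*prev_enabled = s->enabled;
	s->enabled = (mode == OVERRIDE_OR) ? (s->enabled | mask) : mask;
	return 0;
}
```

Reporting the supported mask even on failure lets the debugger retry with a permissible request instead of guessing.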

[PATCH 27/33] drm/amdkfd: add debug set and clear address watch points operation

2023-05-25 Thread Jonathan Kim
watch points are allocated or not. v2: fixup with new kfd_node struct reference for mes and watch point checks Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 51 +++ .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 + .../drm/amd/amdgpu

[PATCH 32/33] drm/amdkfd: add debug device snapshot operation

2023-05-25 Thread Jonathan Kim
a subsequent successful call. v2: add num_xcc to device snapshot and fixup new kfd_node reference Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 73 drivers/gpu/drm/amd/amdkfd/kfd_debug.h

[PATCH 18/33] drm/amdkfd: add raise exception event function

2023-05-25 Thread Jonathan Kim
. For memory violation exceptions, extra exception data will be saved. The debugger will be able to query the saved exception states through a query operation that will be provided by follow-up patches. v2: use new kfd_node struct in prototype. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd

[PATCH 14/33] drm/amdgpu: prepare map process for multi-process debug devices

2023-05-25 Thread Jonathan Kim
. v2: spot fixup new kfd_node references Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 5 ++ .../drm/amd/amdkfd/kfd_device_queue_manager.c | 51 +++ .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 ++ .../drm/amd/amdkfd/kfd_packet_manager_v9.c

[PATCH 29/33] drm/amdkfd: add debug query event operation

2023-05-25 Thread Jonathan Kim
Allow the debugger to query a single queue, device and process exception. The KFD should also return the GPU or Queue id of the exception. The debugger also has the option of clearing exceptions after being queried. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm
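A minimal sketch of a query that reports one pending exception source at a time, returns its GPU or queue id, and optionally clears what it reported. The `exc_source` layout and `dbg_query_event` are hypothetical; the real operation distinguishes queue, device and process exceptions, which this sketch folds into one array.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical source of exceptions: a queue or a GPU, keyed by its id. */
struct exc_source {
	uint32_t id;      /* queue id or GPU id */
	uint64_t pending; /* raised, not-yet-reported exception bits */
};

/* Report the first source with pending bits inside mask. When clear is
 * set, the reported bits are cleared. Returns false if nothing is pending. */
bool dbg_query_event(struct exc_source *srcs, int n, uint64_t mask,
		     bool clear, uint32_t *id, uint64_t *status)
{
	for (int i = 0; i < n; i++) {
		uint64_t hit = srcs[i].pending & mask;

		if (!hit)
			continue;
		*id = srcs[i].id;
		*status = hit;
		if (clear)
			srcs[i].pending &= ~hit;
		return true;
	}
	return false;
}
```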

[PATCH 07/33] drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

2023-05-25 Thread Jonathan Kim
changing the implicit wait count setting. Once set, resume all work. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 116 ++ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

[PATCH 17/33] drm/amdkfd: apply trap workaround for gfx11

2023-05-25 Thread Jonathan Kim
engine, return the runtime status as enabled but with an error. In addition, like any other multi-process debug supported device, disable trap temporary setup per-process to avoid performance impact from setup overhead. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm

[PATCH 15/33] drm/amdgpu: expose debug api for mes

2023-05-25 Thread Jonathan Kim
. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 32 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 20 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 12 +++ drivers/gpu/drm/amd/include

[PATCH 10/33] drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls

2023-05-25 Thread Jonathan Kim
the required register values that the HWS needs to write on debug enable and disable. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 42 ++- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm

[PATCH 06/33] drm/amdgpu: add gfx9 hw debug mode enable and disable calls

2023-05-25 Thread Jonathan Kim
inheritance of that mode is upheld. Also ensure that exception overrides are reset to their original state prior to debug enable or disable. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 92 +++ .../gpu/drm/amd/amdgpu

[PATCH 12/33] drm/amdgpu: add configurable grace period for unmap queues

2023-05-25 Thread Jonathan Kim
. v2: add null grace period function pointers to VI packet manager. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 43 .../drm/amd/amdgpu

[PATCH 09/33] drm/amdgpu: add gfx10 hw debug mode enable and disable calls

2023-05-25 Thread Jonathan Kim
will be fixed for GFX11 onwards. Also remove a bunch of deprecated misplaced references for GFX10.3. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 96 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 28 .../drm/amd

[PATCH 11/33] drm/amdgpu: add gfx11 hw debug mode enable and disable calls

2023-05-25 Thread Jonathan Kim
Implement the per-device calls to enable or disable HW debug mode for GFX11. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 38 +++ 1 file changed, 38 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 16/33] drm/amdkfd: add per process hw trap enable and disable functions

2023-05-25 Thread Jonathan Kim
functions are implemented in a follow up patch. v2: spot fix with new kfd_node references Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 148 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 29

[PATCH 19/33] drm/amdkfd: add send exception operation

2023-05-25 Thread Jonathan Kim
. For runtime exceptions, this will unblock the runtime enable function which will be explained and implemented in a follow up patch. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

[PATCH 04/33] drm/amdgpu: add kgd hw debug mode setting interface

2023-05-25 Thread Jonathan Kim
Introduce the required KGD debug calls that will execute hardware debug mode setting. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/include/kgd_kfd_interface.h | 34 +++ 1 file changed, 34 insertions(+) diff --git a/drivers/gpu/drm/amd/include

[PATCH 08/33] drm/amdkfd: fix kfd_suspend_all_processes

2023-05-25 Thread Jonathan Kim
Flush delayed restore work in kfd_suspend_all_queues instead of cancelling. Cancelling the work before it runs results in the queues becoming permanently disabled. Flushing the work ensures that the queue suspend/resume state stays balanced. Signed-off-by: Jonathan Kim Reviewed-by: Felix
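A toy model of why cancelling the delayed restore work leaves queues disabled while flushing keeps suspend/resume balanced. It reduces a process to an eviction counter (queues run only at zero) and a pending-restore bit; every name here is hypothetical and the model ignores locking and the real workqueue machinery.

```c
#include <stdbool.h>

/* Hypothetical model: queues run only while evict_count is zero. */
struct proc_model {
	int evict_count;
	bool restore_pending; /* a delayed restore work item is queued */
};

/* Eviction stops queues and schedules a delayed restore. */
void evict(struct proc_model *p)
{
	p->evict_count++;
	p->restore_pending = true;
}

/* Body of the delayed restore work: undoes one eviction. */
static void restore_work(struct proc_model *p)
{
	if (p->restore_pending) {
		p->restore_pending = false;
		p->evict_count--;
	}
}

/* Cancelling drops the pending restore, so its decrement never happens. */
void cancel_restore(struct proc_model *p) { p->restore_pending = false; }

/* Flushing runs the pending restore now, keeping the count balanced. */
void flush_restore(struct proc_model *p) { restore_work(p); }

void suspend_all(struct proc_model *p) { p->evict_count++; }
void resume_all(struct proc_model *p) { p->evict_count--; }
bool queues_running(const struct proc_model *p) { return p->evict_count == 0; }
```

After evict, cancel, suspend and resume, the count stays at one and the queues never restart; with flush in place of cancel it returns to zero.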

[PATCH 05/33] drm/amdgpu: setup hw debug registers on driver initialization

2023-05-25 Thread Jonathan Kim
rder to correctly set this up, set the special reserved CP bit by default whenever the MQD is initialized. v2: add missing 0-init of SPI_GDBG_TRAP_DATA0/1 Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 26 +++ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c

[PATCH 13/33] drm/amdkfd: prepare map process for single process debug devices

2023-05-25 Thread Jonathan Kim
SET_RESOURCES so that a debugged process will never migrate away from its pinned VMID. The KFD is responsible for reserving and releasing this pinned VMID accordingly whenever the debugger attaches and detaches respectively. v2: spot fix ups using new kfd_node references Signed-off-by: Jonathan Kim

[PATCH 01/33] drm/amdkfd: add debug and runtime enable interface

2023-05-25 Thread Jonathan Kim
coordinates exception handling with the HSA runtime. Usage is available in the kern docs at uapi/linux/kfd_ioctl.h. v2: add num_xcc to device snapshot entry. fixup missing EC_QUEUE_PACKET_RESERVED mask. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 48 ++ include

[PATCH 03/33] drm/amdkfd: prepare per-process debug enable and disable

2023-05-25 Thread Jonathan Kim
events will notify the debugger through a pollable FIFO file descriptor that the debugger provides to the KFD to manage. Finally, on termination of either the debugger or the target process, debugging must be disabled if it has not been already. Signed-off-by: Jonathan Kim Reviewed-by: Felix
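The debugger side of a pollable FIFO notification can be sketched with a POSIX pipe and `poll()`. Here the "kernel" side is simulated by a `write()` in the same process; the one-byte payload is a placeholder, not the KFD's actual event format, and `notify_event`/`wait_for_event` are hypothetical helpers.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Simulated kernel-side notification: write one placeholder byte. */
bool notify_event(int write_fd)
{
	char c = '.';

	return write(write_fd, &c, 1) == 1;
}

/* Debugger side: wait up to timeout_ms for the FIFO to become readable,
 * then drain one byte. Returns true if an event was consumed. */
bool wait_for_event(int read_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = read_fd, .events = POLLIN };
	char c;

	if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLIN))
		return false;
	return read(read_fd, &c, 1) == 1;
}
```

Because the descriptor is pollable, a debugger can wait on it alongside ptrace and other event sources in a single `poll()` loop.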

[PATCH 02/33] drm/amdkfd: display debug capabilities

2023-05-25 Thread Jonathan Kim
-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 101 -- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 6 ++ include/uapi/linux/kfd_sysfs.h| 15 3 files changed, 117 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd

[PATCH 33/34] drm/amdkfd: bump kfd ioctl minor version for debug api availability

2023-03-27 Thread Jonathan Kim
Bump the minor version to declare debugging capability is now available. v2: bump to 1.13 after upstream rebase. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 - include/uapi/linux/kfd_ioctl.h | 3 ++- 2 files changed, 2
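A hedged sketch of how a debugger might gate use of the debug API on the reported ioctl version. The 1.13 threshold is taken from the v2 note above ("bump to 1.13 after upstream rebase") and may differ in the merged series; `kfd_version` and `kfd_debug_api_available` are hypothetical names, and the values would in practice come from the KFD's version query.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the version pair the KFD reports. */
struct kfd_version {
	uint32_t major;
	uint32_t minor;
};

/* Assumed first version exposing the debug API, per the v2 note. */
#define DEBUG_API_MAJOR 1u
#define DEBUG_API_MINOR 13u

/* A different major version is treated as incompatible; within the same
 * major, any minor at or above the threshold advertises the debug API. */
bool kfd_debug_api_available(struct kfd_version v)
{
	return v.major == DEBUG_API_MAJOR && v.minor >= DEBUG_API_MINOR;
}
```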

[PATCH 28/34] drm/amdkfd: add debug set flags operation

2023-03-27 Thread Jonathan Kim
flag for now. v2: add gfx11 support. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 58 drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 1 + 3 files changed, 61 insertions(+) diff --git a/drivers/gpu

[PATCH 32/34] drm/amdkfd: add debug device snapshot operation

2023-03-27 Thread Jonathan Kim
for queue and device snapshot. change device snapshot implementation to match queue snapshot implementation. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 72 drivers

[PATCH 25/34] drm/amdkfd: add debug wave launch mode operation

2023-03-27 Thread Jonathan Kim
and remove deprecated launch mode options Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 12 +++ .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 1 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 25 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h

[PATCH 34/34] drm/amdkfd: optimize gfx off enable toggle for debugging

2023-03-27 Thread Jonathan Kim
access issues. Remove KFD GFX OFF enable toggle clutter by moving these calls into the KGD debug calls themselves. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 7 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 33 ++- .../gpu/drm/amd/amdgpu

[PATCH 06/34] drm/amdgpu: add gfx9 hw debug mode enable and disable calls

2023-03-27 Thread Jonathan Kim
lock renaming. add comments to explain ignored arguments for debug trap enable and disable. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 92 +++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 9 ++ 2 files changed

[PATCH 30/34] drm/amdkfd: add debug query exception info operation

2023-03-27 Thread Jonathan Kim
of clearing the target exception on query. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 120 +++ drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 6 ++ 3 files changed, 133

[PATCH 24/34] drm/amdkfd: add debug wave launch override operation

2023-03-27 Thread Jonathan Kim
v3: v2 was reviewed, but re-review is requested for the added GFX11 support. v2: switch unsupported override mode return from EPERM to EINVAL to support unique EPERM on PTRACE failure. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 47

[PATCH 09/34] drm/amdgpu: add gfx10 hw debug mode enable and disable calls

2023-03-27 Thread Jonathan Kim
will be fixed for GFX11 onwards. Also remove a bunch of deprecated misplaced references for GFX10.3. v2: fix 'boundaray' typo in description and added gfx10 kgd2kfd header to avoid kern bot missing prototype complaint. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu

[PATCH 29/34] drm/amdkfd: add debug query event operation

2023-03-27 Thread Jonathan Kim
Allow the debugger to query a single queue, device and process exception. The KFD should also return the GPU or Queue id of the exception. The debugger also has the option of clearing exceptions after being queried. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm

[PATCH 13/34] drm/amdkfd: prepare map process for single process debug devices

2023-03-27 Thread Jonathan Kim
-by: Jonathan Kim --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 93 +++ .../drm/amd/amdkfd/kfd_device_queue_manager.h | 5 + .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 9 ++ .../gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h | 5 +- 4 files changed, 111 insertions(+), 1

[PATCH 20/34] drm/amdkfd: add runtime enable operation

2023-03-27 Thread Jonathan Kim
for runtime_enable. v2: fix up hierarchy of semantics in description. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 143 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 4 + drivers/gpu/drm/amd/amdkfd

[PATCH 16/34] drm/amdkfd: add per process hw trap enable and disable functions

2023-03-27 Thread Jonathan Kim
fw checks. remove asic family name comments. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 148 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 29 + drivers/gpu/drm/amd/amdkfd/kfd_process.c

[PATCH 22/34] drm/amdkfd: update process interrupt handling for debug events

2023-03-27 Thread Jonathan Kim
on queue create during -ERESTARTSYS. fix up macros naming for ECODE parsing. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 16 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 + drivers/gpu/drm/amd/amdkfd/Makefile | 1

[PATCH 07/34] drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

2023-03-27 Thread Jonathan Kim
-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 116 ++ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +- 3 files changed, 121 insertions(+), 2 deletions(-) diff --git

[PATCH 18/34] drm/amdkfd: add raise exception event function

2023-03-27 Thread Jonathan Kim
-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 104 +++ drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 7 ++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 +++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 2 + 4 files changed, 123 insertions(+) diff --git a/drivers/gpu

[PATCH 08/34] drm/amdkfd: fix kfd_suspend_all_processes for gfx941 debugging

2023-03-27 Thread Jonathan Kim
-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 1e3795e7e18d..55a4ddd35e12 100644 --- a/drivers/gpu/drm/amd/amdkfd

[PATCH 26/34] drm/amdkfd: add debug suspend and resume process queues operation

2023-03-27 Thread Jonathan Kim
suspend or resume queues). v3: update safer copy context save header v2: add gfx11/mes support. prevent header copy on suspend from overwriting user fields. simplify resume_queues function. address other nit-picks Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 5

[PATCH 19/34] drm/amdkfd: add send exception operation

2023-03-27 Thread Jonathan Kim
. For runtime exceptions, this will unblock the runtime enable function which will be explained and implemented in a follow up patch. v2: missing closing brace in set workaround function got fixed in patch 17. Signed-off-by: Jonathan Kim --- .../gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4

[PATCH 14/34] drm/amdgpu: prepare map process for multi-process debug devices

2023-03-27 Thread Jonathan Kim
. v3: remove unneeded comment. also add missing kfd_debug.h include in dqm file. v2: remove asic family code name comment in per vmid support check Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 5 ++ .../drm/amd/amdkfd/kfd_device_queue_manager.c | 51

[PATCH 27/34] drm/amdkfd: add debug set and clear address watch points operation

2023-03-27 Thread Jonathan Kim
-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 51 +++ .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 78 ++ .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 8 ++ .../drm

[PATCH 15/34] drm/amdgpu: expose debug api for mes

2023-03-27 Thread Jonathan Kim
. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 32 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 20 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 12 +++ drivers/gpu/drm/amd/include

[PATCH 23/34] drm/amdkfd: add debug set exceptions enabled operation

2023-03-27 Thread Jonathan Kim
The debugger subscribes to notifications for requested exceptions on attach. Allow the debugger to change its subscription later on. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 36

[PATCH 01/34] drm/amdkfd: add debug and runtime enable interface

2023-03-27 Thread Jonathan Kim
and disable). Also remove non-needed dbg flag option. Add revision and subvendor info to debug device snapshot entry. Add trap on wave start and end override option. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 48 ++ include/uapi/linux/kfd_ioctl.h | 667

[PATCH 31/34] drm/amdkfd: add debug queue snapshot operation

2023-03-27 Thread Jonathan Kim
buf_size arg to num_queues for clarity. fix minimum entry size calculation. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++ .../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 + .../drm/amd/amdkfd

[PATCH 12/34] drm/amdgpu: add configurable grace period for unmap queues

2023-03-27 Thread Jonathan Kim
. v2: clarify purpose in the description of this patch Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 43 .../drm/amd

[PATCH 17/34] drm/amdkfd: apply trap workaround for gfx11

2023-03-27 Thread Jonathan Kim
application. disable debugging for now on gfx11 due to broken fw. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 2 + drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 7 +-- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 - drivers/gpu/drm/amd/amdkfd/kfd_debug.c

[PATCH 11/34] drm/amdgpu: add gfx11 hw debug mode enable and disable calls

2023-03-27 Thread Jonathan Kim
Implement the per-device calls to enable or disable HW debug mode for GFX11. v2: remove unneeded ioctl reference and fix types and comment formats. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 38 +++ 1 file

[PATCH 10/34] drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls

2023-03-27 Thread Jonathan Kim
the required register values that the HWS needs to write on debug enable and disable. v3: fix typo and comment format kern bot complaint. add back cu occupancy that was removed by mistake. v2: add commentary on unused restore_dbg_registers for debug enable. Signed-off-by: Jonathan Kim Reviewed

[PATCH 21/34] drm/amdkfd: add debug trap enabled flag to tma

2023-03-27 Thread Jonathan Kim
flag setup on APUs Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 11 +++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 +++ 3 files changed

[PATCH 05/34] drm/amdgpu: setup hw debug registers on driver initialization

2023-03-27 Thread Jonathan Kim
init for gfx11. add trap on wave start and end registers for gfx11. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 26 +++ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c| 1 + drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

[PATCH 03/34] drm/amdkfd: prepare per-process debug enable and disable

2023-03-27 Thread Jonathan Kim
there's nothing to evict. change err code to EALREADY if attaching to an already attached process. move debug disable to release worker to avoid race with disable from ioctl call. v2: relax debug trap disable and PTRACE ATTACH requirement. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd

[PATCH 02/34] drm/amdkfd: display debug capabilities

2023-03-27 Thread Jonathan Kim
. - remove asic family code name comments in firmware support checking - add gfx11 requirements in fw support checks and debug props and caps Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 101 -- drivers/gpu/drm/amd/amdkfd

[PATCH 04/34] drm/amdgpu: add kgd hw debug mode setting interface

2023-03-27 Thread Jonathan Kim
Introduce the required KGD debug calls that will execute hardware debug mode setting. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/include/kgd_kfd_interface.h | 34 +++ 1 file changed, 34 insertions(+) diff --git a/drivers/gpu/drm/amd/include

[PATCH 09/32] drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls

2023-01-25 Thread Jonathan Kim
the required register values that the HWS needs to write on debug enable and disable. v2: add commentary on unused restore_dbg_registers for debug enable. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 43 ++- 1 file changed, 41 insertions(+), 2

[PATCH 24/32] drm/amdkfd: add debug wave launch mode operation

2023-01-25 Thread Jonathan Kim
Allow the debugger to set wave behaviour to one of: operate normally, halt at launch, trap on every instruction, terminate immediately, or stall on allocation. v2: add gfx11 support and remove deprecated launch mode options Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu

[PATCH 23/32] drm/amdkfd: add debug wave launch override operation

2023-01-25 Thread Jonathan Kim
mode return from EPERM to EINVAL to support unique EPERM on PTRACE failure. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 47 ++ .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 55 .../drm

[PATCH 29/32] drm/amdkfd: add debug query exception info operation

2023-01-25 Thread Jonathan Kim
of clearing the target exception on query. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 120 +++ drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 6 ++ 3 files changed, 133

[PATCH 32/32] drm/amdkfd: bump kfd ioctl minor version for debug api availability

2023-01-25 Thread Jonathan Kim
Bump the minor version to declare debugging capability is now available. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 - include/uapi/linux/kfd_ioctl.h | 3 ++- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git

[PATCH 11/32] drm/amdgpu: add configurable grace period for unmap queues

2023-01-25 Thread Jonathan Kim
. v2: clarify purpose in the description of this patch Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 43 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10

[PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11

2023-01-25 Thread Jonathan Kim
engine, return the runtime status as enabled but with an error. In addition, like any other multi-process debug supported device, disable trap temporary setup per-process to avoid performance impact from setup overhead. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h

[PATCH 17/32] drm/amdkfd: add raise exception event function

2023-01-25 Thread Jonathan Kim
. For memory violation exceptions, extra exception data will be saved. The debugger will be able to query the saved exception states through a query operation that will be provided by follow-up patches. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 91

[PATCH 30/32] drm/amdkfd: add debug queue snapshot operation

2023-01-25 Thread Jonathan Kim
size calculation. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++ .../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 5 +++ .../amd

[PATCH 26/32] drm/amdkfd: add debug set and clear address watch points operation

2023-01-25 Thread Jonathan Kim
watch points are allocated or not. v3: add gfx11 support. cleanup gfx9 kgd calls to set and clear address watch. use per device spinlock to set watch points. fixup runlist refresh calls on set/clear address watch. v2: change dev_id arg to gpu_id for consistency Signed-off-by: Jonathan Kim

[PATCH 31/32] drm/amdkfd: add debug device snapshot operation

2023-01-25 Thread Jonathan Kim
for queue and device snapshot. change device snapshot implementation to match queue snapshot implementation. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 72 drivers/gpu/drm/amd/amdkfd

[PATCH 25/32] drm/amdkfd: add debug suspend and resume process queues operation

2023-01-25 Thread Jonathan Kim
suspend or resume queues). v2: add gfx11/mes support. prevent header copy on suspend from overwriting user fields. simplify resume_queues function. address other nit-picks Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 5 + drivers/gpu/drm/amd/amdgpu

[PATCH 05/32] drm/amdgpu: setup hw debug registers on driver initialization

2023-01-25 Thread Jonathan Kim
init for gfx11. add trap on wave start and end registers for gfx11. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 26 +++ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c| 1 + drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 30 .../gpu/drm/amd/am

[PATCH 21/32] drm/amdkfd: update process interrupt handling for debug events

2023-01-25 Thread Jonathan Kim
. fix up macros naming for ECODE parsing. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 16 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 + drivers/gpu/drm/amd/amdkfd/Makefile | 1 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 85

[PATCH 28/32] drm/amdkfd: add debug query event operation

2023-01-25 Thread Jonathan Kim
Allow the debugger to query a single queue, device and process exception. The KFD should also return the GPU or Queue id of the exception. The debugger also has the option of clearing exceptions after being queried. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm

[PATCH 27/32] drm/amdkfd: add debug set flags operation

2023-01-25 Thread Jonathan Kim
cise at the cost of performance. This setting is not permitted on debug devices that support only a global setting of this option. Return the previously set flags to the debugger as well. v3: make precise mem op the only available flag for now. v2: add gfx11 support. Signed-off-by: Jonathan

[PATCH 20/32] drm/amdkfd: add debug trap enabled flag to tma

2023-01-25 Thread Jonathan Kim
flag setup on APUs Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 11 +++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 +++ 3 files changed

[PATCH 16/32] drm/amdkfd: add per process hw trap enable and disable functions

2023-01-25 Thread Jonathan Kim
functions are implemented in a follow up patch. v2: add gfx11 support. fix fw checks. remove asic family name comments. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 148 +- drivers/gpu/drm/amd

[PATCH 18/32] drm/amdkfd: add send exception operation

2023-01-25 Thread Jonathan Kim
For runtime exceptions, this will unblock the runtime enable function, which will be explained and implemented in a follow-up patch. Signed-off-by: Jonathan Kim --- .../gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 ++ drivers/gpu/drm/amd

[PATCH 08/32] drm/amdgpu: add gfx10 hw debug mode enable and disable calls

2023-01-25 Thread Jonathan Kim
will be fixed for GFX11 onwards. Also remove a bunch of deprecated misplaced references for GFX10.3. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 95 +++ .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 28 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c

[PATCH 10/32] drm/amdgpu: add gfx11 hw debug mode enable and disable calls

2023-01-25 Thread Jonathan Kim
Implement the per-device calls to enable or disable HW debug mode for GFX11. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 39 +++ 1 file changed, 39 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c b/drivers/gpu

[PATCH 19/32] drm/amdkfd: add runtime enable operation

2023-01-25 Thread Jonathan Kim
in description. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 150 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 4 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 1 + 4 files changed, 157 insertions(+), 4

[PATCH 14/32] drm/amdgpu: expose debug api for mes

2023-01-25 Thread Jonathan Kim
Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 32 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 20 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 12 +++ drivers/gpu/drm/amd/include/mes_v11_api_def.h | 21 +++- 4

[PATCH 22/32] drm/amdkfd: add debug set exceptions enabled operation

2023-01-25 Thread Jonathan Kim
The debugger subscribes to notification for requested exceptions on attach. Allow the debugger to change its subscription later on. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 36

[PATCH 01/32] drm/amdkfd: add debug and runtime enable interface

2023-01-25 Thread Jonathan Kim
snapshot entry. Add trap on wave start and end override option. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 48 ++ include/uapi/linux/kfd_ioctl.h | 663 ++- 2 files changed, 710 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd

[PATCH 03/32] drm/amdkfd: prepare per-process debug enable and disable

2023-01-25 Thread Jonathan Kim
debug disable to release worker to avoid race with disable from ioctl call. v2: relax debug trap disable and PTRACE ATTACH requirement. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/Makefile | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 88

[PATCH 07/32] drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

2023-01-25 Thread Jonathan Kim
changing the implicit wait count setting. Once set, resume all work. v2: remove flush on kfd suspend as that will be a general fix required outside of this patch series. comment on trap enable/disable ignored variables. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu.h

[PATCH 06/32] drm/amdgpu: add gfx9 hw debug mode enable and disable calls

2023-01-25 Thread Jonathan Kim
inheritance of that mode is upheld. Also ensure that exception overrides are reset to their original state prior to debug enable or disable. v2: remove unnecessary static srbm lock renaming. add comments to explain ignored arguments for debug trap enable and disable. Signed-off-by: Jonathan Kim

[PATCH 13/32] drm/amdgpu: prepare map process for multi-process debug devices

2023-01-25 Thread Jonathan Kim
v2: remove asic family code name comment in per vmid support check Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 7 +++ .../drm/amd/amdkfd/kfd_device_queue_manager.c | 50 +++ .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 ++ .../drm/amd

[PATCH 12/32] drm/amdkfd: prepare map process for single process debug devices

2023-01-25 Thread Jonathan Kim
SET_RESOURCES so that a debugged process will never migrate away from its pinned VMID. The KFD is responsible for reserving and releasing this pinned VMID whenever the debugger attaches and detaches, respectively. Signed-off-by: Jonathan Kim --- .../drm/amd/amdkfd/kfd_device_queue_manager.c
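The reserve-on-attach, release-on-detach lifecycle of the pinned VMID can be illustrated with a small bitmap allocator. This is a sketch under stated assumptions only: the 16-entry VMID space, the `vmid_pool` type and the choice of the highest free VMID are hypothetical, and the real KFD logic (including how the scheduler resources are updated) is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VMIDS 16 /* assumption for this sketch: VMIDs 0..15 */

/* Hypothetical pool of VMIDs the scheduler may use; a set bit means free. */
struct vmid_pool {
	uint16_t free_mask;
};

/* Debugger attach: remove one VMID from the scheduler's pool so the
 * debugged process stays pinned to it. Picks the highest free VMID
 * (an arbitrary choice here). Returns the VMID, or -1 if none is free. */
static int vmid_reserve(struct vmid_pool *p)
{
	for (int v = NUM_VMIDS - 1; v >= 0; v--) {
		if (p->free_mask & (1u << v)) {
			p->free_mask &= (uint16_t)~(1u << v);
			return v;
		}
	}
	return -1;
}

/* Debugger detach: return the pinned VMID to the pool. */
static void vmid_release(struct vmid_pool *p, int vmid)
{
	assert(vmid >= 0 && vmid < NUM_VMIDS);
	assert(!(p->free_mask & (1u << vmid))); /* must currently be reserved */
	p->free_mask |= (uint16_t)(1u << vmid);
}
```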

[PATCH 02/32] drm/amdkfd: display debug capabilities

2023-01-25 Thread Jonathan Kim
- remove asic family code name comments in firmware support checking - add gfx11 requirements in fw support checks and debug props and caps Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 101 -- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 6

[PATCH 04/32] drm/amdgpu: add kgd hw debug mode setting interface

2023-01-25 Thread Jonathan Kim
Introduce the required KGD debug calls that will execute hardware debug mode setting. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/include/kgd_kfd_interface.h | 34 +++ 1 file changed, 34 insertions(+) diff --git a/drivers/gpu/drm/amd/include

[PATCH 00/32] Upstream of kernel support for AMDGPU ISA debugging

2023-01-25 Thread Jonathan Kim
AMDGPU kernel upstream support for debugging of compute ISA. Current production ROCm GDB interface for ISA debugging: https://rocmdocs.amd.com/en/latest/ROCm_Tools/ROCgdb.html WIP upstream source for ROCm GDB API, ROC Kernel and ROC Thunk can be referenced here:
