[PATCH] drm/amdkfd: fix TLB flush after unmap for GFX9.4.2

2024-03-20 Thread Eric Huang
The TLB flush after unmap was accidentally removed on
gfx9.4.2. Add it back.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 42d40560cd30..a81ef232fdef 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1473,7 +1473,7 @@ static inline void kfd_flush_tlb(struct kfd_process_device *pdd,
 
 static inline bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
 {
-	return KFD_GC_VERSION(dev) > IP_VERSION(9, 4, 2) ||
+	return KFD_GC_VERSION(dev) >= IP_VERSION(9, 4, 2) ||
 	       (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) && dev->sdma_fw_version >= 18) ||
 	       KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0);
 }
-- 
2.34.1



[PATCH] amd/amdkfd: remove unused parameter

2024-02-28 Thread Eric Huang
The adev can be found from the bo by amdgpu_ttm_adev(bo->tbo.bdev),
and adev is in any case not used in the function
amdgpu_amdkfd_map_gtt_bo_to_gart(), so remove the unused parameter.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h   | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +--
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 4fb32d86cd0e..0ef223c2affb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -320,7 +320,7 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem,
 			void **kptr, uint64_t *size);
 void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_mem *mem);
 
-int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo);
+int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_bo *bo);
 
 int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
struct dma_fence __rcu **ef);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index ef71b12062a1..bf8e6653341f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2189,13 +2189,12 @@ int amdgpu_amdkfd_gpuvm_sync_memory(
 
 /**
  * amdgpu_amdkfd_map_gtt_bo_to_gart - Map BO to GART and increment reference count
- * @adev: Device to which allocated BO belongs
  * @bo: Buffer object to be mapped
  *
  * Before return, bo reference count is incremented. To release the reference and unpin/
  * unmap the BO, call amdgpu_amdkfd_free_gtt_mem.
  */
-int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
+int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_bo *bo)
 {
int ret;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 824e660283b2..f030cafc5a0a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -371,7 +371,7 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
goto err_wptr_map_gart;
}
 
-   err = amdgpu_amdkfd_map_gtt_bo_to_gart(dev->adev, wptr_bo);
+   err = amdgpu_amdkfd_map_gtt_bo_to_gart(wptr_bo);
if (err) {
pr_err("Failed to map wptr bo to GART\n");
goto err_wptr_map_gart;
-- 
2.34.1



Re: [PATCH] drm/amdkfd: only flush mes process context if mes support is there

2023-12-14 Thread Eric Huang



On 2023-12-13 22:19, Jonathan Kim wrote:

Fix up on mes process context flush to prevent non-mes devices from
spamming error messages or running into undefined behaviour during
process termination.

Fixes: 73204d028eb5 ("drm/amdkfd: fix mes set shader debugger process 
management")
Signed-off-by: Jonathan Kim 

Reviewed-by: Eric Huang 

---
  drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 8e55e78fce4e..43eff221eae5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -87,7 +87,8 @@ void kfd_process_dequeue_from_device(struct kfd_process_device *pdd)
return;
  
 	dev->dqm->ops.process_termination(dev->dqm, &pdd->qpd);
 
-	amdgpu_mes_flush_shader_debugger(dev->adev, pdd->proc_ctx_gpu_addr);
+	if (dev->kfd->shared_resources.enable_mes)
+		amdgpu_mes_flush_shader_debugger(dev->adev, pdd->proc_ctx_gpu_addr);
 	pdd->already_dequeued = true;
  }
  




[PATCH] drm/amdkfd: fix NULL ptr for debugger mes flush on non-mes asics

2023-12-14 Thread Eric Huang
The field adev->mes.funcs is NULL in amdgpu_mes_flush_shader_debugger()
on non-MES ASICs. Add an MES-enabled check before calling this function
to resolve the error.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 8e55e78fce4e..43eff221eae5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -87,7 +87,8 @@ void kfd_process_dequeue_from_device(struct kfd_process_device *pdd)
return;
 
 	dev->dqm->ops.process_termination(dev->dqm, &pdd->qpd);
-	amdgpu_mes_flush_shader_debugger(dev->adev, pdd->proc_ctx_gpu_addr);
+	if (dev->kfd->shared_resources.enable_mes)
+		amdgpu_mes_flush_shader_debugger(dev->adev, pdd->proc_ctx_gpu_addr);
 	pdd->already_dequeued = true;
 }
 }
 
-- 
2.34.1



Re: [PATCH] drm/amdkfd: fix mes set shader debugger process management

2023-12-12 Thread Eric Huang



On 2023-12-11 16:16, Jonathan Kim wrote:

MES provides the driver a call to explicitly flush stale process memory
within the MES to avoid a race condition that results in a fatal
memory violation.

When SET_SHADER_DEBUGGER is called, the driver passes a memory address
that represents a process context address MES uses to keep track of
future per-process calls.

Normally, MES will purge its process context list when the last queue
has been removed.  The driver, however, can call SET_SHADER_DEBUGGER
regardless of whether a queue has been added or not.

If SET_SHADER_DEBUGGER has been called with no queues as the last call
prior to process termination, the passed process context address will
still reside within MES.

On a new process call to SET_SHADER_DEBUGGER, the driver may end up
passing an identical process context address value (based on per-process
gpu memory address) to MES but is now pointing to a new allocated buffer
object during KFD process creation.  Since the MES is unaware of this,
access of the passed address points to the stale object within MES and
triggers a fatal memory violation.

The solution is for KFD to explicitly flush the process context address
from MES on process termination.

Note that the flush call and the MES debugger calls use the same MES
interface but are separated as KFD calls to avoid conflicting with each
other.

Signed-off-by: Jonathan Kim 
Tested-by: Alice Wong 

Reviewed-by: Eric Huang 

---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c   | 31 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h   | 10 +++---
  .../amd/amdkfd/kfd_process_queue_manager.c|  1 +
  drivers/gpu/drm/amd/include/mes_v11_api_def.h |  3 +-
  4 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index e544b823abf6..e98de23250dc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -916,6 +916,11 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev,
op_input.op = MES_MISC_OP_SET_SHADER_DEBUGGER;
 	op_input.set_shader_debugger.process_context_addr = process_context_addr;
op_input.set_shader_debugger.flags.u32all = flags;
+
+   /* use amdgpu mes_flush_shader_debugger instead */
+   if (op_input.set_shader_debugger.flags.process_ctx_flush)
+   return -EINVAL;
+
 	op_input.set_shader_debugger.spi_gdbg_per_vmid_cntl = spi_gdbg_per_vmid_cntl;
memcpy(op_input.set_shader_debugger.tcp_watch_cntl, tcp_watch_cntl,
sizeof(op_input.set_shader_debugger.tcp_watch_cntl));
@@ -935,6 +940,32 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev,
return r;
  }
  
+int amdgpu_mes_flush_shader_debugger(struct amdgpu_device *adev,
+				     uint64_t process_context_addr)
+{
+   struct mes_misc_op_input op_input = {0};
+   int r;
+
+   if (!adev->mes.funcs->misc_op) {
+   DRM_ERROR("mes flush shader debugger is not supported!\n");
+   return -EINVAL;
+   }
+
+	op_input.op = MES_MISC_OP_SET_SHADER_DEBUGGER;
+	op_input.set_shader_debugger.process_context_addr = process_context_addr;
+	op_input.set_shader_debugger.flags.process_ctx_flush = true;
+
+	amdgpu_mes_lock(&adev->mes);
+
+	r = adev->mes.funcs->misc_op(&adev->mes, &op_input);
+	if (r)
+		DRM_ERROR("failed to set_shader_debugger\n");
+
+	amdgpu_mes_unlock(&adev->mes);
+
+	return r;
+}
+
  static void
  amdgpu_mes_ring_to_queue_props(struct amdgpu_device *adev,
   struct amdgpu_ring *ring,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 894b9b133000..7d4f93fea937 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -296,9 +296,10 @@ struct mes_misc_op_input {
uint64_t process_context_addr;
union {
struct {
-   uint64_t single_memop : 1;
-   uint64_t single_alu_op : 1;
-   uint64_t reserved: 30;
+   uint32_t single_memop : 1;
+   uint32_t single_alu_op : 1;
+   uint32_t reserved: 29;
+   uint32_t process_ctx_flush: 1;
};
uint32_t u32all;
} flags;
@@ -374,7 +375,8 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev,
const uint32_t *tcp_watch_cntl,
uint32_t flags,
bool trap_en);
-
+int amdgpu_mes_f

Re: [PATCH] drm/amdkfd: Copy HW exception data to user event

2023-11-17 Thread Eric Huang

On 2023-11-17 00:20, David Yat Sin wrote:

Fixes issue where user events of type KFD_EVENT_TYPE_HW_EXCEPTION do not
have valid data

Signed-off-by: David Yat Sin 
---
  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 0f58be65132f..7d3db017f8d7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -880,6 +880,10 @@ static int copy_signaled_event_data(uint32_t num_events,
 			dst = &data[i].memory_exception_data;
 			src = &event->memory_exception_data;
 			size = sizeof(struct kfd_hsa_memory_exception_data);
+		} else if (event->type == KFD_EVENT_TYPE_HW_EXCEPTION) {
+			dst = &data[i].hw_exception_data;
+			src = &event->hw_exception_data;
+			size = sizeof(struct kfd_hsa_hw_exception_data);

Please use tabs for indent instead of white spaces.

Regards,
Eric

 		} else if (event->type == KFD_EVENT_TYPE_SIGNAL &&
 			   waiter->event_age_enabled) {
 			dst = &data[i].signal_event_data.last_event_age;




Re: [PATCH] drm/amdkfd: Fix a race condition of vram buffer unref in svm code

2023-09-27 Thread Eric Huang



On 2023-09-26 23:00, Xiaogang.Chen wrote:

From: Xiaogang Chen 

prange->svm_bo unref can happen in both the MMU callback and a callback
after migrating to system RAM. Both are async calls in different tasks.
Serialize the svm_bo unref operation to avoid a random use-after-free.

Signed-off-by: Xiaogang.Chen 
---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 70aa882636ab..8e246e848018 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -637,6 +637,15 @@ void svm_range_vram_node_free(struct svm_range *prange)
  {
svm_range_bo_unref(prange->svm_bo);
prange->ttm_res = NULL;

Are above two lines not removed?

Regards,
Eric

+	/* serialize prange->svm_bo unref */
+	mutex_lock(&prange->lock);
+	/* prange->svm_bo has not been unref */
+	if (prange->ttm_res) {
+		prange->ttm_res = NULL;
+		mutex_unlock(&prange->lock);
+		svm_range_bo_unref(prange->svm_bo);
+	} else
+		mutex_unlock(&prange->lock);
  }
  
  struct kfd_node *




Re: [PATCH] drm/amdkfd: fix add queue process context clear without runtime enable

2023-09-14 Thread Eric Huang



On 2023-09-12 21:52, Jonathan Kim wrote:

There are cases where HSA runtime is not enabled through the
AMDKFD_IOC_RUNTIME_ENABLE call when adding queues and the MES ADD_QUEUE
API should clear the MES process context instead of SET_SHADER_DEBUGGER.
Such examples are legacy HSA runtime builds that do not support the
current exception handling and running KFD tests.

The only time ADD_QUEUE.skip_process_ctx_clear is required is for
debugger use cases where a debugged process is always runtime enabled
when adding a queue.

Tested-by: Shikai Guo 
Signed-off-by: Jonathan Kim 

Reviewed-by: Eric Huang 


---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 6d07a5dd2648..77159b03a422 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -227,8 +227,10 @@ static int add_queue_mes(struct device_queue_manager *dqm, struct queue *q,
queue_input.tba_addr = qpd->tba_addr;
queue_input.tma_addr = qpd->tma_addr;
queue_input.trap_en = !kfd_dbg_has_cwsr_workaround(q->device);
-	queue_input.skip_process_ctx_clear = qpd->pqm->process->debug_trap_enabled ||
-					     kfd_dbg_has_ttmps_always_setup(q->device);
+	queue_input.skip_process_ctx_clear =
+		qpd->pqm->process->runtime_info.runtime_state == DEBUG_RUNTIME_STATE_ENABLED &&
+		(qpd->pqm->process->debug_trap_enabled ||
+		 kfd_dbg_has_ttmps_always_setup(q->device));
  
  	queue_type = convert_to_mes_queue_type(q->properties.type);

if (queue_type < 0) {




Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-11 Thread Eric Huang

On 2023-08-11 09:26, Felix Kuehling wrote:

Am 2023-08-10 um 18:27 schrieb Eric Huang:
There is no UNMAP_QUEUES command sent for queue preemption
because the queue is suspended and the test is close to the end.
Function unmap_queue_cpsch will do nothing after that.


How do you suspend queues without sending an UNMAP_QUEUES command?
Now I understand what you mean; I was only thinking of UNMAP_QUEUES
being sent after the clearing call. So the MEC FW should clear the control
register unconditionally on every UNMAP_QUEUES command. We can request it
for gfx v9.4.3 to avoid the awkward workaround in KFD.


Thanks,
Eric


Regards,
  Felix




The workaround is new and only for gfx v9.4.2, because the debugger tests 
have changed to check whether all address watch points are correctly set, 
i.e. test A sets more than one watchpoint and exits, the following 
test B sets only one watchpoint, but test A's settings cause more 
than one watchpoint event, so test B checks and reports an error on a 
second or third watchpoint it did not set itself.


Regards,
Eric

On 2023-08-10 17:56, Felix Kuehling wrote:
I think Jon is suggesting that the UNMAP_QUEUES command should clear 
the address watch registers. Requesting such a change from the 
HWS team may take a long time.


That said, when was this workaround implemented and reviewed? Did I 
review it as part of Jon's debugger upstreaming patch series? Or did 
this come later? This patch only enables the workaround for v9.4.2.


Regards,
  Felix


On 2023-08-10 17:52, Eric Huang wrote:
The problem is that the queue is suspended before the clear address watch 
call in KFD; there is no queue preemption and queue resume after 
the clearing call, and the test ends. So there is no chance to send 
MAP_PROCESS to HWS. At this point FW has nothing to do. We have 
several test FWs from Tej; none of them works, so I recalled the 
kernel debug log and found out the problem.


GFX11 has a different scheduler: when calling clear address watch, 
KFD directly sends MES_MISC_OP_SET_SHADER_DEBUGGER to MES, and it 
doesn't consider whether the queue is suspended. So GFX11 doesn't have 
this issue.


Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch 
controls as non-valid automatically on queue preemption to avoid 
this kind of issue in the first place by design. MAP_PROCESS on 
resume will take whatever the driver requests.

GFX11 has no issue with letting the HWS do this.

Are we sure we're not working around some HWS bug?

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric ; amd-
g...@lists.freedesktop.org
Cc: Kim, Jonathan 
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug 
for gfx v9.4.2


I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit
different because it needs to support multiple XCCs.
different because it needs to support multiple XCCs.

That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on the MEC FW to clear the TCP watch control
register by sending a MAP_PROCESS packet with tcp_watch_cntl set to 0
to the HWS, but if the queue is suspended, the packet will not be sent
and the previous value will be left in the register, which will affect
the following apps. So the solution is to clear the register in KFD,
as for gfx v9.

Signed-off-by: Eric Huang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
   1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }
 
-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
-						      uint32_t watch_id)
-{
-	return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override,
 	.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 	.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-	.clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+	.clear_address_watch = kgd_gfx_v9_clear_address_watch,
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,








Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
I will change the title to "drm/amdkfd: workaround address watch clearing 
bug for gfx v9.4.2". Is that OK?


Regards,
Eric

On 2023-08-10 18:25, Kim, Jonathan wrote:

[Public]

Yeah, this is a recent bug so this workaround is new.  More rigorous tests 
revealed this is probably a miss on the FW side.  We explicitly requested that 
UNMAP_QUEUES unconditionally invalidate watch controls at the beginning of 
the design to prevent any watchpoint racing.

Note GFX11 MES calls are different on the surface, but under the hood it's the 
same (registers get invalidated on unmap, then get updated on map; the only 
difference is that it's at the queue level).

I'm fine with this solution, but I think it'd be good to describe this as a 
workaround somewhere (as opposed to a driver issue) so that folks aren't 
scratching their heads later on when looking at the code for GFX11 and up and 
wondering why we don't nuke the control setting in KFD for those devices.

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:56 PM
To: Huang, JinHuiEric ; Kim, Jonathan
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

I think Jon is suggesting that the UNMAP_QUEUES command should clear the
address watch registers. Requesting such a change from the HWS team
may take a long time.

That said, when was this workaround implemented and reviewed? Did I
review it as part of Jon's debugger upstreaming patch series? Or did
this come later? This patch only enables the workaround for v9.4.2.

Regards,
Felix


On 2023-08-10 17:52, Eric Huang wrote:

The problem is that the queue is suspended before the clear address watch
call in KFD; there is no queue preemption and queue resume after
the clearing call, and the test ends. So there is no chance to send
MAP_PROCESS to HWS. At this point FW has nothing to do. We have
several test FWs from Tej; none of them works, so I recalled the
kernel debug log and found out the problem.

GFX11 has a different scheduler: when calling clear address watch, KFD
directly sends MES_MISC_OP_SET_SHADER_DEBUGGER to MES, and it doesn't
consider whether the queue is suspended. So GFX11 doesn't have this issue.

Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch controls
as non-valid automatically on queue preemption to avoid this kind of
issue in the first place by design.  MAP_PROCESS on resume will take
whatever the driver requests.
GFX11 has no issue with letting the HWS do this.

Are we sure we're not working around some HWS bug?

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric ; amd-
g...@lists.freedesktop.org
Cc: Kim, Jonathan 
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for
gfx v9.4.2

I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit
different because it needs to support multiple XCCs.

That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on the MEC FW to clear the TCP watch control
register by sending a MAP_PROCESS packet with tcp_watch_cntl set to 0
to the HWS, but if the queue is suspended, the packet will not be sent
and the previous value will be left in the register, which will affect
the following apps. So the solution is to clear the register in KFD,
as for gfx v9.

Signed-off-by: Eric Huang 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }
 
-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
-						      uint32_t watch_id)
-{
-	return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override,
 	.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 	.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-	.clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+	.clear_address_watch = kgd_gfx_v9_clear_address_watch,
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,




Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
There is no UNMAP_QUEUES command sent for queue preemption because 
the queue is suspended and the test is close to the end. Function 
unmap_queue_cpsch will do nothing after that.


The workaround is new and only for gfx v9.4.2, because the debugger tests 
have changed to check whether all address watch points are correctly set, i.e. 
test A sets more than one watchpoint and exits, the following test B 
sets only one watchpoint, but test A's settings cause more than one 
watchpoint event, so test B checks and reports an error on a second or 
third watchpoint it did not set itself.


Regards,
Eric

On 2023-08-10 17:56, Felix Kuehling wrote:
I think Jon is suggesting that the UNMAP_QUEUES command should clear 
the address watch registers. Requesting such a change from the HWS 
team may take a long time.


That said, when was this workaround implemented and reviewed? Did I 
review it as part of Jon's debugger upstreaming patch series? Or did 
this come later? This patch only enables the workaround for v9.4.2.


Regards,
  Felix


On 2023-08-10 17:52, Eric Huang wrote:
The problem is that the queue is suspended before the clear address watch 
call in KFD; there is no queue preemption and queue resume after 
the clearing call, and the test ends. So there is no chance to send 
MAP_PROCESS to HWS. At this point FW has nothing to do. We have 
several test FWs from Tej; none of them works, so I recalled the 
kernel debug log and found out the problem.


GFX11 has a different scheduler: when calling clear address watch, KFD 
directly sends MES_MISC_OP_SET_SHADER_DEBUGGER to MES, and it doesn't 
consider whether the queue is suspended. So GFX11 doesn't have this issue.


Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch controls 
as non-valid automatically on queue preemption to avoid this kind of 
issue in the first place by design. MAP_PROCESS on resume will take 
whatever the driver requests.

GFX11 has no issue with letting the HWS do this.

Are we sure we're not working around some HWS bug?

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric ; amd-
g...@lists.freedesktop.org
Cc: Kim, Jonathan 
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for 
gfx v9.4.2


I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit
different because it needs to support multiple XCCs.

That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on the MEC FW to clear the TCP watch control
register by sending a MAP_PROCESS packet with tcp_watch_cntl set to 0
to the HWS, but if the queue is suspended, the packet will not be sent
and the previous value will be left in the register, which will affect
the following apps. So the solution is to clear the register in KFD,
as for gfx v9.

Signed-off-by: Eric Huang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
   1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }
 
-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
-						      uint32_t watch_id)
-{
-	return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override,
 	.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 	.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-	.clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+	.clear_address_watch = kgd_gfx_v9_clear_address_watch,
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,






Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
The problem is that the queue is suspended before the clear address watch call 
in KFD; there is no queue preemption and queue resume after the clearing 
call, and the test ends. So there is no chance to send MAP_PROCESS to 
HWS. At this point FW has nothing to do. We have several test FWs from 
Tej; none of them works, so I recalled the kernel debug log and found 
out the problem.


GFX11 has a different scheduler: when calling clear address watch, KFD 
directly sends MES_MISC_OP_SET_SHADER_DEBUGGER to MES, and it doesn't 
consider whether the queue is suspended. So GFX11 doesn't have this issue.


Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch controls as 
non-valid automatically on queue preemption to avoid this kind of issue in the 
first place by design.  MAP_PROCESS on resume will take whatever the driver 
requests.
GFX11 has no issue with letting the HWS do this.

Are we sure we're not working around some HWS bug?

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric ; amd-
g...@lists.freedesktop.org
Cc: Kim, Jonathan 
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit
different because it needs to support multiple XCCs.

That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on the MEC FW to clear the TCP watch control
register by sending a MAP_PROCESS packet with tcp_watch_cntl set to 0
to the HWS, but if the queue is suspended, the packet will not be sent
and the previous value will be left in the register, which will affect
the following apps. So the solution is to clear the register in KFD,
as for gfx v9.

Signed-off-by: Eric Huang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
   1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }
 
-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
-						      uint32_t watch_id)
-{
-	return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override,
 	.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 	.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-	.clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+	.clear_address_watch = kgd_gfx_v9_clear_address_watch,
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,




Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang

Yes. I will send out the fix for gc v9.4.3 later. Thanks for your review.

Eric

On 2023-08-10 17:02, Felix Kuehling wrote:
I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit 
different because it needs to support multiple XCCs.


That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on the MEC FW to clear the tcp watch control
register by sending a MAP_PROCESS packet with the tcp_watch_cntl
field set to 0 to the HWS, but if the queue is suspended, the
packet will not be sent and the previous value will be left in
the register, which affects subsequent apps.
The solution is to clear the register from KFD, as gfx v9 does.

Signed-off-by: Eric Huang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
  1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t 
kgd_gfx_aldebaran_set_address_watch(

  return watch_address_cntl;
  }
  -static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct 
amdgpu_device *adev,

-  uint32_t watch_id)
-{
-    return 0;
-}
-
  const struct kfd2kgd_calls aldebaran_kfd2kgd = {
  .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
  .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
  .set_wave_launch_trap_override = 
kgd_aldebaran_set_wave_launch_trap_override,

  .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
  .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-    .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+    .clear_address_watch = kgd_gfx_v9_clear_address_watch,
  .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
  .build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
  .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,




[PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
KFD currently relies on the MEC FW to clear the tcp watch control
register by sending a MAP_PROCESS packet with the tcp_watch_cntl
field set to 0 to the HWS, but if the queue is suspended, the
packet will not be sent and the previous value will be left in
the register, which affects subsequent apps.
The solution is to clear the register from KFD, as gfx v9 does.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
return watch_address_cntl;
 }
 
-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device 
*adev,
- uint32_t watch_id)
-{
-   return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.set_wave_launch_trap_override = 
kgd_aldebaran_set_wave_launch_trap_override,
.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
-- 
2.34.1
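
The clearing fix above boils down to writing a watch-control value whose valid bit is 0. The standalone sketch below (illustrative field layout and helper names, not the real GC 9.4.2 register definitions) shows how a REG_SET_FIELD-style macro composes and clears such a value:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout for a TCP watch control register;
 * the real GC 9.4.2 layout lives in gc_9_4_2_sh_mask.h. */
#define WATCH_CNTL__VMID_MASK    0x0000000FU
#define WATCH_CNTL__VMID__SHIFT  0
#define WATCH_CNTL__MODE_MASK    0x00000030U
#define WATCH_CNTL__MODE__SHIFT  4
#define WATCH_CNTL__VALID_MASK   0x80000000U
#define WATCH_CNTL__VALID__SHIFT 31

/* Minimal stand-in for the driver's REG_SET_FIELD() macro:
 * clear the field, then OR in the shifted value. */
#define SET_FIELD(orig, field, val) \
	(((orig) & ~field##_MASK) | (((uint32_t)(val) << field##__SHIFT) & field##_MASK))

static uint32_t build_watch_cntl(uint32_t vmid, uint32_t mode)
{
	uint32_t cntl = 0;

	cntl = SET_FIELD(cntl, WATCH_CNTL__VMID, vmid);
	cntl = SET_FIELD(cntl, WATCH_CNTL__MODE, mode);
	cntl = SET_FIELD(cntl, WATCH_CNTL__VALID, 1);
	return cntl;
}
```

Clearing a watch point is then just writing back a value with the valid bit (and the rest of the control fields) zeroed, which is what the shared gfx v9 clear path arranges instead of the removed no-op.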



Re: [PATCH] drm/amdkfd: fix and enable ttmp setup for gfx11

2023-07-25 Thread Eric Huang



On 2023-07-24 15:01, Jonathan Kim wrote:

The MES cached process context must be cleared on adding any queue for
the first time.

For proper debug support, the MES will clear its cached process context
on the first call to SET_SHADER_DEBUGGER.

This allows TTMPs to be persistently enabled in a safe manner.

Signed-off-by: Jonathan Kim 

Reviewed-by: Eric Huang 

Regards,
Eric

---
  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c|  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 13 -
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 19 +--
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 11 ++-
  .../drm/amd/amdkfd/kfd_device_queue_manager.c |  2 ++
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 12 +---
  6 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c
index 77ca5cbfb601..d67d003bada2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c
@@ -637,7 +637,7 @@ static uint32_t kgd_gfx_v11_disable_debug_trap(struct 
amdgpu_device *adev,
  {
uint32_t data = 0;
  
-	data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, keep_trap_enabled);

+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_EN, 0);
data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, EXCP_REPLACE, 0);
  
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

index e0f9cf6dd8fd..42df972357e9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2755,6 +2755,16 @@ static int runtime_enable(struct kfd_process *p, 
uint64_t r_debug,
  
		if (pdd->qpd.queue_count)
			return -EEXIST;
+
+   /*
+* Setup TTMPs by default.
+* Note that this call must remain here for MES ADD QUEUE to
+* skip_process_ctx_clear unconditionally as the first call to
+* SET_SHADER_DEBUGGER clears any stale process context data
+* saved in MES.
+*/
+   if (pdd->dev->kfd->shared_resources.enable_mes)
+   kfd_dbg_set_mes_debug_mode(pdd, 
!kfd_dbg_has_cwsr_workaround(pdd->dev));
}
  
  	p->runtime_info.runtime_state = DEBUG_RUNTIME_STATE_ENABLED;

@@ -2848,7 +2858,8 @@ static int runtime_disable(struct kfd_process *p)
if (!pdd->dev->kfd->shared_resources.enable_mes)
debug_refresh_runlist(pdd->dev->dqm);
else
-   kfd_dbg_set_mes_debug_mode(pdd);
+   kfd_dbg_set_mes_debug_mode(pdd,
+  
!kfd_dbg_has_cwsr_workaround(pdd->dev));
}
}
  
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c

index 1f82caea59ba..9ec750666382 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -344,11 +344,10 @@ static int kfd_dbg_set_workaround(struct kfd_process 
*target, bool enable)
return r;
  }
  
-int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd)

+int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd, bool sq_trap_en)
  {
uint32_t spi_dbg_cntl = pdd->spi_dbg_override | 
pdd->spi_dbg_launch_mode;
uint32_t flags = pdd->process->dbg_flags;
-   bool sq_trap_en = !!spi_dbg_cntl || 
!kfd_dbg_has_cwsr_workaround(pdd->dev);
  
  	if (!kfd_dbg_is_per_vmid_supported(pdd->dev))

return 0;
@@ -432,7 +431,7 @@ int kfd_dbg_trap_clear_dev_address_watch(struct 
kfd_process_device *pdd,
if (!pdd->dev->kfd->shared_resources.enable_mes)
r = debug_map_and_unlock(pdd->dev->dqm);
else
-   r = kfd_dbg_set_mes_debug_mode(pdd);
+   r = kfd_dbg_set_mes_debug_mode(pdd, true);
  
  	kfd_dbg_clear_dev_watch_id(pdd, watch_id);
  
@@ -474,7 +473,7 @@ int kfd_dbg_trap_set_dev_address_watch(struct kfd_process_device *pdd,

if (!pdd->dev->kfd->shared_resources.enable_mes)
r = debug_map_and_unlock(pdd->dev->dqm);
else
-   r = kfd_dbg_set_mes_debug_mode(pdd);
+   r = kfd_dbg_set_mes_debug_mode(pdd, true);
  
  	/* HWS is broken so no point in HW rollback but release the watchpoint anyways */

if (r)
@@ -516,7 +515,7 @@ int kfd_dbg_trap_set_flags(struct kfd_process *target, 
uint32_t *flags)
if (!pdd->dev->kfd->shared_resources.enable_mes)
r = debug_refresh_runlist(pdd->dev->dqm);

[PATCH] drm/amdgpu: enable trap of each kfd vmid for gfx v9.4.3

2023-07-25 Thread Eric Huang
Set up TTMPs as enabled by default for gfx v9.4.3 during IP HW init.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 86a84a0970f0..9a90fd187909 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -898,6 +898,7 @@ static void gfx_v9_4_3_xcc_init_compute_vmid(struct 
amdgpu_device *adev,
int i;
uint32_t sh_mem_config;
uint32_t sh_mem_bases;
+   uint32_t data;
 
/*
 * Configure apertures:
@@ -917,6 +918,11 @@ static void gfx_v9_4_3_xcc_init_compute_vmid(struct 
amdgpu_device *adev,
/* CP and shaders */
WREG32_SOC15_RLC(GC, GET_INST(GC, xcc_id), regSH_MEM_CONFIG, 
sh_mem_config);
WREG32_SOC15_RLC(GC, GET_INST(GC, xcc_id), regSH_MEM_BASES, 
sh_mem_bases);
+
+   /* Enable trap for each kfd vmid. */
+   data = RREG32_SOC15(GC, GET_INST(GC, xcc_id), 
regSPI_GDBG_PER_VMID_CNTL);
+   data = REG_SET_FIELD(data, SPI_GDBG_PER_VMID_CNTL, TRAP_EN, 1);
+   WREG32_SOC15_RLC(GC, GET_INST(GC, xcc_id), 
regSPI_GDBG_PER_VMID_CNTL, data);
}
soc15_grbm_select(adev, 0, 0, 0, 0, GET_INST(GC, xcc_id));
	mutex_unlock(&adev->srbm_mutex);
-- 
2.34.1
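
The hunk above is a classic read-modify-write: read SPI_GDBG_PER_VMID_CNTL, set only TRAP_EN, and write the value back. A self-contained sketch of the pattern follows (simulated register and a hypothetical bit position; the real field layout is in the GC register headers):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical TRAP_EN position; illustrative only. */
#define TRAP_EN_MASK  0x00000001U
#define TRAP_EN_SHIFT 0

/* Simulated SPI_GDBG_PER_VMID_CNTL with other fields already programmed. */
static uint32_t fake_vmid_cntl = 0x000000F0U;

static uint32_t rreg(void)       { return fake_vmid_cntl; }
static void     wreg(uint32_t v) { fake_vmid_cntl = v; }

/* Mirror of the patch's RMW sequence: preserve every other field,
 * force TRAP_EN to 1. */
static void enable_trap(void)
{
	uint32_t data = rreg();

	data = (data & ~TRAP_EN_MASK) | (1U << TRAP_EN_SHIFT);
	wreg(data);
}
```

The RMW matters because the same register carries the EXCP_EN/EXCP_REPLACE-style debug fields that must survive the write.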



Re: [PATCH] drm/amdkfd: enable grace period for xcp instance

2023-07-11 Thread Eric Huang



On 2023-07-11 14:38, Felix Kuehling wrote:


On 2023-07-11 10:28, Eric Huang wrote:

Read/write grace period from/to first xcc instance of
xcp in kfd node.

Signed-off-by: Eric Huang 
---
  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 21 ---
  .../drm/amd/amdkfd/kfd_device_queue_manager.h |  2 +-
  .../drm/amd/amdkfd/kfd_packet_manager_v9.c    |  8 ---
  3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index 31cac1fd0d58..9000c4b778fd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1619,10 +1619,14 @@ static int initialize_cpsch(struct 
device_queue_manager *dqm)

    init_sdma_bitmaps(dqm);
  -    if (dqm->dev->kfd2kgd->get_iq_wait_times)
+    if (dqm->dev->kfd2kgd->get_iq_wait_times) {
+    u32 first_inst = dqm->dev->xcp->id *
+ dqm->dev->adev->gfx.num_xcc_per_xcp;
dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev,
-    &dqm->wait_times,
-    ffs(dqm->dev->xcc_mask) - 1);
+    &dqm->wait_times[first_inst],
+    first_inst);
+    }
+
  return 0;
  }
  @@ -1675,13 +1679,16 @@ static int start_cpsch(struct 
device_queue_manager *dqm)

  grace_period);
  if (retval)
  pr_err("Setting grace timeout failed\n");
-    else if (dqm->dev->kfd2kgd->build_grace_period_packet_info)
+    else if (dqm->dev->kfd2kgd->build_grace_period_packet_info) {
+    u32 first_inst = dqm->dev->xcp->id *
+ dqm->dev->adev->gfx.num_xcc_per_xcp;
  /* Update dqm->wait_times maintained in software */
dqm->dev->kfd2kgd->build_grace_period_packet_info(
-    dqm->dev->adev, dqm->wait_times,
+    dqm->dev->adev, dqm->wait_times[first_inst],
  grace_period, &reg_offset,
-    &dqm->wait_times,
-    ffs(dqm->dev->xcc_mask) - 1);
+    &dqm->wait_times[first_inst],
+    first_inst);
+    }
  }
    dqm_unlock(dqm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h

index 7dd4b177219d..45959c33b944 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -262,7 +262,7 @@ struct device_queue_manager {
  /* used for GFX 9.4.3 only */
  uint32_t    current_logical_xcc_start;
  -    uint32_t    wait_times;
+    uint32_t    wait_times[MAX_XCP];


Why do you need an array here, if it only saves the wait times in one 
of the array entries [first_inst]?
That was my misunderstanding of XCP. Each DQM should be associated 
with one XCP. I thought a DQM had multiple XCPs.


Thanks,
Eric


Regards,
  Felix



    wait_queue_head_t    destroy_wait;
  };
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c

index 8fda16e6fee6..960404a6379b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -292,17 +292,19 @@ static int pm_set_grace_period_v9(struct 
packet_manager *pm,

  struct pm4_mec_write_data_mmio *packet;
  uint32_t reg_offset = 0;
  uint32_t reg_data = 0;
+    uint32_t first_inst = pm->dqm->dev->xcp->id *
+ pm->dqm->dev->adev->gfx.num_xcc_per_xcp;
pm->dqm->dev->kfd2kgd->build_grace_period_packet_info(
  pm->dqm->dev->adev,
-    pm->dqm->wait_times,
+    pm->dqm->wait_times[first_inst],
  grace_period,
  &reg_offset,
  &reg_data,
-    0);
+    first_inst);
    if (grace_period == USE_DEFAULT_GRACE_PERIOD)
-    reg_data = pm->dqm->wait_times;
+    reg_data = pm->dqm->wait_times[first_inst];
    packet = (struct pm4_mec_write_data_mmio *)buffer;
  memset(buffer, 0, sizeof(struct pm4_mec_write_data_mmio));




[PATCH] drm/amdkfd: enable grace period for xcp instance

2023-07-11 Thread Eric Huang
Read/write grace period from/to first xcc instance of
xcp in kfd node.

Signed-off-by: Eric Huang 
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 21 ---
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  2 +-
 .../drm/amd/amdkfd/kfd_packet_manager_v9.c|  8 ---
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 31cac1fd0d58..9000c4b778fd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1619,10 +1619,14 @@ static int initialize_cpsch(struct device_queue_manager 
*dqm)
 
init_sdma_bitmaps(dqm);
 
-   if (dqm->dev->kfd2kgd->get_iq_wait_times)
+   if (dqm->dev->kfd2kgd->get_iq_wait_times) {
+   u32 first_inst = dqm->dev->xcp->id *
+dqm->dev->adev->gfx.num_xcc_per_xcp;
dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev,
-   &dqm->wait_times,
-   ffs(dqm->dev->xcc_mask) - 1);
+   &dqm->wait_times[first_inst],
+   first_inst);
+   }
+
return 0;
 }
 
@@ -1675,13 +1679,16 @@ static int start_cpsch(struct device_queue_manager *dqm)
grace_period);
if (retval)
pr_err("Setting grace timeout failed\n");
-   else if (dqm->dev->kfd2kgd->build_grace_period_packet_info)
+   else if (dqm->dev->kfd2kgd->build_grace_period_packet_info) {
+   u32 first_inst = dqm->dev->xcp->id *
+dqm->dev->adev->gfx.num_xcc_per_xcp;
/* Update dqm->wait_times maintained in software */
dqm->dev->kfd2kgd->build_grace_period_packet_info(
-   dqm->dev->adev, dqm->wait_times,
+   dqm->dev->adev, 
dqm->wait_times[first_inst],
				grace_period, &reg_offset,
-   &dqm->wait_times,
-   ffs(dqm->dev->xcc_mask) - 1);
+   &dqm->wait_times[first_inst],
+   first_inst);
+   }
}
 
dqm_unlock(dqm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 7dd4b177219d..45959c33b944 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -262,7 +262,7 @@ struct device_queue_manager {
/* used for GFX 9.4.3 only */
uint32_tcurrent_logical_xcc_start;
 
-   uint32_twait_times;
+   uint32_twait_times[MAX_XCP];
 
wait_queue_head_t   destroy_wait;
 };
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index 8fda16e6fee6..960404a6379b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -292,17 +292,19 @@ static int pm_set_grace_period_v9(struct packet_manager 
*pm,
struct pm4_mec_write_data_mmio *packet;
uint32_t reg_offset = 0;
uint32_t reg_data = 0;
+   uint32_t first_inst = pm->dqm->dev->xcp->id *
+ pm->dqm->dev->adev->gfx.num_xcc_per_xcp;
 
pm->dqm->dev->kfd2kgd->build_grace_period_packet_info(
pm->dqm->dev->adev,
-   pm->dqm->wait_times,
+   pm->dqm->wait_times[first_inst],
grace_period,
			&reg_offset,
			&reg_data,
-   0);
+   first_inst);
 
if (grace_period == USE_DEFAULT_GRACE_PERIOD)
-   reg_data = pm->dqm->wait_times;
+   reg_data = pm->dqm->wait_times[first_inst];
 
packet = (struct pm4_mec_write_data_mmio *)buffer;
memset(buffer, 0, sizeof(struct pm4_mec_write_data_mmio));
-- 
2.34.1



Re: [PATCH] drm/amdkfd: enable grace period for xcp instance

2023-07-10 Thread Eric Huang

OK, Mukul. I will resend this patch rebased on top of yours.

Regards,
Eric

On 2023-07-10 18:24, Joshi, Mukul wrote:

[AMD Official Use Only - General]


-Original Message-
From: amd-gfx  On Behalf Of Eric
Huang
Sent: Monday, July 10, 2023 3:46 PM
To: amd-gfx@lists.freedesktop.org
Cc: Huang, JinHuiEric ; Kim, Jonathan

Subject: [PATCH] drm/amdkfd: enable grace period for xcp instance

Caution: This message originated from an External Source. Use proper caution
when opening attachments, clicking links, or responding.


Read/write grace period from/to first xcc instance of xcp in kfd node.


Hi Eric,

My patch, "drm/amdkfd: Update CWSR grace period for GFX9.4.3", which got missed 
during the merge
should handle most of what you are trying to do.
I will push that patch. Please add on top if there is anything missing.

Hope that works for you.

Thanks,
Mukul


Signed-off-by: Eric Huang 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11
---  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |
2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c| 10 +++---
  3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index de83eccdd9de..a95bcb91dc09 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1619,10 +1619,15 @@ static int initialize_cpsch(struct
device_queue_manager *dqm)

 init_sdma_bitmaps(dqm);

-   if (dqm->dev->kfd2kgd->get_iq_wait_times)
+   if (dqm->dev->kfd2kgd->get_iq_wait_times) {
+   u32 inst = ffs(dqm->dev->xcc_mask &
+  (1UL <<
+  dqm->dev->xcp->id *
+  dqm->dev->adev->gfx.num_xcc_per_xcp)) -
+ 1;
 dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev,
-   &dqm->wait_times,
-   0);
+   &dqm->wait_times[inst],
+   inst);
+   }
 return 0;
  }

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 7dd4b177219d..45959c33b944 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -262,7 +262,7 @@ struct device_queue_manager {
 /* used for GFX 9.4.3 only */
 uint32_tcurrent_logical_xcc_start;

-   uint32_twait_times;
+   uint32_twait_times[MAX_XCP];

 wait_queue_head_t   destroy_wait;
  };
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index 8fda16e6fee6..dd50164c16cd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -292,17 +292,21 @@ static int pm_set_grace_period_v9(struct
packet_manager *pm,
 struct pm4_mec_write_data_mmio *packet;
 uint32_t reg_offset = 0;
 uint32_t reg_data = 0;
+   uint32_t inst = ffs(pm->dqm->dev->xcc_mask &
+   (1UL <<
+   pm->dqm->dev->xcp->id *
+   pm->dqm->dev->adev->gfx.num_xcc_per_xcp)) -
+ 1;

 pm->dqm->dev->kfd2kgd->build_grace_period_packet_info(
 pm->dqm->dev->adev,
-   pm->dqm->wait_times,
+   pm->dqm->wait_times[inst],
 grace_period,
 &reg_offset,
 &reg_data,
-   0);
+   inst);

 if (grace_period == USE_DEFAULT_GRACE_PERIOD)
-   reg_data = pm->dqm->wait_times;
+   reg_data = pm->dqm->wait_times[inst];

 packet = (struct pm4_mec_write_data_mmio *)buffer;
 memset(buffer, 0, sizeof(struct pm4_mec_write_data_mmio));
--
2.34.1




[PATCH] drm/amdkfd: enable grace period for xcp instance

2023-07-10 Thread Eric Huang
Read/write grace period from/to first xcc instance of
xcp in kfd node.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 ---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c| 10 +++---
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index de83eccdd9de..a95bcb91dc09 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1619,10 +1619,15 @@ static int initialize_cpsch(struct device_queue_manager 
*dqm)
 
init_sdma_bitmaps(dqm);
 
-   if (dqm->dev->kfd2kgd->get_iq_wait_times)
+   if (dqm->dev->kfd2kgd->get_iq_wait_times) {
+   u32 inst = ffs(dqm->dev->xcc_mask &
+  (1UL <<
+  dqm->dev->xcp->id *
+  dqm->dev->adev->gfx.num_xcc_per_xcp)) - 1;
dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev,
-   &dqm->wait_times,
-   0);
+   &dqm->wait_times[inst],
+   inst);
+   }
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 7dd4b177219d..45959c33b944 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -262,7 +262,7 @@ struct device_queue_manager {
/* used for GFX 9.4.3 only */
uint32_tcurrent_logical_xcc_start;
 
-   uint32_twait_times;
+   uint32_twait_times[MAX_XCP];
 
wait_queue_head_t   destroy_wait;
 };
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index 8fda16e6fee6..dd50164c16cd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -292,17 +292,21 @@ static int pm_set_grace_period_v9(struct packet_manager 
*pm,
struct pm4_mec_write_data_mmio *packet;
uint32_t reg_offset = 0;
uint32_t reg_data = 0;
+   uint32_t inst = ffs(pm->dqm->dev->xcc_mask &
+   (1UL <<
+   pm->dqm->dev->xcp->id *
+   pm->dqm->dev->adev->gfx.num_xcc_per_xcp)) - 1;
 
pm->dqm->dev->kfd2kgd->build_grace_period_packet_info(
pm->dqm->dev->adev,
-   pm->dqm->wait_times,
+   pm->dqm->wait_times[inst],
grace_period,
		&reg_offset,
		&reg_data,
-   0);
+   inst);
 
if (grace_period == USE_DEFAULT_GRACE_PERIOD)
-   reg_data = pm->dqm->wait_times;
+   reg_data = pm->dqm->wait_times[inst];
 
packet = (struct pm4_mec_write_data_mmio *)buffer;
memset(buffer, 0, sizeof(struct pm4_mec_write_data_mmio));
-- 
2.34.1
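
Both versions of this patch compute the first XCC instance of a partition: the resent version multiplies xcp->id by num_xcc_per_xcp directly, while the version above picks it out of xcc_mask with ffs(). The sketch below (simplified userspace stand-ins; helper names are illustrative) shows the two agree when the XCC mask is contiguous:

```c
#include <assert.h>
#include <stdint.h>

/* Tiny stand-in for the kernel's ffs(): 1-based index of the lowest
 * set bit, 0 if no bit is set. */
static int ffs_u32(uint32_t x)
{
	int i;

	for (i = 0; i < 32; i++)
		if (x & (1U << i))
			return i + 1;
	return 0;
}

/* First XCC instance of a partition, computed directly from the
 * partition id, as in the resent patch. */
static int first_xcc_direct(int xcp_id, int num_xcc_per_xcp)
{
	return xcp_id * num_xcc_per_xcp;
}

/* Same value extracted from the XCC mask with ffs(), as in the
 * earlier revision of the patch. */
static int first_xcc_from_mask(uint32_t xcc_mask, int xcp_id, int num_xcc_per_xcp)
{
	return ffs_u32(xcc_mask & (1U << (xcp_id * num_xcc_per_xcp))) - 1;
}
```

For example, with 8 XCCs split into 4 partitions of 2, partition 3 starts at XCC 6 either way.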



Re: [PATCH 1/4] drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3

2023-07-07 Thread Eric Huang
Thanks for your review. The prefix name change would be inconsistent: 
the new functions' prefix would then differ from the existing 
functions' prefix. Are you sure it doesn't matter?


Regards,
Eric

On 2023-07-07 19:52, Kim, Jonathan wrote:
I would change the static prefix names from kgd_gfx_ to kgd_gc_ to 
match file name and specify it as the target GC version.


With that fixed and assuming grace period instance fix ups will follow 
after, this patch and series is:


Reviewed-by: Jonathan Kim 



*From:* Huang, JinHuiEric 
*Sent:* Friday, July 7, 2023 1:46 PM
*To:* amd-gfx@lists.freedesktop.org 
*Cc:* Kim, Jonathan ; Kim, Jonathan 
; Huang, JinHuiEric 
*Subject:* [PATCH 1/4] drm/amdkfd: add kfd2kgd debugger callbacks for 
GC v9.4.3

From: Jonathan Kim 

Implement the same behavior as GC v9.4.2, with the differences
required by the GC v9.4.3 HW spec, i.e. per-XCC-instance handling.

Signed-off-by: Jonathan Kim 
Signed-off-by: Eric Huang 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   8 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h  |  27 +++
 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c   | 166 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c    |   3 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h    |   6 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c    |   3 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   3 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c    |   3 +-
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |   3 +-
 10 files changed, 213 insertions(+), 12 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

index 60f9e027fb66..a06a99c5d311 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -23,6 +23,7 @@
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_amdkfd_arcturus.h"
 #include "amdgpu_amdkfd_gfx_v9.h"
+#include "amdgpu_amdkfd_aldebaran.h"
 #include "gc/gc_9_4_2_offset.h"
 #include "gc/gc_9_4_2_sh_mask.h"
 #include 
@@ -36,7 +37,7 @@
  * initialize the debug mode registers after it has disabled GFX off 
during the

  * debug session.
  */
-static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device 
*adev,

+uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
 bool restore_dbg_registers,
 uint32_t vmid)
 {
@@ -107,7 +108,7 @@ static uint32_t 
kgd_aldebaran_set_wave_launch_trap_override(struct amdgpu_device

 return data;
 }

-static uint32_t kgd_aldebaran_set_wave_launch_mode(struct 
amdgpu_device *adev,

+uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
 uint8_t wave_launch_mode,
 uint32_t vmid)
 {
@@ -125,7 +126,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 uint32_t watch_address_mask,
 uint32_t watch_id,
 uint32_t watch_mode,
-   uint32_t debug_vmid)
+   uint32_t debug_vmid,
+   uint32_t inst)
 {
 uint32_t watch_address_high;
 uint32_t watch_address_low;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h

new file mode 100644
index ..a7bdaf8d82dd
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,

+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be 
included in

+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT 
SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
DAMAGES OR

+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR 

[PATCH 3/4] drm/amdkfd: enable watch points globally for gfx943

2023-07-07 Thread Eric Huang
From: Jonathan Kim 

Set watch points for all xcc instances on GFX943.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
Signed-off-by: Eric Huang 
Reviewed-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 24083db44724..190b03efe5ff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -446,7 +446,8 @@ int kfd_dbg_trap_set_dev_address_watch(struct 
kfd_process_device *pdd,
uint32_t *watch_id,
uint32_t watch_mode)
 {
-   int r = kfd_dbg_get_dev_watch_id(pdd, watch_id);
+   int xcc_id, r = kfd_dbg_get_dev_watch_id(pdd, watch_id);
+   uint32_t xcc_mask = pdd->dev->xcc_mask;
 
if (r)
return r;
@@ -460,14 +461,15 @@ int kfd_dbg_trap_set_dev_address_watch(struct 
kfd_process_device *pdd,
}
 
amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-   pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch(
+   for_each_inst(xcc_id, xcc_mask)
+   pdd->watch_points[*watch_id] = 
pdd->dev->kfd2kgd->set_address_watch(
pdd->dev->adev,
watch_address,
watch_address_mask,
*watch_id,
watch_mode,
pdd->dev->vm_info.last_vmid_kfd,
-   0);
+   xcc_id);
amdgpu_gfx_off_ctrl(pdd->dev->adev, true);
 
if (!pdd->dev->kfd->shared_resources.enable_mes)
-- 
2.34.1
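
The for_each_inst() loop added above programs the watch point once per enabled XCC. A simplified userspace model of walking the set bits of xcc_mask (helper names are illustrative, not driver API):

```c
#include <assert.h>
#include <stdint.h>

/* Count the enabled XCC instances; in the driver, the body of this
 * loop is where set_address_watch(..., xcc_id) is called. */
static int xcc_count(uint32_t xcc_mask)
{
	int count = 0;
	int xcc_id;

	for (xcc_id = 0; xcc_id < 32; xcc_id++)
		if (xcc_mask & (1U << xcc_id))
			count++;
	return count;
}

/* n-th (0-based) enabled instance in mask order, -1 if out of range. */
static int xcc_nth(uint32_t xcc_mask, int n)
{
	int xcc_id;

	for (xcc_id = 0; xcc_id < 32; xcc_id++)
		if ((xcc_mask & (1U << xcc_id)) && n-- == 0)
			return xcc_id;
	return -1;
}
```

With a sparse mask like 0x2D (bits 0, 2, 3, 5), the loop visits exactly those four instances in ascending order, matching the per-XCC writes the patch performs.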



[PATCH 4/4] drm/amdkfd: add multi-process debugging support for GC v9.4.3

2023-07-07 Thread Eric Huang
From: Jonathan Kim 

Similar to GC v9.4.2, GC v9.4.3 should use the 5-Dword extended
MAP_PROCESS packet to support multi-process debugging.  Update the
multi-process debug support list so that the KFD updates the runlist
on debug mode setting and that it allocates enough GTT memory during
KFD device initialization.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
Signed-off-by: Eric Huang 
Reviewed-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index a289e59ceb79..a0afc6a7b6c4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -76,8 +76,9 @@ int kfd_dbg_send_exception_to_runtime(struct kfd_process *p,
 
 static inline bool kfd_dbg_is_per_vmid_supported(struct kfd_node *dev)
 {
-   return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
-  KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0);
+   return (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+   KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) ||
+   KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0));
 }
 
 void debug_event_write_work_handler(struct work_struct *work);
-- 
2.34.1
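
kfd_dbg_is_per_vmid_supported() compares packed IP versions: amdgpu's IP_VERSION() packs major/minor/revision into a single integer so range checks like `>= IP_VERSION(11, 0, 0)` work as plain integer comparisons. A standalone sketch of the predicate (the packed encoding is assumed to match the shape of the kernel macro):

```c
#include <assert.h>
#include <stdint.h>

/* IP version packed as major.minor.rev, mirroring the shape of
 * amdgpu's IP_VERSION() macro. */
#define IP_VERSION(maj, min, rev) (((maj) << 16) | ((min) << 8) | (rev))

/* Predicate mirroring kfd_dbg_is_per_vmid_supported() from the patch
 * above, taking the packed GC version directly. */
static int per_vmid_supported(uint32_t gc_version)
{
	return gc_version == IP_VERSION(9, 4, 2) ||
	       gc_version == IP_VERSION(9, 4, 3) ||
	       gc_version >= IP_VERSION(11, 0, 0);
}
```

Because the fields are ordered most-significant first, every GFX11+ part satisfies the `>=` arm, while the two GFX9 parts must be listed explicitly.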



[PATCH 1/4] drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3

2023-07-07 Thread Eric Huang
From: Jonathan Kim 

Implement the same behavior as GC v9.4.2, with the differences
required by the GC v9.4.3 HW spec, i.e. per-XCC-instance handling.

Signed-off-by: Jonathan Kim 
Signed-off-by: Eric Huang 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   8 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h  |  27 +++
 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c   | 166 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|   3 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|   6 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c|   3 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   3 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c|   3 +-
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |   3 +-
 10 files changed, 213 insertions(+), 12 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index 60f9e027fb66..a06a99c5d311 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -23,6 +23,7 @@
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_amdkfd_arcturus.h"
 #include "amdgpu_amdkfd_gfx_v9.h"
+#include "amdgpu_amdkfd_aldebaran.h"
 #include "gc/gc_9_4_2_offset.h"
 #include "gc/gc_9_4_2_sh_mask.h"
 #include 
@@ -36,7 +37,7 @@
  * initialize the debug mode registers after it has disabled GFX off during the
  * debug session.
  */
-static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
+uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
bool restore_dbg_registers,
uint32_t vmid)
 {
@@ -107,7 +108,7 @@ static uint32_t 
kgd_aldebaran_set_wave_launch_trap_override(struct amdgpu_device
return data;
 }
 
-static uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
+uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
uint8_t wave_launch_mode,
uint32_t vmid)
 {
@@ -125,7 +126,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
uint32_t watch_address_mask,
uint32_t watch_id,
uint32_t watch_mode,
-   uint32_t debug_vmid)
+   uint32_t debug_vmid,
+   uint32_t inst)
 {
uint32_t watch_address_high;
uint32_t watch_address_low;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h
new file mode 100644
index ..a7bdaf8d82dd
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
+   bool restore_dbg_registers,
+   uint32_t vmid);
+uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
+   uint8_t wave_launch_mode,
+   uint32_t vmid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
index 5b4b7f8b92a5..543405a28b19 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
@@ -22,6 +22,7 @@
 #include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 #include "

[PATCH 2/4] drm/amdkfd: restore debugger additional info for gfx v9_4_3

2023-07-07 Thread Eric Huang
From: Jonathan Kim 

The additional information that the KFD reports to the debugger was
destroyed when the following commit was merged:
"drm/amdkfd: convert switches to IP version checking"

Signed-off-by: Jonathan Kim 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Jonathan Kim 
Acked-by: Amber Lin 
Signed-off-by: Eric Huang 
Reviewed-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 --
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  3 +++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 61fc62f3e003..1a4cdee86759 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1932,8 +1932,14 @@ static void kfd_topology_set_capabilities(struct 
kfd_topology_device *dev)
HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_MODE_SUPPORTED;
 
if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(10, 0, 0)) {
-   dev->node_props.debug_prop |= HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
-   HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
+   if (KFD_GC_VERSION(dev->gpu) == IP_VERSION(9, 4, 3))
+   dev->node_props.debug_prop |=
+   HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 |
+   HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3;
+   else
+   dev->node_props.debug_prop |=
+   HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
+   HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
 
if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(9, 4, 2))
dev->node_props.debug_prop |=
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index cba2cd5ed9d1..dea32a9e5506 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -32,9 +32,12 @@
 #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 32
 
 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9   6
+#define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 7
 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX10   7
 #define HSA_DBG_WATCH_ADDR_MASK_HI_BIT  \
(29 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT)
+#define HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3 \
+   (30 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT)
 
 struct kfd_node_properties {
uint64_t hive_id;
-- 
2.34.1



[PATCH 0/4] Upstream debugger feature for GFX v9.4.3

2023-07-07 Thread Eric Huang
Jonathan Kim (4):
  drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3
  drm/amdkfd: restore debugger additional info for gfx v9_4_3
  drm/amdkfd: enable watch points globally for gfx943
  drm/amdkfd: add multi-process debugging support for GC v9.4.3

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   8 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h  |  27 +++
 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c   | 166 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|   3 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|   6 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c|   3 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   3 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c|   9 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|   5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   3 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |   3 +-
 13 files changed, 231 insertions(+), 18 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h

-- 
2.34.1



Re: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance

2023-07-07 Thread Eric Huang



On 2023-07-07 11:56, Kim, Jonathan wrote:

[Public]


-Original Message-
From: Huang, JinHuiEric 
Sent: Friday, July 7, 2023 11:46 AM
To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance


On 2023-07-07 10:59, Kim, Jonathan wrote:

[Public]


-Original Message-
From: Huang, JinHuiEric 
Sent: Thursday, July 6, 2023 2:19 PM
To: amd-gfx@lists.freedesktop.org
Cc: Kim, Jonathan ; Huang, JinHuiEric

Subject: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance

each xcc instance needs to get iq wait time and set
grace period accordingly.

Signed-off-by: Eric Huang 
---
   .../drm/amd/amdkfd/kfd_device_queue_manager.c |  9 --
   .../drm/amd/amdkfd/kfd_device_queue_manager.h |  2 +-
   .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 32 +++---

-

   .../drm/amd/amdkfd/kfd_packet_manager_v9.c|  9 +++---
   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +-
   5 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index a2bff3f01359..0f12c1989e14 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1606,6 +1606,8 @@ static int set_sched_resources(struct
device_queue_manager *dqm)

   static int initialize_cpsch(struct device_queue_manager *dqm)
   {
+ uint32_t xcc_id, xcc_mask = dqm->dev->xcc_mask;
+
pr_debug("num of pipes: %d\n", get_pipes_per_mec(dqm));

mutex_init(&dqm->lock_hidden);
@@ -1620,8 +1622,11 @@ static int initialize_cpsch(struct
device_queue_manager *dqm)
init_sdma_bitmaps(dqm);

if (dqm->dev->kfd2kgd->get_iq_wait_times)
- dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev,
- &dqm->wait_times, 0);
+ for_each_inst(xcc_id, xcc_mask)
+ dqm->dev->kfd2kgd->get_iq_wait_times(
+ dqm->dev->adev,
+ &dqm->wait_times[xcc_id],
+ xcc_id);
return 0;
   }

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 7dd4b177219d..62a6dc8d3032 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -262,7 +262,7 @@ struct device_queue_manager {
/* used for GFX 9.4.3 only */
uint32_tcurrent_logical_xcc_start;

- uint32_twait_times;
+ uint32_twait_times[32];

I think wait_times[16] should be sufficient.  We only get the hamming

weight of 16 bits for NUM_XCC and I believe the xcc_mask is declared as a
uint16_t in the KGD portion anyway.  We may as well align to that.

wait_queue_head_t   destroy_wait;
   };
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 401096c103b2..f37ab4b6d88c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -374,27 +374,31 @@ int pm_update_grace_period(struct
packet_manager *pm, uint32_t grace_period)
   {
int retval = 0;
uint32_t *buffer, size;
+ uint32_t xcc_id, xcc_mask = pm->dqm->dev->xcc_mask;

size = pm->pmf->set_grace_period_size;

mutex_lock(&pm->lock);

if (size) {
- kq_acquire_packet_buffer(pm->priv_queue,
- size / sizeof(uint32_t),
- (unsigned int **)&buffer);
-
- if (!buffer) {
- pr_err("Failed to allocate buffer on kernel queue\n");
- retval = -ENOMEM;
- goto out;
- }
+ for_each_inst(xcc_id, xcc_mask) {
+ kq_acquire_packet_buffer(pm->priv_queue,
+ size / sizeof(uint32_t),
+ (unsigned int **)&buffer);

- retval = pm->pmf->set_grace_period(pm, buffer, grace_period);
- if (!retval)
- kq_submit_packet(pm->priv_queue);
- else
- kq_rollback_packet(pm->priv_queue);
+ if (!buffer) {
+ pr_err("Failed to allocate buffer on kernel queue\n");
+ retval = -ENOMEM;
+ goto out;
+ }
+
+ retval = pm->pmf->set_grace_period(pm, buffer,
+ grace_period, xcc_id);
+ if (!retval)
+ kq_submit_packet(pm->priv_queue);
+   

Re: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance

2023-07-07 Thread Eric Huang



On 2023-07-07 10:59, Kim, Jonathan wrote:

[Public]


-Original Message-
From: Huang, JinHuiEric 
Sent: Thursday, July 6, 2023 2:19 PM
To: amd-gfx@lists.freedesktop.org
Cc: Kim, Jonathan ; Huang, JinHuiEric

Subject: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance

each xcc instance needs to get iq wait time and set
grace period accordingly.

Signed-off-by: Eric Huang 
---
  .../drm/amd/amdkfd/kfd_device_queue_manager.c |  9 --
  .../drm/amd/amdkfd/kfd_device_queue_manager.h |  2 +-
  .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 32 +++
  .../drm/amd/amdkfd/kfd_packet_manager_v9.c|  9 +++---
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +-
  5 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index a2bff3f01359..0f12c1989e14 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1606,6 +1606,8 @@ static int set_sched_resources(struct
device_queue_manager *dqm)

  static int initialize_cpsch(struct device_queue_manager *dqm)
  {
+ uint32_t xcc_id, xcc_mask = dqm->dev->xcc_mask;
+
   pr_debug("num of pipes: %d\n", get_pipes_per_mec(dqm));

   mutex_init(&dqm->lock_hidden);
@@ -1620,8 +1622,11 @@ static int initialize_cpsch(struct
device_queue_manager *dqm)
   init_sdma_bitmaps(dqm);

   if (dqm->dev->kfd2kgd->get_iq_wait_times)
- dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev,
- &dqm->wait_times, 0);
+ for_each_inst(xcc_id, xcc_mask)
+ dqm->dev->kfd2kgd->get_iq_wait_times(
+ dqm->dev->adev,
+ &dqm->wait_times[xcc_id],
+ xcc_id);
   return 0;
  }

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 7dd4b177219d..62a6dc8d3032 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -262,7 +262,7 @@ struct device_queue_manager {
   /* used for GFX 9.4.3 only */
   uint32_tcurrent_logical_xcc_start;

- uint32_twait_times;
+ uint32_twait_times[32];

I think wait_times[16] should be sufficient.  We only get the hamming weight of 
16 bits for NUM_XCC and I believe the xcc_mask is declared as a uint16_t in the 
KGD portion anyway.  We may as well align to that.


   wait_queue_head_t   destroy_wait;
  };
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 401096c103b2..f37ab4b6d88c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -374,27 +374,31 @@ int pm_update_grace_period(struct
packet_manager *pm, uint32_t grace_period)
  {
   int retval = 0;
   uint32_t *buffer, size;
+ uint32_t xcc_id, xcc_mask = pm->dqm->dev->xcc_mask;

   size = pm->pmf->set_grace_period_size;

   mutex_lock(&pm->lock);

   if (size) {
- kq_acquire_packet_buffer(pm->priv_queue,
- size / sizeof(uint32_t),
- (unsigned int **)&buffer);
-
- if (!buffer) {
- pr_err("Failed to allocate buffer on kernel queue\n");
- retval = -ENOMEM;
- goto out;
- }
+ for_each_inst(xcc_id, xcc_mask) {
+ kq_acquire_packet_buffer(pm->priv_queue,
+ size / sizeof(uint32_t),
+ (unsigned int **)&buffer);

- retval = pm->pmf->set_grace_period(pm, buffer, grace_period);
- if (!retval)
- kq_submit_packet(pm->priv_queue);
- else
- kq_rollback_packet(pm->priv_queue);
+ if (!buffer) {
+ pr_err("Failed to allocate buffer on kernel queue\n");
+ retval = -ENOMEM;
+ goto out;
+ }
+
+ retval = pm->pmf->set_grace_period(pm, buffer,
+ grace_period, xcc_id);
+ if (!retval)
+ kq_submit_packet(pm->priv_queue);
+ else
+ kq_rollback_packet(pm->priv_queue);

In the event of partial success do we need to roll back (i.e. resubmit default 
grace period) on failure?
The function pm_set_grace_period_v9 always returns 0, and it is not a 
complicated operation; it should always succeed.

[PATCH 4/6] drm/amdkfd: enable grace period for xcc instance

2023-07-06 Thread Eric Huang
Each xcc instance needs to get the IQ wait times and set the
grace period accordingly.

Signed-off-by: Eric Huang 
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  9 --
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  2 +-
 .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 32 +++
 .../drm/amd/amdkfd/kfd_packet_manager_v9.c|  9 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +-
 5 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index a2bff3f01359..0f12c1989e14 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1606,6 +1606,8 @@ static int set_sched_resources(struct 
device_queue_manager *dqm)
 
 static int initialize_cpsch(struct device_queue_manager *dqm)
 {
+   uint32_t xcc_id, xcc_mask = dqm->dev->xcc_mask;
+
pr_debug("num of pipes: %d\n", get_pipes_per_mec(dqm));
 
	mutex_init(&dqm->lock_hidden);
@@ -1620,8 +1622,11 @@ static int initialize_cpsch(struct device_queue_manager 
*dqm)
init_sdma_bitmaps(dqm);
 
if (dqm->dev->kfd2kgd->get_iq_wait_times)
-   dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev,
-   &dqm->wait_times, 0);
+   for_each_inst(xcc_id, xcc_mask)
+   dqm->dev->kfd2kgd->get_iq_wait_times(
+   dqm->dev->adev,
+   &dqm->wait_times[xcc_id],
+   xcc_id);
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 7dd4b177219d..62a6dc8d3032 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -262,7 +262,7 @@ struct device_queue_manager {
/* used for GFX 9.4.3 only */
uint32_tcurrent_logical_xcc_start;
 
-   uint32_twait_times;
+   uint32_twait_times[32];
 
wait_queue_head_t   destroy_wait;
 };
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 401096c103b2..f37ab4b6d88c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -374,27 +374,31 @@ int pm_update_grace_period(struct packet_manager *pm, 
uint32_t grace_period)
 {
int retval = 0;
uint32_t *buffer, size;
+   uint32_t xcc_id, xcc_mask = pm->dqm->dev->xcc_mask;
 
size = pm->pmf->set_grace_period_size;
 
	mutex_lock(&pm->lock);
 
if (size) {
-   kq_acquire_packet_buffer(pm->priv_queue,
-   size / sizeof(uint32_t),
-   (unsigned int **)&buffer);
-
-   if (!buffer) {
-   pr_err("Failed to allocate buffer on kernel queue\n");
-   retval = -ENOMEM;
-   goto out;
-   }
+   for_each_inst(xcc_id, xcc_mask) {
+   kq_acquire_packet_buffer(pm->priv_queue,
+   size / sizeof(uint32_t),
+   (unsigned int **)&buffer);
 
-   retval = pm->pmf->set_grace_period(pm, buffer, grace_period);
-   if (!retval)
-   kq_submit_packet(pm->priv_queue);
-   else
-   kq_rollback_packet(pm->priv_queue);
+   if (!buffer) {
+   pr_err("Failed to allocate buffer on kernel queue\n");
+   retval = -ENOMEM;
+   goto out;
+   }
+
+   retval = pm->pmf->set_grace_period(pm, buffer,
+   grace_period, xcc_id);
+   if (!retval)
+   kq_submit_packet(pm->priv_queue);
+   else
+   kq_rollback_packet(pm->priv_queue);
+   }
}
 
 out:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index 8fda16e6fee6..a9443d661957 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -287,7 +287,8 @@ static int pm_map_queues_v9(struct packet_manager *pm, 
uint32_t *buffer,
 
 static int pm_set_grace_period_v9(struct packet_manager *pm,
uint32_t *buffer,
-   uint32_t grace_period)
+   uint32_t grace_period,
+   uint32_t inst)
 {
str

[PATCH 3/6] drm/amdkfd: enable watch points globally for gfx943

2023-07-06 Thread Eric Huang
From: Jonathan Kim 

Set watch points for all xcc instances on GFX943.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 24083db44724..190b03efe5ff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -446,7 +446,8 @@ int kfd_dbg_trap_set_dev_address_watch(struct 
kfd_process_device *pdd,
uint32_t *watch_id,
uint32_t watch_mode)
 {
-   int r = kfd_dbg_get_dev_watch_id(pdd, watch_id);
+   int xcc_id, r = kfd_dbg_get_dev_watch_id(pdd, watch_id);
+   uint32_t xcc_mask = pdd->dev->xcc_mask;
 
if (r)
return r;
@@ -460,14 +461,15 @@ int kfd_dbg_trap_set_dev_address_watch(struct 
kfd_process_device *pdd,
}
 
amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-   pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch(
+   for_each_inst(xcc_id, xcc_mask)
+   pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch(
pdd->dev->adev,
watch_address,
watch_address_mask,
*watch_id,
watch_mode,
pdd->dev->vm_info.last_vmid_kfd,
-   0);
+   xcc_id);
amdgpu_gfx_off_ctrl(pdd->dev->adev, true);
 
if (!pdd->dev->kfd->shared_resources.enable_mes)
-- 
2.34.1



[PATCH 2/6] drm/amdkfd: restore debugger additional info for gfx v9_4_3

2023-07-06 Thread Eric Huang
From: Jonathan Kim 

The additional information that the KFD reports to the debugger was
destroyed when the following commit was merged:
"drm/amdkfd: convert switches to IP version checking"

Signed-off-by: Jonathan Kim 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Jonathan Kim 
Acked-by: Amber Lin 
Signed-off-by: Eric Huang 
Reviewed-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 --
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  3 +++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 61fc62f3e003..1a4cdee86759 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1932,8 +1932,14 @@ static void kfd_topology_set_capabilities(struct 
kfd_topology_device *dev)
HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_MODE_SUPPORTED;
 
if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(10, 0, 0)) {
-   dev->node_props.debug_prop |= HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
-   HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
+   if (KFD_GC_VERSION(dev->gpu) == IP_VERSION(9, 4, 3))
+   dev->node_props.debug_prop |=
+   HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 |
+   HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3;
+   else
+   dev->node_props.debug_prop |=
+   HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
+   HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
 
if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(9, 4, 2))
dev->node_props.debug_prop |=
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index cba2cd5ed9d1..dea32a9e5506 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -32,9 +32,12 @@
 #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 32
 
 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9   6
+#define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 7
 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX10   7
 #define HSA_DBG_WATCH_ADDR_MASK_HI_BIT  \
(29 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT)
+#define HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3 \
+   (30 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT)
 
 struct kfd_node_properties {
uint64_t hive_id;
-- 
2.34.1



[PATCH 5/6] drm/amdkfd: always keep trap enabled for GC v9.4.3

2023-07-06 Thread Eric Huang
Keep TTMP setup enabled by default.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 3 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 +++---
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index cf1db0ab3471..47c5d16677d6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2842,7 +2842,7 @@ static int runtime_disable(struct kfd_process *p)
pdd->spi_dbg_override =
pdd->dev->kfd2kgd->disable_debug_trap(
pdd->dev->adev,
-   false,
+   KFD_GC_VERSION(pdd->dev) == IP_VERSION(9, 4, 3),
pdd->dev->vm_info.last_vmid_kfd);
 
if (!pdd->dev->kfd->shared_resources.enable_mes)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 190b03efe5ff..4cb9b3b18065 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -591,7 +591,8 @@ void kfd_dbg_trap_deactivate(struct kfd_process *target, 
bool unwind, int unwind
pdd->spi_dbg_override =
pdd->dev->kfd2kgd->disable_debug_trap(
pdd->dev->adev,
-   target->runtime_info.ttmp_setup,
+   KFD_GC_VERSION(pdd->dev) == IP_VERSION(9, 4, 3) ?
+   true : target->runtime_info.ttmp_setup,
pdd->dev->vm_info.last_vmid_kfd);
amdgpu_gfx_off_ctrl(pdd->dev->adev, true);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index ba04a4baecf2..91ae9121e2bf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1644,9 +1644,9 @@ struct kfd_process_device 
*kfd_create_process_device_data(struct kfd_node *dev,
p->pdds[p->n_pdds++] = pdd;
if (kfd_dbg_is_per_vmid_supported(pdd->dev))
pdd->spi_dbg_override = pdd->dev->kfd2kgd->disable_debug_trap(
-   pdd->dev->adev,
-   false,
-   0);
+   pdd->dev->adev,
+   KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3),
+   0);
 
/* Init idr used for memory handle translation */
idr_init(>alloc_idr);
-- 
2.34.1



[PATCH 6/6] drm/amdkfd: add multi-process debugging support for GC v9.4.3

2023-07-06 Thread Eric Huang
From: Jonathan Kim 

Similar to GC v9.4.2, GC v9.4.3 should use the 5-Dword extended
MAP_PROCESS packet to support multi-process debugging.  Update the
multi-process debug support list so that the KFD updates the runlist
on debug mode setting and that it allocates enough GTT memory during
KFD device initialization.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index a289e59ceb79..a0afc6a7b6c4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -76,8 +76,9 @@ int kfd_dbg_send_exception_to_runtime(struct kfd_process *p,
 
 static inline bool kfd_dbg_is_per_vmid_supported(struct kfd_node *dev)
 {
-   return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
-  KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0);
+   return (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+   KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) ||
+   KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0));
 }
 
 void debug_event_write_work_handler(struct work_struct *work);
-- 
2.34.1



[PATCH 1/6] drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3

2023-07-06 Thread Eric Huang
From: Jonathan Kim 

Implement the same functionality as GC v9.4.2, adapted for the
GC v9.4.3 HW difference, i.e. multiple xcc instances.

Signed-off-by: Jonathan Kim 
Signed-off-by: Eric Huang 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |  10 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h  |  30 
 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c   | 152 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|   9 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|  10 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c|   3 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  15 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c|   3 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   2 +-
 .../drm/amd/amdkfd/kfd_packet_manager_v9.c|   3 +-
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |   9 +-
 12 files changed, 230 insertions(+), 26 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index 60f9e027fb66..7d7eaed68531 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -23,6 +23,7 @@
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_amdkfd_arcturus.h"
 #include "amdgpu_amdkfd_gfx_v9.h"
+#include "amdgpu_amdkfd_aldebaran.h"
 #include "gc/gc_9_4_2_offset.h"
 #include "gc/gc_9_4_2_sh_mask.h"
 #include 
@@ -36,7 +37,7 @@
  * initialize the debug mode registers after it has disabled GFX off during the
  * debug session.
  */
-static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
+uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
bool restore_dbg_registers,
uint32_t vmid)
 {
@@ -50,7 +51,7 @@ static uint32_t kgd_aldebaran_enable_debug_trap(struct 
amdgpu_device *adev,
 }
 
 /* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. */
-static uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev,
+uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev,
bool keep_trap_enabled,
uint32_t vmid)
 {
@@ -107,7 +108,7 @@ static uint32_t 
kgd_aldebaran_set_wave_launch_trap_override(struct amdgpu_device
return data;
 }
 
-static uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
+uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
uint8_t wave_launch_mode,
uint32_t vmid)
 {
@@ -125,7 +126,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
uint32_t watch_address_mask,
uint32_t watch_id,
uint32_t watch_mode,
-   uint32_t debug_vmid)
+   uint32_t debug_vmid,
+   uint32_t inst)
 {
uint32_t watch_address_high;
uint32_t watch_address_low;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h
new file mode 100644
index ..ed349ff397bd
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
+   bool restore_dbg_registers,
+   uint3

[PATCH 0/6] Upstream debugger feature for GFX v9.4.3

2023-07-06 Thread Eric Huang
Eric Huang (2):
  drm/amdkfd: enable grace period for xcc instance
  drm/amdkfd: always keep trap enabled for GC v9.4.3

Jonathan Kim (4):
  drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3
  drm/amdkfd: restore debugger additional info for gfx v9_4_3
  drm/amdkfd: enable watch points globally for gfx943
  drm/amdkfd: add multi-process debugging support for GC v9.4.3

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |  10 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h  |  30 
 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c   | 152 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|   9 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|  10 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c|   3 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  15 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c|  12 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|   5 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   9 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |   2 +-
 .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   |  32 ++--
 .../drm/amd/amdkfd/kfd_packet_manager_v9.c|  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   3 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |   9 +-
 20 files changed, 284 insertions(+), 57 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h

-- 
2.34.1



[PATCH 3/5] drm/amdkfd: add xcc instance for debugger APIs

2023-07-05 Thread Eric Huang
Since GFX9 GPUs can have multiple xcc instances, implement
this change in the KFD debugger APIs.

Signed-off-by: Eric Huang 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c|  6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c |  6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c  | 12 
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h  | 13 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c  |  6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c   | 12 
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h   | 13 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c  |  6 --
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c  |  3 ++-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 12 
 11 files changed, 61 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index f3f7e0437447..c7f88bfa1976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -126,7 +126,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
uint32_t watch_address_mask,
uint32_t watch_id,
uint32_t watch_mode,
-   uint32_t debug_vmid)
+   uint32_t debug_vmid,
+   uint32_t inst)
 {
uint32_t watch_address_high;
uint32_t watch_address_low;
@@ -163,7 +164,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 }
 
 static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device 
*adev,
- uint32_t watch_id)
+ uint32_t watch_id,
+ uint32_t inst)
 {
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
index 3299e268f234..c0546db91579 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
@@ -454,7 +454,8 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch(
uint32_t watch_address_mask,
uint32_t watch_id,
uint32_t watch_mode,
-   uint32_t debug_vmid)
+   uint32_t debug_vmid,
+   uint32_t inst)
 {
uint32_t watch_address_high;
uint32_t watch_address_low;
@@ -491,7 +492,8 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch(
 }
 
 static uint32_t kgd_gfx_v9_4_3_clear_address_watch(struct amdgpu_device *adev,
-   uint32_t watch_id)
+   uint32_t watch_id,
+   uint32_t inst)
 {
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index 8ad7a7779e14..04daa8f9456b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -886,7 +886,8 @@ uint32_t kgd_gfx_v10_set_address_watch(struct amdgpu_device 
*adev,
uint32_t watch_address_mask,
uint32_t watch_id,
uint32_t watch_mode,
-   uint32_t debug_vmid)
+   uint32_t debug_vmid,
+   uint32_t inst)
 {
uint32_t watch_address_high;
uint32_t watch_address_low;
@@ -942,7 +943,8 @@ uint32_t kgd_gfx_v10_set_address_watch(struct amdgpu_device 
*adev,
 }
 
 uint32_t kgd_gfx_v10_clear_address_watch(struct amdgpu_device *adev,
-   uint32_t watch_id)
+   uint32_t watch_id,
+   uint32_t inst)
 {
uint32_t watch_address_cntl;
 
@@ -968,7 +970,8 @@ uint32_t kgd_gfx_v10_clear_address_watch(struct 
amdgpu_device *adev,
  * deq_retry_wait_time  -- Wait Count for Global Wave Syncs.
  */
 void kgd_gfx_v10_get_iq_wait_times(struct amdgpu_device *adev,
-   uint32_t *wait_times)
+   uint32_t *wait_times,
+   uint32_t inst)
 
 {
*wait_times = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_IQ_WAIT_TIME2));
@@ -978,7 +981,8 @@ void kgd_gfx_v10_build_grace_period_packet_info(struct 
amdgpu_device *adev

[PATCH 0/5] Upstream debugger feature for GFX v9.4.3

2023-07-05 Thread Eric Huang
Eric Huang (1):
  drm/amdkfd: add xcc instance for debugger APIs

Jonathan Kim (4):
  drm/amdgpu: add kfd2kgd debugger callbacks for GC v9.4.3
  drm/amdkfd: restore debugger additional info for gfx v9_4_3
  drm/amdkfd: enable watch points globally for gfx943
  drm/amdkfd: add multi-process debugging support for GC v9.4.3

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |  13 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h  |  30 
 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c   | 153 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  12 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|  13 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c|   6 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  12 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  13 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c|  18 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|   5 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   2 +-
 .../drm/amd/amdkfd/kfd_packet_manager_v9.c|   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   3 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |  12 +-
 15 files changed, 265 insertions(+), 40 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h

-- 
2.34.1



[PATCH 5/5] drm/amdkfd: add multi-process debugging support for GC v9.4.3

2023-07-05 Thread Eric Huang
From: Jonathan Kim 

Similar to GC v9.4.2, GC v9.4.3 should use the 5-Dword extended
MAP_PROCESS packet to support multi-process debugging.  Update the
multi-process debug support list so that the KFD updates the runlist
on debug mode setting and that it allocates enough GTT memory during
KFD device initialization.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index a289e59ceb79..a0afc6a7b6c4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -76,8 +76,9 @@ int kfd_dbg_send_exception_to_runtime(struct kfd_process *p,
 
 static inline bool kfd_dbg_is_per_vmid_supported(struct kfd_node *dev)
 {
-   return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
-  KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0);
+   return (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+   KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) ||
+   KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0));
 }
 
 void debug_event_write_work_handler(struct work_struct *work);
-- 
2.34.1



[PATCH 4/5] drm/amdkfd: enable watch points globally for gfx943

2023-07-05 Thread Eric Huang
From: Jonathan Kim 

Set watch points for all xcc instances on GFX943.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
Signed-off-by: Eric Huang 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c  |  6 --
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 16 ++--
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
index c0546db91579..d9357a61bf31 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
@@ -480,11 +480,13 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch(
VALID,
1);
 
-   WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_H) +
+   WREG32_RLC((SOC15_REG_OFFSET(GC, GET_INST(GC, inst),
+   regTCP_WATCH0_ADDR_H) +
(watch_id * TCP_WATCH_STRIDE)),
watch_address_high);
 
-   WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_L) +
+   WREG32_RLC((SOC15_REG_OFFSET(GC, GET_INST(GC, inst),
+   regTCP_WATCH0_ADDR_L) +
(watch_id * TCP_WATCH_STRIDE)),
watch_address_low);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index dcc49183364b..b4ec809c8892 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -413,7 +413,8 @@ static bool kfd_dbg_owns_dev_watch_id(struct 
kfd_process_device *pdd, int watch_
 int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd,
uint32_t watch_id)
 {
-   int r;
+   int xcc_id, r;
+   uint32_t xcc_mask = pdd->dev->xcc_mask;
 
if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id))
return -EINVAL;
@@ -425,10 +426,11 @@ int kfd_dbg_trap_clear_dev_address_watch(struct 
kfd_process_device *pdd,
}
 
amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-   pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch(
+   for_each_inst(xcc_id, xcc_mask)
+   pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch(
pdd->dev->adev,
watch_id,
-   0);
+   xcc_id);
amdgpu_gfx_off_ctrl(pdd->dev->adev, true);
 
if (!pdd->dev->kfd->shared_resources.enable_mes)
@@ -447,7 +449,8 @@ int kfd_dbg_trap_set_dev_address_watch(struct 
kfd_process_device *pdd,
uint32_t *watch_id,
uint32_t watch_mode)
 {
-   int r = kfd_dbg_get_dev_watch_id(pdd, watch_id);
+   int xcc_id, r = kfd_dbg_get_dev_watch_id(pdd, watch_id);
+   uint32_t xcc_mask = pdd->dev->xcc_mask;
 
if (r)
return r;
@@ -461,14 +464,15 @@ int kfd_dbg_trap_set_dev_address_watch(struct 
kfd_process_device *pdd,
}
 
amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-   pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch(
+   for_each_inst(xcc_id, xcc_mask)
+   pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch(
pdd->dev->adev,
watch_address,
watch_address_mask,
*watch_id,
watch_mode,
pdd->dev->vm_info.last_vmid_kfd,
-   0);
+   xcc_id);
amdgpu_gfx_off_ctrl(pdd->dev->adev, true);
 
if (!pdd->dev->kfd->shared_resources.enable_mes)
-- 
2.34.1



[PATCH 2/5] drm/amdkfd: restore debugger additional info for gfx v9_4_3

2023-07-05 Thread Eric Huang
From: Jonathan Kim 

The additional information that the KFD reports to the debugger was
destroyed when the following commit was merged:
"drm/amdkfd: convert switches to IP version checking"

Signed-off-by: Jonathan Kim 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Jonathan Kim 
Acked-by: Amber Lin 
Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 --
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  3 +++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 61fc62f3e003..1a4cdee86759 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1932,8 +1932,14 @@ static void kfd_topology_set_capabilities(struct 
kfd_topology_device *dev)
HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_MODE_SUPPORTED;
 
if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(10, 0, 0)) {
-   dev->node_props.debug_prop |= HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
-   HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
+   if (KFD_GC_VERSION(dev->gpu) == IP_VERSION(9, 4, 3))
+   dev->node_props.debug_prop |=
+   HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 |
+   HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3;
+   else
+   dev->node_props.debug_prop |=
+   HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
+   HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
 
if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(9, 4, 2))
dev->node_props.debug_prop |=
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index cba2cd5ed9d1..dea32a9e5506 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -32,9 +32,12 @@
 #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 32
 
 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9   6
+#define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 7
 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX10   7
 #define HSA_DBG_WATCH_ADDR_MASK_HI_BIT  \
(29 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT)
+#define HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3 \
+   (30 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT)
 
 struct kfd_node_properties {
uint64_t hive_id;
-- 
2.34.1



[PATCH 1/5] drm/amdgpu: add kfd2kgd debugger callbacks for GC v9.4.3

2023-07-05 Thread Eric Huang
From: Jonathan Kim 

Implement the same functionality as for GC v9.4.2, with the
differences required by the GC v9.4.3 HW spec.

Signed-off-by: Jonathan Kim 
Signed-off-by: Eric Huang 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   7 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h  |  30 
 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c   | 149 +-
 3 files changed, 182 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index 60f9e027fb66..f3f7e0437447 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -23,6 +23,7 @@
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_amdkfd_arcturus.h"
 #include "amdgpu_amdkfd_gfx_v9.h"
+#include "amdgpu_amdkfd_aldebaran.h"
 #include "gc/gc_9_4_2_offset.h"
 #include "gc/gc_9_4_2_sh_mask.h"
 #include 
@@ -36,7 +37,7 @@
  * initialize the debug mode registers after it has disabled GFX off during the
  * debug session.
  */
-static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
+uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
bool restore_dbg_registers,
uint32_t vmid)
 {
@@ -50,7 +51,7 @@ static uint32_t kgd_aldebaran_enable_debug_trap(struct 
amdgpu_device *adev,
 }
 
 /* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. */
-static uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev,
+uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev,
bool keep_trap_enabled,
uint32_t vmid)
 {
@@ -107,7 +108,7 @@ static uint32_t 
kgd_aldebaran_set_wave_launch_trap_override(struct amdgpu_device
return data;
 }
 
-static uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
+uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
uint8_t wave_launch_mode,
uint32_t vmid)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h
new file mode 100644
index ..ed349ff397bd
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
+   bool restore_dbg_registers,
+   uint32_t vmid);
+uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev,
+   bool keep_trap_enabled,
+   uint32_t vmid);
+uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
+   uint8_t wave_launch_mode,
+   uint32_t vmid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
index 5b4b7f8b92a5..3299e268f234 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
@@ -22,6 +22,7 @@
 #include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_amdkfd_gfx_v9.h"
+#include "amdgpu_amdkfd_aldebaran.h"
 #include "gc/gc_9_4_3_offset.h"
 #include "gc/gc_9_4_3_sh_mask.h"
 #include "athub/athub_1_8_0_offset.h"
@@ -32,6 +33,7 @@
 #include "soc15.h"
 #include "sdma/s

[PATCH 5/5] drm/amdkfd: enable watch points globally for gfx943

2023-06-28 Thread Eric Huang
From: Jonathan Kim 

Set watch points for all xcc instances on GFX943.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
Signed-off-by: Eric Huang 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c  |  6 --
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 16 ++--
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
index 17fe4e90f203..9c32b9fbd866 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
@@ -480,11 +480,13 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch(
VALID,
1);
 
-   WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_H) +
+   WREG32_RLC((SOC15_REG_OFFSET(GC, GET_INST(GC, inst),
+   regTCP_WATCH0_ADDR_H) +
(watch_id * TCP_WATCH_STRIDE)),
watch_address_high);
 
-   WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_L) +
+   WREG32_RLC((SOC15_REG_OFFSET(GC, GET_INST(GC, inst),
+   regTCP_WATCH0_ADDR_L) +
(watch_id * TCP_WATCH_STRIDE)),
watch_address_low);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index dcc49183364b..b4ec809c8892 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -413,7 +413,8 @@ static bool kfd_dbg_owns_dev_watch_id(struct 
kfd_process_device *pdd, int watch_
 int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd,
uint32_t watch_id)
 {
-   int r;
+   int xcc_id, r;
+   uint32_t xcc_mask = pdd->dev->xcc_mask;
 
if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id))
return -EINVAL;
@@ -425,10 +426,11 @@ int kfd_dbg_trap_clear_dev_address_watch(struct 
kfd_process_device *pdd,
}
 
amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-   pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch(
+   for_each_inst(xcc_id, xcc_mask)
+   pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch(
pdd->dev->adev,
watch_id,
-   0);
+   xcc_id);
amdgpu_gfx_off_ctrl(pdd->dev->adev, true);
 
if (!pdd->dev->kfd->shared_resources.enable_mes)
@@ -447,7 +449,8 @@ int kfd_dbg_trap_set_dev_address_watch(struct 
kfd_process_device *pdd,
uint32_t *watch_id,
uint32_t watch_mode)
 {
-   int r = kfd_dbg_get_dev_watch_id(pdd, watch_id);
+   int xcc_id, r = kfd_dbg_get_dev_watch_id(pdd, watch_id);
+   uint32_t xcc_mask = pdd->dev->xcc_mask;
 
if (r)
return r;
@@ -461,14 +464,15 @@ int kfd_dbg_trap_set_dev_address_watch(struct 
kfd_process_device *pdd,
}
 
amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-   pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch(
+   for_each_inst(xcc_id, xcc_mask)
+   pdd->watch_points[*watch_id] = pdd->dev->kfd2kgd->set_address_watch(
pdd->dev->adev,
watch_address,
watch_address_mask,
*watch_id,
watch_mode,
pdd->dev->vm_info.last_vmid_kfd,
-   0);
+   xcc_id);
amdgpu_gfx_off_ctrl(pdd->dev->adev, true);
 
if (!pdd->dev->kfd->shared_resources.enable_mes)
-- 
2.34.1



[PATCH 1/5] drm/amdgpu: add debugger support for GC v9.4.3

2023-06-28 Thread Eric Huang
From: Jonathan Kim 

Implement the same functionality as for GC v9.4.2, with the
differences required by the GC v9.4.3 HW spec.

Signed-off-by: Jonathan Kim 
Signed-off-by: Eric Huang 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   7 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h  |  30 
 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c   | 146 +-
 3 files changed, 179 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index 60f9e027fb66..f3f7e0437447 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -23,6 +23,7 @@
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_amdkfd_arcturus.h"
 #include "amdgpu_amdkfd_gfx_v9.h"
+#include "amdgpu_amdkfd_aldebaran.h"
 #include "gc/gc_9_4_2_offset.h"
 #include "gc/gc_9_4_2_sh_mask.h"
 #include 
@@ -36,7 +37,7 @@
  * initialize the debug mode registers after it has disabled GFX off during the
  * debug session.
  */
-static uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
+uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
bool restore_dbg_registers,
uint32_t vmid)
 {
@@ -50,7 +51,7 @@ static uint32_t kgd_aldebaran_enable_debug_trap(struct 
amdgpu_device *adev,
 }
 
 /* returns TRAP_EN, EXCP_EN and EXCP_REPLACE. */
-static uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev,
+uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev,
bool keep_trap_enabled,
uint32_t vmid)
 {
@@ -107,7 +108,7 @@ static uint32_t 
kgd_aldebaran_set_wave_launch_trap_override(struct amdgpu_device
return data;
 }
 
-static uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
+uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
uint8_t wave_launch_mode,
uint32_t vmid)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h
new file mode 100644
index ..5f776ede295e
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright 2021 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+uint32_t kgd_aldebaran_enable_debug_trap(struct amdgpu_device *adev,
+   bool restore_dbg_registers,
+   uint32_t vmid);
+uint32_t kgd_aldebaran_disable_debug_trap(struct amdgpu_device *adev,
+   bool keep_trap_enabled,
+   uint32_t vmid);
+uint32_t kgd_aldebaran_set_wave_launch_mode(struct amdgpu_device *adev,
+   uint8_t wave_launch_mode,
+   uint32_t vmid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
index 5b4b7f8b92a5..7aab8dcf46e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
@@ -22,6 +22,7 @@
 #include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_amdkfd_gfx_v9.h"
+#include "amdgpu_amdkfd_aldebaran.h"
 #include "gc/gc_9_4_3_offset.h"
 #include "gc/gc_9_4_3_sh_mask.h"
 #include "athub/athub_1_8_0_offset.h"
@@ -32,6 +33,7 @@
 #include "soc15.h"
 #include "sdma/s

[PATCH 3/5] drm/amdkfd: restore debugger additional info for gfx v9_4_3

2023-06-28 Thread Eric Huang
From: Jonathan Kim 

The additional information that the KFD reports to the debugger was
destroyed when the following commit was merged:
"drm/amdkfd: convert switches to IP version checking"

Signed-off-by: Jonathan Kim 
Reviewed-by: Harish Kasiviswanathan 
Signed-off-by: Jonathan Kim 
Acked-by: Amber Lin 
Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 --
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  3 +++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 61fc62f3e003..1a4cdee86759 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1932,8 +1932,14 @@ static void kfd_topology_set_capabilities(struct 
kfd_topology_device *dev)
HSA_CAP_TRAP_DEBUG_WAVE_LAUNCH_MODE_SUPPORTED;
 
if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(10, 0, 0)) {
-   dev->node_props.debug_prop |= HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
-   HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
+   if (KFD_GC_VERSION(dev->gpu) == IP_VERSION(9, 4, 3))
+   dev->node_props.debug_prop |=
+   HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 |
+   HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3;
+   else
+   dev->node_props.debug_prop |=
+   HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 |
+   HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
 
if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(9, 4, 2))
dev->node_props.debug_prop |=
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index cba2cd5ed9d1..dea32a9e5506 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -32,9 +32,12 @@
 #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 32
 
 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9   6
+#define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9_4_3 7
 #define HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX10   7
 #define HSA_DBG_WATCH_ADDR_MASK_HI_BIT  \
(29 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT)
+#define HSA_DBG_WATCH_ADDR_MASK_HI_BIT_GFX9_4_3 \
+   (30 << HSA_DBG_WATCH_ADDR_MASK_HI_BIT_SHIFT)
 
 struct kfd_node_properties {
uint64_t hive_id;
-- 
2.34.1



[PATCH 2/5] drm/amdkfd: add multi-process debugging support for GC v9.4.3

2023-06-28 Thread Eric Huang
From: Jonathan Kim 

Similar to GC v9.4.2, GC v9.4.3 should use the 5-Dword extended
MAP_PROCESS packet to support multi-process debugging.  Update the
multi-process debug support list so that the KFD updates the runlist
on debug mode setting and that it allocates enough GTT memory during
KFD device initialization.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
index a289e59ceb79..a0afc6a7b6c4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
@@ -76,8 +76,9 @@ int kfd_dbg_send_exception_to_runtime(struct kfd_process *p,
 
 static inline bool kfd_dbg_is_per_vmid_supported(struct kfd_node *dev)
 {
-   return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
-  KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0);
+   return (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+   KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) ||
+   KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0));
 }
 
 void debug_event_write_work_handler(struct work_struct *work);
-- 
2.34.1



[PATCH 4/5] drm/amdkfd: add xcc instance for debugger APIs

2023-06-28 Thread Eric Huang
Since GFX9 GPUs can have multiple XCC instances, implement this
change in KFD for the debugger APIs.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c  | 6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c   | 6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h   | 6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c   | 6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c| 6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h| 6 --
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 6 --
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h  | 6 --
 9 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index f3f7e0437447..c7f88bfa1976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -126,7 +126,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
uint32_t watch_address_mask,
uint32_t watch_id,
uint32_t watch_mode,
-   uint32_t debug_vmid)
+   uint32_t debug_vmid,
+   uint32_t inst)
 {
uint32_t watch_address_high;
uint32_t watch_address_low;
@@ -163,7 +164,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 }
 
 static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device 
*adev,
- uint32_t watch_id)
+ uint32_t watch_id,
+ uint32_t inst)
 {
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
index 7aab8dcf46e1..17fe4e90f203 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c
@@ -454,7 +454,8 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch(
uint32_t watch_address_mask,
uint32_t watch_id,
uint32_t watch_mode,
-   uint32_t debug_vmid)
+   uint32_t debug_vmid,
+   uint32_t inst)
 {
uint32_t watch_address_high;
uint32_t watch_address_low;
@@ -491,7 +492,8 @@ static uint32_t kgd_gfx_v9_4_3_set_address_watch(
 }
 
 static uint32_t kgd_gfx_v9_4_3_clear_address_watch(struct amdgpu_device *adev,
-   uint32_t watch_id)
+   uint32_t watch_id,
+   uint32_t inst)
 {
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index 8ad7a7779e14..225b8929a878 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -886,7 +886,8 @@ uint32_t kgd_gfx_v10_set_address_watch(struct amdgpu_device 
*adev,
uint32_t watch_address_mask,
uint32_t watch_id,
uint32_t watch_mode,
-   uint32_t debug_vmid)
+   uint32_t debug_vmid,
+   uint32_t inst)
 {
uint32_t watch_address_high;
uint32_t watch_address_low;
@@ -942,7 +943,8 @@ uint32_t kgd_gfx_v10_set_address_watch(struct amdgpu_device 
*adev,
 }
 
 uint32_t kgd_gfx_v10_clear_address_watch(struct amdgpu_device *adev,
-   uint32_t watch_id)
+   uint32_t watch_id,
+   uint32_t inst)
 {
uint32_t watch_address_cntl;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h
index e6b70196071a..c904a08b022b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h
@@ -44,9 +44,11 @@ uint32_t kgd_gfx_v10_set_address_watch(struct amdgpu_device 
*adev,
uint32_t watch_address_mask,
uint32_t watch_id,
uint32_t watch_mode,
-   uint32_t debug_vmid);
+   uint32_t debug_vmid,
+   uint32_t inst);
 uint32_t kgd_gfx_v10_clear_address_watch(struct

[PATCH 0/5] Upstream debugger feature for GFX v9.4.3

2023-06-28 Thread Eric Huang
Eric Huang (1):
  drm/amdkfd: add xcc instance for debugger APIs

Jonathan Kim (4):
  drm/amdgpu: add debugger support for GC v9.4.3
  drm/amdkfd: add multi-process debugging support for GC v9.4.3
  drm/amdkfd: restore debugger additional info for gfx v9_4_3
  drm/amdkfd: enable watch points globally for gfx943

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |  13 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h  |  30 
 .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c   | 150 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|   6 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h|   6 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c|   6 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   6 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |   6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c|  18 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|   5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   3 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |   6 +-
 13 files changed, 237 insertions(+), 28 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h

-- 
2.34.1



Re: [PATCH] drm/amdkfd: Don't trigger evictions unmapping dmabuf attachments

2023-05-01 Thread Eric Huang

Reviewed-by: Eric Huang 

Regards,
Eric

On 2023-05-01 16:52, Felix Kuehling wrote:

Don't move DMABuf attachments for PCIe P2P mappings to the SYSTEM domain
when unmapping. This avoids triggering eviction fences unnecessarily.
Instead do the move to SYSTEM and back to GTT when mapping these
attachments to ensure the SG table gets updated after evictions.

This may still trigger unnecessary evictions if user mode unmaps and
remaps the same BO. However, this is unlikely in real applications.

Cc: Eric Huang 
Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 1002c7834386..bb8e6f6793c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -530,6 +530,12 @@ kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment 
*attachment)
  {
struct ttm_operation_ctx ctx = {.interruptible = true};
struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+   int ret;
+
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+   ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+   if (ret)
+   return ret;
  
  	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);

	return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
@@ -662,11 +668,10 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
  static void
  kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
  {
-   struct ttm_operation_ctx ctx = {.interruptible = true};
-   struct amdgpu_bo *bo = attachment->bo_va->base.bo;
-
-   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
-   ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+   /* This is a no-op. We don't want to trigger eviction fences when
+* unmapping DMABufs. Therefore the invalidation (moving to system
+* domain) is done in kfd_mem_dmamap_dmabuf.
+*/
  }
  
  /**
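
The SYSTEM round-trip described in the commit message (invalidate on map rather than on unmap) can be modeled with a small sketch. This is illustrative Python, not driver code; `Attachment` and `validate` are invented stand-ins for the TTM calls, under the assumption that moving to SYSTEM invalidates the SG table and moving back to GTT rebuilds it:

```python
class Attachment:
    def __init__(self):
        self.domain = "SYSTEM"      # fresh dmabuf imports start in SYSTEM
        self.sg_table_valid = False
        self.evictions_triggered = 0

    def validate(self, domain):
        """Stand-in for ttm_bo_validate(): leaving a GPU domain for SYSTEM
        waits on fences (triggering eviction); GTT rebuilds the SG table."""
        if domain == "SYSTEM" and self.domain != "SYSTEM":
            self.sg_table_valid = False
            self.evictions_triggered += 1
        if domain == "GTT":
            self.sg_table_valid = True
        self.domain = domain

    def dmamap(self):
        # Do the SYSTEM round-trip here, so the SG table is refreshed
        # after any eviction, instead of doing it on unmap.
        self.validate("SYSTEM")
        self.validate("GTT")

    def dmaunmap(self):
        # Intentionally a no-op: unmapping must not trigger eviction fences.
        pass

a = Attachment()
a.dmamap()    # first map: already SYSTEM, so the round-trip is a no-op
a.dmaunmap()  # no-op
a.dmamap()    # remap: GTT -> SYSTEM -> GTT, one eviction event
print(a.evictions_triggered)  # 1
```

The single eviction on remap matches the caveat in the commit message: unmap-then-remap of the same BO may still trigger an unnecessary eviction, which is accepted as unlikely in real applications.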




Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-28 Thread Eric Huang




On 2023-04-28 15:42, Felix Kuehling wrote:

On 2023-04-28 14:09, Eric Huang wrote:


On 2023-04-28 12:41, Felix Kuehling wrote:

On 2023-04-28 10:17, Eric Huang wrote:


On 2023-04-27 23:46, Kuehling, Felix wrote:

[AMD Official Use Only - General]

Re-mapping typically happens after evictions, before a new 
eviction fence gets attached. At that time the old eviction fence 
should be in the signaled state already, so it can't be signaled 
again. Therefore I would expect my patch to help with unmapping 
the DMABuf import, without breaking the eviction case.


Are you talking about remapping with a map-to-gpu call from user 
mode? I think that would only be a problem if the KFD BO was 
unmapped and remapped multiple times. The first time it's mapped, 
the fresh dmabuf import should be in the SYSTEM domain, so the 
validation in the SYSTEM domain before GTT would be a no-op.
Yes. The scenario I am talking about is from user mode: 
mapping->unmapping->re-mapping the KFD GTT BO will trigger the 
eviction.


I sort of agree that we don't really rely on the eviction fence on 
the DMABuf import. The reservation object is shared with the 
original BO. Moving the original BO triggers the eviction fence, 
so we don't need to trigger it again on the dmabuf import. Other 
than moving the original BO, I don't think we can do anything to 
the DMABuf import that would require an eviction for KFD use case. 
It is a special use case because we control both the import and 
the export in the same context.
I am thinking about not adding the KFD eviction fence in the first 
place when mapping the original GTT BO, because I don't see how it can 
be evicted in any case.


That's not an option. We're not adding an eviction fence. The 
reservation object with the eviction fence is shared between the 
exported BO and the imported one. That's just how DMABuf works. If 
you wait for the fences on the imported BO, you are effectively 
waiting for the fences on the exported BOs. And you can't remove the 
eviction fence from the exported BO.


What if the exported BO is never evicted in reality? I understand how 
DMABuf works: the imported BO doesn't have its own eviction fence, it 
shares the exported BO's if an eviction happens, but I don't see how 
the exported BO can be evicted.


The exported BO can be evicted like any other BO. For example 
KFDEvictTest is there to cause and test evictions of KFD VRAM BOs. 
Exporting the BO does not pin it (if DMABUF_MOVE_NOTIFIER is enabled, 
which it is in the upstream kernel), so the exported BO can still be 
evicted.


Yes, a KFD VRAM BO can be evicted, but the DMABuf's original exported BO 
is a non-paged/GTT BO. Can a GTT BO be evicted? It should be like a 
paged/userptr BO, which doesn't have a KFD eviction fence.


Regards,
Eric



Regards,
  Felix




Regards,
Eric



Regards,
  Felix


In theory a GTT BO is mapped by the user calling mmap() in system 
memory, like a userptr; unlike VRAM, it will not be evicted by the 
amdgpu VRAM manager. The only concern is CPU invalidation, but a GTT 
BO doesn't register an MMU notifier, which will be a potential problem 
when switching from paged/userptr to non-paged/GTT for the MES scheduler.


Regards,
Eric


In the general case dmabuf imports need their eviction fences. For 
example when we're importing a DMABuf from somewhere else, so the 
eviction fence is not shared with a BO that we already control. 
Even then, unmapping a dmabuf from our KFD VM does not need to 
wait for any fences on the DMABuf.


Regards,
   Felix

-Original Message-
From: Huang, JinHuiEric 
Sent: Thursday, April 27, 2023 14:58
To: Kuehling, Felix ; Koenig, Christian 
; Christian König 
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences 
invalidating preemptible DMABuf imports


Hi Felix,

I tested your patch on mGPU systems. It doesn't break any KFD 
eviction tests, because the tests don't allocate DMABuf imports, so 
their eviction fences are never triggered. The only thing the patch 
affects is re-mapping DMABuf imports, where the eviction will 
still be triggered.


I have an idea that we probably can remove the eviction fence for the 
GTT BO, because currently the only way to trigger the eviction fence 
is by calling ttm_bo_validate for the CPU domain in 
kfd_mem_dmaunmap_dmabuf. Do you know of any other case that 
triggers a GTT BO's eviction?


Regards,
Eric

On 2023-04-26 22:21, Felix Kuehling wrote:

Hi Eric,

Can you try if the attached patch fixes the problem without breaking
the eviction tests on a multi-GPU PCIe P2P system?

Thanks,
   Felix


On 2023-04-26 13:02, Christian König wrote:

Am 26.04.23 um 18:58 schrieb Felix Kuehling:

On 2023-04-26 9:03, Christian König wrote:

Am 25.04.23 um 16:11 schrieb Eric Huang:

Hi Christian,

What do you think about Felix's explanation?

That's unfortunately not something we can do here.


Regards,
Eric

On 2023-04-13 09:28, Felix Kuehling wrote:

Am 2023-04-13 um 07:35 schrieb Christian König:

Am 13.04.23 um 03:01 schrieb Felix Kuehling:

Am 2023-04

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-28 Thread Eric Huang



On 2023-04-28 12:41, Felix Kuehling wrote:

On 2023-04-28 10:17, Eric Huang wrote:


On 2023-04-27 23:46, Kuehling, Felix wrote:

[AMD Official Use Only - General]

Re-mapping typically happens after evictions, before a new eviction 
fence gets attached. At that time the old eviction fence should be 
in the signaled state already, so it can't be signaled again. 
Therefore I would expect my patch to help with unmapping the DMABuf 
import, without breaking the eviction case.


Are you talking about remapping with a map-to-gpu call from user 
mode? I think that would only be a problem if the KFD BO was 
unmapped and remapped multiple times. The first time it's mapped, 
the fresh dmabuf import should be in the SYSTEM domain, so the 
validation in the SYSTEM domain before GTT would be a no-op.
Yes. The scenario I am talking about is from user mode: 
mapping->unmapping->re-mapping the KFD GTT BO will trigger the 
eviction.


I sort of agree that we don't really rely on the eviction fence on 
the DMABuf import. The reservation object is shared with the 
original BO. Moving the original BO triggers the eviction fence, so 
we don't need to trigger it again on the dmabuf import. Other than 
moving the original BO, I don't think we can do anything to the 
DMABuf import that would require an eviction for KFD use case. It is 
a special use case because we control both the import and the export 
in the same context.
I am thinking about not adding the KFD eviction fence in the first 
place when mapping the original GTT BO, because I don't see how it can be 
evicted in any case.


That's not an option. We're not adding an eviction fence. The 
reservation object with the eviction fence is shared between the 
exported BO and the imported one. That's just how DMABuf works. If you 
wait for the fences on the imported BO, you are effectively waiting 
for the fences on the exported BOs. And you can't remove the eviction 
fence from the exported BO.


What if the exported BO is never evicted in reality? I understand how 
DMABuf works: the imported BO doesn't have its own eviction fence, it 
shares the exported BO's if an eviction happens, but I don't see how the 
exported BO can be evicted.


Regards,
Eric



Regards,
  Felix


In theory a GTT BO is mapped by the user calling mmap() in system memory, 
like a userptr; unlike VRAM, it will not be evicted by the amdgpu VRAM 
manager. The only concern is CPU invalidation, but a GTT BO doesn't 
register an MMU notifier, which will be a potential problem when 
switching from paged/userptr to non-paged/GTT for the MES scheduler.


Regards,
Eric


In the general case dmabuf imports need their eviction fences. For 
example when we're importing a DMABuf from somewhere else, so the 
eviction fence is not shared with a BO that we already control. Even 
then, unmapping a dmabuf from our KFD VM does not need to wait for 
any fences on the DMABuf.


Regards,
   Felix

-Original Message-
From: Huang, JinHuiEric 
Sent: Thursday, April 27, 2023 14:58
To: Kuehling, Felix ; Koenig, Christian 
; Christian König 
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences 
invalidating preemptible DMABuf imports


Hi Felix,

I tested your patch on mGPU systems. It doesn't break any KFD 
eviction tests, because the tests don't allocate DMABuf imports, so their 
eviction fences are never triggered. The only thing the patch 
affects is re-mapping DMABuf imports, where the eviction will still 
be triggered.


I have an idea that we probably can remove the eviction fence for the GTT 
BO, because currently the only way to trigger the eviction fence is 
by calling ttm_bo_validate for the CPU domain in 
kfd_mem_dmaunmap_dmabuf. Do you know of any other case that triggers 
a GTT BO's eviction?


Regards,
Eric

On 2023-04-26 22:21, Felix Kuehling wrote:

Hi Eric,

Can you try if the attached patch fixes the problem without breaking
the eviction tests on a multi-GPU PCIe P2P system?

Thanks,
   Felix


On 2023-04-26 13:02, Christian König wrote:

Am 26.04.23 um 18:58 schrieb Felix Kuehling:

On 2023-04-26 9:03, Christian König wrote:

Am 25.04.23 um 16:11 schrieb Eric Huang:

Hi Christian,

What do you think about Felix's explanation?

That's unfortunately not something we can do here.


Regards,
Eric

On 2023-04-13 09:28, Felix Kuehling wrote:

Am 2023-04-13 um 07:35 schrieb Christian König:

Am 13.04.23 um 03:01 schrieb Felix Kuehling:

Am 2023-04-12 um 18:25 schrieb Eric Huang:

It is to avoid redundant eviction for KFD's DMAbuf import bo
when dmaunmapping DMAbuf. The DMAbuf import bo has been set as
AMDGPU_PL_PREEMPT in KFD when mapping.

Signed-off-by: Eric Huang 

Reviewed-by: Felix Kuehling 

I'd like to get an Acked-by from Christian as well before
submitting this.

I have to admit that I only partially followed the internal
discussion, but in general you need a *really* good explanation
for this.

E.g. add code comment and explain in the commit message
extensively why this is needed and why there are no 
alternative

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-28 Thread Eric Huang



On 2023-04-27 23:46, Kuehling, Felix wrote:

[AMD Official Use Only - General]

Re-mapping typically happens after evictions, before a new eviction fence gets 
attached. At that time the old eviction fence should be in the signaled state 
already, so it can't be signaled again. Therefore I would expect my patch to 
help with unmapping the DMABuf import, without breaking the eviction case.

Are you talking about remapping with a map-to-gpu call from user mode? I think 
that would only be a problem if the KFD BO was unmapped and remapped multiple 
times. The first time it's mapped, the fresh dmabuf import should be in the 
SYSTEM domain, so the validation in the SYSTEM domain before GTT would be a 
no-op.
Yes. The scenario I am talking about is from user mode: 
mapping->unmapping->re-mapping the KFD GTT BO will trigger the eviction.


I sort of agree that we don't really rely on the eviction fence on the DMABuf 
import. The reservation object is shared with the original BO. Moving the 
original BO triggers the eviction fence, so we don't need to trigger it again 
on the dmabuf import. Other than moving the original BO, I don't think we can 
do anything to the DMABuf import that would require an eviction for KFD use 
case. It is a special use case because we control both the import and the 
export in the same context.
I am thinking about not adding the KFD eviction fence in the first place 
when mapping the original GTT BO, because I don't see how it can be 
evicted in any case. In theory a GTT BO is mapped by the user calling 
mmap() in system memory, like a userptr; unlike VRAM, it will not be 
evicted by the amdgpu VRAM manager. The only concern is CPU invalidation, 
but a GTT BO doesn't register an MMU notifier, which will be a potential 
problem when switching from paged/userptr to non-paged/GTT for the MES 
scheduler.


Regards,
Eric


In the general case dmabuf imports need their eviction fences. For example when 
we're importing a DMABuf from somewhere else, so the eviction fence is not 
shared with a BO that we already control. Even then, unmapping a dmabuf from 
our KFD VM does not need to wait for any fences on the DMABuf.

Regards,
   Felix

-Original Message-
From: Huang, JinHuiEric 
Sent: Thursday, April 27, 2023 14:58
To: Kuehling, Felix ; Koenig, Christian 
; Christian König ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating 
preemptible DMABuf imports

Hi Felix,

I tested your patch on mGPU systems. It doesn't break any KFD eviction tests, 
because the tests don't allocate DMABuf imports, so their eviction fences are 
never triggered. The only thing the patch affects is re-mapping DMABuf imports, 
where the eviction will still be triggered.

I have an idea that we probably can remove the eviction fence for the GTT BO, 
because currently the only way to trigger the eviction fence is by calling 
ttm_bo_validate for the CPU domain in kfd_mem_dmaunmap_dmabuf. Do you know of 
any other case that triggers a GTT BO's eviction?

Regards,
Eric

On 2023-04-26 22:21, Felix Kuehling wrote:

Hi Eric,

Can you try if the attached patch fixes the problem without breaking
the eviction tests on a multi-GPU PCIe P2P system?

Thanks,
   Felix


On 2023-04-26 13:02, Christian König wrote:

Am 26.04.23 um 18:58 schrieb Felix Kuehling:

On 2023-04-26 9:03, Christian König wrote:

Am 25.04.23 um 16:11 schrieb Eric Huang:

Hi Christian,

What do you think about Felix's explanation?

That's unfortunately not something we can do here.


Regards,
Eric

On 2023-04-13 09:28, Felix Kuehling wrote:

Am 2023-04-13 um 07:35 schrieb Christian König:

Am 13.04.23 um 03:01 schrieb Felix Kuehling:

Am 2023-04-12 um 18:25 schrieb Eric Huang:

It is to avoid redundant eviction for KFD's DMAbuf import bo
when dmaunmapping DMAbuf. The DMAbuf import bo has been set as
AMDGPU_PL_PREEMPT in KFD when mapping.

Signed-off-by: Eric Huang 

Reviewed-by: Felix Kuehling 

I'd like to get an Acked-by from Christian as well before
submitting this.

I have to admit that I only partially followed the internal
discussion, but in general you need a *really* good explanation
for this.

E.g. add code comment and explain in the commit message
extensively why this is needed and why there are no alternatives.

OK. I'll give it a shot:

    This code path is used among other things when invalidating
DMABuf
    imports. These imports share a reservation object with the
exported
    BO. Waiting on all the fences in this reservation will trigger
KFD
    eviction fences unnecessarily, for example when a DMABuf
import for
    a DMA mapping on a secondary GPU is being unmapped explicitly.
Only
    moving the original exported BO requires stopping KFD user
mode
    queues. If the invalidation is triggered through a move
notifier
    from the exported BO, then moving the original BO already
triggered
    the eviction fence and we don't need to wait for it again on
the import.

    We can identify DMABuf imports in KFD for secondary GPU DMA
ma

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-27 Thread Eric Huang

Hi Felix,

I tested your patch on mGPU systems. It doesn't break any KFD eviction 
tests, because the tests don't allocate DMABuf imports, so their eviction 
fences are never triggered. The only thing the patch affects is re-mapping 
DMABuf imports, where the eviction will still be triggered.


I have an idea that we probably can remove the eviction fence for the GTT BO, 
because currently the only way to trigger the eviction fence is by 
calling ttm_bo_validate for the CPU domain in kfd_mem_dmaunmap_dmabuf. Do 
you know of any other case that triggers a GTT BO's eviction?


Regards,
Eric

On 2023-04-26 22:21, Felix Kuehling wrote:

Hi Eric,

Can you try if the attached patch fixes the problem without breaking 
the eviction tests on a multi-GPU PCIe P2P system?


Thanks,
  Felix


On 2023-04-26 13:02, Christian König wrote:

Am 26.04.23 um 18:58 schrieb Felix Kuehling:


On 2023-04-26 9:03, Christian König wrote:

Am 25.04.23 um 16:11 schrieb Eric Huang:

Hi Christian,

What do you think about Felix's explanation?


That's unfortunately not something we can do here.



Regards,
Eric

On 2023-04-13 09:28, Felix Kuehling wrote:

Am 2023-04-13 um 07:35 schrieb Christian König:

Am 13.04.23 um 03:01 schrieb Felix Kuehling:

Am 2023-04-12 um 18:25 schrieb Eric Huang:

It is to avoid redundant eviction for KFD's DMAbuf import
bo when dmaunmapping DMAbuf. The DMAbuf import bo has
been set as AMDGPU_PL_PREEMPT in KFD when mapping.

Signed-off-by: Eric Huang 


Reviewed-by: Felix Kuehling 

I'd like to get an Acked-by from Christian as well before 
submitting this.


I have to admit that I only partially followed the internal 
discussion, but in general you need a *really* good explanation 
for this.


E.g. add code comment and explain in the commit message 
extensively why this is needed and why there are no alternatives.


OK. I'll give it a shot:

   This code path is used among other things when invalidating 
DMABuf
   imports. These imports share a reservation object with the 
exported
   BO. Waiting on all the fences in this reservation will trigger 
KFD
   eviction fences unnecessarily, for example when a DMABuf 
import for
   a DMA mapping on a secondary GPU is being unmapped explicitly. 
Only

   moving the original exported BO requires stopping KFD user mode
   queues. If the invalidation is triggered through a move notifier
   from the exported BO, then moving the original BO already 
triggered
   the eviction fence and we don't need to wait for it again on 
the import.


   We can identify DMABuf imports in KFD for secondary GPU DMA 
mappings

   by the mem_type AMDGPU_PL_PREEMPT. In this case, use a wait
   operation that ignores KFD eviction fences.

How does this sound?


To be honest like quite a bad idea. Why in the world are imported 
BOs moved from GTT to SYSTEM in the first place?


As I understand it, the way to update SG tables in  SG BOs (e.g. 
userptr and dmabuf imports) is to move them back and forth between 
system and GTT domains. If we left the import in the GTT domain all 
the time, we would have no way to update it, e.g. after an eviction. 
Currently the move to the system domain is done in the unmap code path.


Before memory is freed, we also need to unmap it from GPUVM, 
including the DMABuf imports on remote GPUs. For the above reason 
that currently includes moving the import to the system domain. If 
we removed that from the unmap code path, we'd need to do the move 
to system somewhere else, maybe in the mapping/validation path.





The only reason for this I can think of is that the DMA mappings 
become invalid for some reasons and in this case waiting for the 
KFD fence is actually the absolutely right thing to do.


In this case the reason the only reason for unmapping the memory is 
that we're about to free the memory and its DMABuf imports on other 
GPUs. This is coming from the application with a promise "I'm no 
longer accessing the memory". We don't need to wait for fences here. 
We only need to invalidate the PTEs to make sure that any further 
buggy access by the application will fault.


Well in this case just free the BO and it's bo_va structure. The core 
handling should take care of clearing all the freed up regions.


As for updating the SG of a BO you indeed need to move it from GTT to 
SYSTEM and back, but in this case we should either indeed wait for 
the KFD fence since page tables in between the operation still have 
the old entries or we should destroy the BO and create a new one. The 
later would overwrite the PTEs with invalid entries first and then 
fill in new valid ones.


Regards,
Christian.



Regards,
  Felix




Regards,
Christian.



Regards,
  Felix




Regards,
Christian.



Thanks,
  Felix



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index 2430f3e9f3a7..64795fe9eecb 100644
--- a/drivers/gpu/drm/

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-25 Thread Eric Huang

Hi Christian,

What do you think about Felix's explanation?

Regards,
Eric

On 2023-04-13 09:28, Felix Kuehling wrote:

Am 2023-04-13 um 07:35 schrieb Christian König:

Am 13.04.23 um 03:01 schrieb Felix Kuehling:

Am 2023-04-12 um 18:25 schrieb Eric Huang:

It is to avoid redundant eviction for KFD's DMAbuf import
bo when dmaunmapping DMAbuf. The DMAbuf import bo has
been set as AMDGPU_PL_PREEMPT in KFD when mapping.

Signed-off-by: Eric Huang 


Reviewed-by: Felix Kuehling 

I'd like to get an Acked-by from Christian as well before submitting 
this.


I have to admit that I only partially followed the internal 
discussion, but in general you need a *really* good explanation for 
this.


E.g. add code comment and explain in the commit message extensively 
why this is needed and why there are no alternatives.


OK. I'll give it a shot:

   This code path is used among other things when invalidating DMABuf
   imports. These imports share a reservation object with the exported
   BO. Waiting on all the fences in this reservation will trigger KFD
   eviction fences unnecessarily, for example when a DMABuf import for
   a DMA mapping on a secondary GPU is being unmapped explicitly. Only
   moving the original exported BO requires stopping KFD user mode
   queues. If the invalidation is triggered through a move notifier
   from the exported BO, then moving the original BO already triggered
   the eviction fence and we don't need to wait for it again on the 
import.


   We can identify DMABuf imports in KFD for secondary GPU DMA mappings
   by the mem_type AMDGPU_PL_PREEMPT. In this case, use a wait
   operation that ignores KFD eviction fences.

How does this sound?

Regards,
  Felix




Regards,
Christian.



Thanks,
  Felix



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index 2430f3e9f3a7..64795fe9eecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -526,7 +526,12 @@ static int amdgpu_bo_move(struct 
ttm_buffer_object *bo, bool evict,

  if ((old_mem->mem_type == TTM_PL_TT ||
   old_mem->mem_type == AMDGPU_PL_PREEMPT) &&
  new_mem->mem_type == TTM_PL_SYSTEM) {
-    r = ttm_bo_wait_ctx(bo, ctx);
+    if (old_mem->mem_type == AMDGPU_PL_PREEMPT)
+    r = amdgpu_bo_sync_wait(abo,
+    AMDGPU_FENCE_OWNER_KFD,
+    ctx->interruptible);
+    else
+    r = ttm_bo_wait_ctx(bo, ctx);
  if (r)
  return r;






[PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-12 Thread Eric Huang
It is to avoid redundant eviction for KFD's DMAbuf import
bo when dmaunmapping DMAbuf. The DMAbuf import bo has
been set as AMDGPU_PL_PREEMPT in KFD when mapping.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 2430f3e9f3a7..64795fe9eecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -526,7 +526,12 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, 
bool evict,
if ((old_mem->mem_type == TTM_PL_TT ||
 old_mem->mem_type == AMDGPU_PL_PREEMPT) &&
new_mem->mem_type == TTM_PL_SYSTEM) {
-   r = ttm_bo_wait_ctx(bo, ctx);
+   if (old_mem->mem_type == AMDGPU_PL_PREEMPT)
+   r = amdgpu_bo_sync_wait(abo,
+   AMDGPU_FENCE_OWNER_KFD,
+   ctx->interruptible);
+   else
+   r = ttm_bo_wait_ctx(bo, ctx);
if (r)
return r;
 
-- 
2.34.1
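
The owner-filtered wait used in the patch above (amdgpu_bo_sync_wait with owner=AMDGPU_FENCE_OWNER_KFD) can be sketched in a few lines. This is a hypothetical model, not the kernel implementation; `Fence`, `sync_wait`, and the owner strings are invented names:

```python
KFD_OWNER = "kfd"

class Fence:
    def __init__(self, owner):
        self.owner = owner
        self.signaled = False

    def wait(self):
        self.signaled = True

def sync_wait(fences, skip_owner):
    """Wait on every fence in the reservation EXCEPT those with the
    given owner, so KFD eviction fences are never triggered by this
    code path. Returns the fences actually waited on."""
    waited = []
    for f in fences:
        if f.owner == skip_owner:
            continue            # skip KFD eviction fences
        f.wait()
        waited.append(f)
    return waited

# A reservation object shared between exported and imported BO: an
# ordinary GPU fence, a KFD eviction fence, and an SDMA fence.
resv = [Fence("gfx"), Fence(KFD_OWNER), Fence("sdma")]
waited = sync_wait(resv, KFD_OWNER)
print(len(waited))                                             # 2
print(any(f.owner == KFD_OWNER and f.signaled for f in resv))  # False
```

This mirrors the intent discussed in the thread: waiting on the shared reservation must not signal the eviction fence of the exported BO when only a preemptible DMABuf import is being moved.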



[PATCH] drm/amdgpu: only wait GTT bo's fence in amdgpu_bo_move

2023-04-12 Thread Eric Huang
It is to avoid redundant eviction for KFD's DMAbuf import
bo when dmaunmapping DMAbuf. The DMAbuf import bo has
been set as AMDGPU_PL_PREEMPT in KFD when mapping.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 2430f3e9f3a7..a0828f6d9fbe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -526,7 +526,10 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, 
bool evict,
if ((old_mem->mem_type == TTM_PL_TT ||
 old_mem->mem_type == AMDGPU_PL_PREEMPT) &&
new_mem->mem_type == TTM_PL_SYSTEM) {
-   r = ttm_bo_wait_ctx(bo, ctx);
+   if (old_mem->mem_type == AMDGPU_PL_PREEMPT)
+   r = amdgpu_bo_sync_wait(abo, AMDGPU_FENCE_OWNER_KFD, 
false);
+   else
+   r = ttm_bo_wait_ctx(bo, ctx);
if (r)
return r;
 
-- 
2.34.1



Re: [PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping

2023-04-10 Thread Eric Huang

Hi Felix,

What do you think of my proposal in my previous email: setting the domain 
to CPU in kfd_mem_dmamap_dmabuf, and setting the domain to GTT in 
kfd_mem_dmaunmap_dmabuf? That would work in a similar way to userptr.


Thanks,
Eric

On 2023-04-10 14:50, Felix Kuehling wrote:
Sorry, you're right, there is no AMDGPU_GEM_DOMAIN_PREEMPTIBLE. I 
remembered this wrong. There is a flag called 
AMDGPU_GEM_CREATE_PREEMPTIBLE, which changes what happens when a BO is 
placed in the AMDGPU_GEM_DOMAIN_GTT domain.


So my proposal would need to be modified to set the flag 
AMDGPU_GEM_CREATE_PREEMPTIBLE in the imported DMABuf BO.


On 2023-04-10 14:28, Eric Huang wrote:

Hi Felix,

Thanks for your review and suggestion, but unfortunately 
AMDGPU_GEM_DOMAIN_PREEMPTIBLE is not defined in amdgpu_drm.h. I 
understand we need the memory eviction in either 
kfd_mem_dmamap_dmabuf() or kfd_mem_dmaunmap_dmabuf() to update the DMA 
address, so I am thinking of doing it as simply as userptr memory does.


The purpose of this change: with the non-MES HW scheduler we are using 
userptr/paged memory, but since GFX11 we will be using the MES scheduler, 
which needs the memory to be allocated as GTT/non-paged memory. So 
we want all GPUs to use GTT/non-paged memory, but there is a performance 
drop because of the eviction in kfd_mem_dmaunmap_dmabuf.


Currently userptr memory is evicted in kfd_mem_dmamap_userptr by 
changing the domain to GTT before calling ttm_bo_validate, and not 
evicted in kfd_mem_dmaunmap_userptr. So I think we can do the similar 
thing for GTT/non-paged memory: set the domain to CPU in 
kfd_mem_dmamap_dmabuf, which will evict memory to update the DMA address, 
and set the domain to GTT in kfd_mem_dmaunmap_dmabuf, which will not 
evict memory. The performance should be the same as userptr/paged 
memory.


This sounds backwards to me. dmaunmap should move objects to the CPU 
domain because the GPU mapping is potentially invalid. And dmamap must 
move it to the GTT domain because that updates the GPU mapping and 
allows the GPU virtual address mapping to be updated.


The problem is the eviction in dmaunmap. Userptrs don't see these 
evictions because the SG BOs we use to map them on other GPUs do set 
the AMDGPU_GEM_CREATE_PREEMPTIBLE flag. My idea is to do the same 
thing for DMABufs that map GTT (and VRAM) BOs to other GPUs.


Now that I look at it in more detail, I see we're already doing that 
in kfd_mem_attach_dmabuf:


    *bo = gem_to_amdgpu_bo(gobj);
    (*bo)->flags |= AMDGPU_GEM_CREATE_PREEMPTIBLE;

So then the question is, why is this not working? I think that's the 
second part of my proposal, which is still needed:



2. Add a special case in the above if-block for old_mem->mem_type ==
   AMDGPU_PL_PREEMPT: use amdgpu_bo_sync_wait with
   owner=AMDGPU_FENCE_OWNER_KFD so that it doesn't wait for eviction 
fences 


Regards,
  Felix




Regards,
Eric

On 2023-04-04 16:40, Felix Kuehling wrote:

[+Christian]

OK, this comes from the ttm_bo_wait_ctx call in this section of 
amdgpu_bo_move:


    if ((old_mem->mem_type == TTM_PL_TT ||
 old_mem->mem_type == AMDGPU_PL_PREEMPT) &&
 new_mem->mem_type == TTM_PL_SYSTEM) {
    r = ttm_bo_wait_ctx(bo, ctx);
    if (r)
    return r;

    amdgpu_ttm_backend_unbind(bo->bdev, bo->ttm);
    ttm_resource_free(bo, &bo->resource);
    ttm_bo_assign_mem(bo, new_mem);
    goto out;
    }

We can't just remove this wait. It's not even specific to KFD or 
DMABuf imports. We also can't just change it to avoid waiting for 
eviction fences because it's also used for GTT BOs (e.g. before a BO 
gets swapped under extreme memory pressure). So we also need to 
trigger the eviction fence in the general case.


In the specific case of DMABuf imports, they share the reservation 
object with the original BO. So waiting on the reservation triggers 
the eviction fence on the original BO. I think we want to avoid the 
waiting on eviction fences for all BOs where the underlying memory 
is managed by some other BO, and at the same time also avoid ever 
evicting the DMABuf import BO. That's what AMDGPU_PL_PREEMPT is for. 
So I think a combination of two changes should to the trick:


1. Change kfd_mem_dmamap_dmabuf to use AMDGPU_GEM_DOMAIN_PREEMPTIBLE
2. Add a special case in the above if-block for old_mem->mem_type ==
   AMDGPU_PL_PREEMPT: use amdgpu_bo_sync_wait with
   owner=AMDGPU_FENCE_OWNER_KFD so that it doesn't wait for eviction 
fences


Regards,
  Felix


Am 2023-04-04 um 10:36 schrieb Eric Huang:

Here is the backtrace from Jira:

[Thu Nov 10 13:10:23 2022] Scheduling eviction of pid 97784 in 0 jiffies
[Thu Nov 10 13:10:23 2022] WARNING: CPU: 173 PID: 97784 at 
/var/lib/dkms/amdgpu/5.16.9.22.20-1438746~20.04/build/amd/amdgpu/../amdkfd/kfd_device.c:878 
kgd2kfd_schedule_evict_and_restore_process+0x104/0x120 [amdgpu]
[Thu Nov 10 13:10:23 2022

Re: [PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping

2023-04-10 Thread Eric Huang

Hi Felix,

Thanks for your review and suggestion, but unfortunately 
AMDGPU_GEM_DOMAIN_PREEMPTIBLE is not defined in amdgpu_drm.h. I 
understand we need the memory eviction in either kfd_mem_dmamap_dmabuf() 
or kfd_mem_dmaunmap_dmabuf() to update the DMA address, so I am thinking 
of doing it as simply as userptr memory does.


The purpose of this change: with the non-MES HW scheduler we use 
userptr/paged memory, but starting with GFX11 we use the MES scheduler, 
which needs the memory to be allocated as GTT/non-paged memory. So we want 
all GPUs to use GTT/non-paged memory, but there is a performance drop 
because of the eviction in kfd_mem_dmaunmap_dmabuf().


Currently userptr memory is evicted in kfd_mem_dmamap_userptr() by 
changing the domain to GTT before calling ttm_bo_validate(), and is not 
evicted in kfd_mem_dmaunmap_userptr(). So I think we can do the same for 
GTT/non-paged memory: set the domain to CPU in kfd_mem_dmamap_dmabuf(), 
which will evict the memory to update the DMA address, and set the domain 
to GTT in kfd_mem_dmaunmap_dmabuf(), which will not evict the memory. The 
performance should then match userptr/paged memory.


Regards,
Eric

On 2023-04-04 16:40, Felix Kuehling wrote:

[+Christian]

OK, this comes from the ttm_bo_wait_ctx call in this section of 
amdgpu_bo_move:


    if ((old_mem->mem_type == TTM_PL_TT ||
 old_mem->mem_type == AMDGPU_PL_PREEMPT) &&
 new_mem->mem_type == TTM_PL_SYSTEM) {
    r = ttm_bo_wait_ctx(bo, ctx);
    if (r)
    return r;

    amdgpu_ttm_backend_unbind(bo->bdev, bo->ttm);
    ttm_resource_free(bo, &bo->resource);
    ttm_bo_assign_mem(bo, new_mem);
    goto out;
    }

We can't just remove this wait. It's not even specific to KFD or 
DMABuf imports. We also can't just change it to avoid waiting for 
eviction fences because it's also used for GTT BOs (e.g. before a BO 
gets swapped under extreme memory pressure). So we also need to 
trigger the eviction fence in the general case.


In the specific case of DMABuf imports, they share the reservation 
object with the original BO. So waiting on the reservation triggers 
the eviction fence on the original BO. I think we want to avoid the 
waiting on eviction fences for all BOs where the underlying memory is 
managed by some other BO, and at the same time also avoid ever 
evicting the DMABuf import BO. That's what AMDGPU_PL_PREEMPT is for. 
So I think a combination of two changes should do the trick:


1. Change kfd_mem_dmamap_dmabuf to use AMDGPU_GEM_DOMAIN_PREEMPTIBLE
2. Add a special case in the above if-block for old_mem->mem_type ==
   AMDGPU_PL_PREEMPT: use amdgpu_bo_sync_wait with
   owner=AMDGPU_FENCE_OWNER_KFD so that it doesn't wait for eviction 
fences


Regards,
  Felix


Am 2023-04-04 um 10:36 schrieb Eric Huang:

Here is the backtrace from Jira:

[Thu Nov 10 13:10:23 2022] Scheduling eviction of pid 97784 in 0 jiffies
[Thu Nov 10 13:10:23 2022] WARNING: CPU: 173 PID: 97784 at 
/var/lib/dkms/amdgpu/5.16.9.22.20-1438746~20.04/build/amd/amdgpu/../amdkfd/kfd_device.c:878 
kgd2kfd_schedule_evict_and_restore_process+0x104/0x120 [amdgpu]
[Thu Nov 10 13:10:23 2022] Modules linked in: veth amdgpu(OE) 
amddrm_ttm_helper(OE) amdttm(OE) iommu_v2 amd_sched(OE) amdkcl(OE) 
xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user 
xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc 
aufs overlay binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common amd64_edac 
edac_mce_amd kvm_amd kvm efi_pstore rapl ipmi_ssif ccp acpi_ipmi 
k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel msr 
ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx 
xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs 
ib_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel 
crypto_simd cryptd ast drm_vram_helper drm_ttm_helper ttm mlx5_core 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[Thu Nov 10 13:10:23 2022]  pci_hyperv_intf cec psample igb mlxfw 
rc_core dca ahci xhci_pci tls drm i2c_algo_bit libahci 
xhci_pci_renesas i2c_piix4
[Thu Nov 10 13:10:23 2022] CPU: 173 PID: 97784 Comm: onnxruntime_tes 
Tainted: G        W  OE     5.13.0-30-generic #33~20.04.1-Ubuntu
[Thu Nov 10 13:10:23 2022] Hardware name: GIGABYTE 
G482-Z53-YF/MZ52-G40-00, BIOS R12 05/13/2020
[Thu Nov 10 13:10:23 2022] RIP: 
0010:kgd2kfd_schedule_evict_and_restore_process+0x104/0x120 [amdgpu]
[Thu Nov 10 13:10:23 2022] Code: 5e 5d c3 4c 89 e7 e8 cb c6 44 df eb 
e7 49 8b 45 60 48 89 ca 48 c7 c7 38 8b d7 c1 48 89 4d e0 8b b0 20 09 
00 00 e8 87 ee 7e df <0f> 0b 48 8b 4d e0 eb 9f 41 be ea ff ff ff eb 
ba 41 be ed ff ff ff

[Thu Nov 10 13:10:23 2022

Re: [PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping

2023-04-04 Thread Eric Huang
<48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 7a 0d 00 f7 d8 
64 89 01 48
[Thu Nov 10 13:10:23 2022] RSP: 002b:7fffe41e0098 EFLAGS: 0206 
ORIG_RAX: 0010
[Thu Nov 10 13:10:23 2022] RAX: ffda RBX: 7fcacc7f7f80 
RCX: 7fcaff57b3ab
[Thu Nov 10 13:10:23 2022] RDX: 7fffe41e0120 RSI: c0184b19 
RDI: 0003
[Thu Nov 10 13:10:23 2022] RBP: 7fffe41e00d0 R08: 562e2d5730d0 
R09: 
[Thu Nov 10 13:10:23 2022] R10: 562e2c928ec0 R11: 0206 
R12: 0001
[Thu Nov 10 13:10:23 2022] R13: 7fffe41e04b0 R14:  
R15: 562e2d3f5b20

[Thu Nov 10 13:10:23 2022]  
[Thu Nov 10 13:10:23 2022] ---[ end trace 1464f08f6be60b30 ]---

Regards,
Eric

On 2023-04-04 10:11, Felix Kuehling wrote:
If we keep the BO in the GTT domain, it means it will not be updated 
if we validate it again later in kfd_mem_dmamap_dmabuf. This means 
we'll use stale DMA addresses when we update the page tables after 
evictions.


I think we'll need to find a different way to avoid triggering the 
eviction fence on the original BO when changing the placement of the 
DMABuf import here. If you need help brainstorming here, please share 
a backtrace from the eviction generated with the debug_evictions 
module param.


Regards,
  Felix


Am 2023-04-03 um 13:59 schrieb Eric Huang:

dmabuf is allocated/mapped in the GTT domain. When dma-unmapping the
dmabuf, changing its placement to CPU triggers a memory eviction after
calling ttm_bo_validate, and the eviction causes a performance drop.
Keeping the correct domain solves the issue.

Signed-off-by: Eric Huang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

index a3b09edfd1bf..17b708acb447 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -642,7 +642,7 @@ kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment 
*attachment)

  struct ttm_operation_ctx ctx = {.interruptible = true};
  struct amdgpu_bo *bo = attachment->bo_va->base.bo;
  -    amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+    amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
  ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
  }




[PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping

2023-04-03 Thread Eric Huang
dmabuf is allocated/mapped in the GTT domain. When dma-unmapping the
dmabuf, changing its placement to CPU triggers a memory eviction after
calling ttm_bo_validate, and the eviction causes a performance drop.
Keeping the correct domain solves the issue.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a3b09edfd1bf..17b708acb447 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -642,7 +642,7 @@ kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment 
*attachment)
struct ttm_operation_ctx ctx = {.interruptible = true};
struct amdgpu_bo *bo = attachment->bo_va->base.bo;
 
-   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 }
 
-- 
2.34.1



Re: [PATCH] drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU

2023-01-10 Thread Eric Huang

Ping.

On 2023-01-05 14:28, Eric Huang wrote:

The pointer bo->kfd_bo is NULL for a queue's write pointer BO
when creating a queue on mGPU. Avoid using the pointer
to fix the error.

Signed-off-by: Eric Huang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 9885735f1a30..d4c29e9edf34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2179,7 +2179,7 @@ int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_device 
*adev, struct amdgpu_b
}
  
  	amdgpu_amdkfd_remove_eviction_fence(

-   bo, bo->kfd_bo->process_info->eviction_fence);
+   bo, bo->vm_bo->vm->process_info->eviction_fence);
  
  	amdgpu_bo_unreserve(bo);
  
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index 6013f498ea1e..55c2dc48e567 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -231,7 +231,7 @@ static int add_queue_mes(struct device_queue_manager *dqm, 
struct queue *q,
queue_input.wptr_addr = (uint64_t)q->properties.write_ptr;
  
  	if (q->wptr_bo) {

-   wptr_addr_off = (uint64_t)q->properties.write_ptr - 
(uint64_t)q->wptr_bo->kfd_bo->va;
+   wptr_addr_off = (uint64_t)q->properties.write_ptr & (PAGE_SIZE 
- 1);
queue_input.wptr_mc_addr = ((uint64_t)q->wptr_bo->tbo.resource->start 
<< PAGE_SHIFT) + wptr_addr_off;
}
  




[PATCH] drm/amdkfd: Add sync after creating vram bo

2023-01-09 Thread Eric Huang
There will be data corruption on vram allocated by svm
if initialization has not completed. Adding a sync wait
resolves this issue.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index b8c9753a4818..344e20306635 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -574,6 +574,13 @@ svm_range_vram_node_new(struct amdgpu_device *adev, struct 
svm_range *prange,
goto reserve_bo_failed;
}
 
+   r = amdgpu_bo_sync_wait(bo, AMDGPU_FENCE_OWNER_KFD, false);
+   if (r) {
+   pr_debug("failed %d to sync bo\n", r);
+   amdgpu_bo_unreserve(bo);
+   goto reserve_bo_failed;
+   }
+
	r = dma_resv_reserve_fences(amdkcl_ttm_resvp(&bo->tbo), 1);
if (r) {
pr_debug("failed %d to reserve bo\n", r);
-- 
2.34.1



[PATCH] drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU

2023-01-05 Thread Eric Huang
The pointer bo->kfd_bo is NULL for a queue's write pointer BO
when creating a queue on mGPU. Avoid using the pointer
to fix the error.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 9885735f1a30..d4c29e9edf34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2179,7 +2179,7 @@ int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_device 
*adev, struct amdgpu_b
}
 
amdgpu_amdkfd_remove_eviction_fence(
-   bo, bo->kfd_bo->process_info->eviction_fence);
+   bo, bo->vm_bo->vm->process_info->eviction_fence);
 
amdgpu_bo_unreserve(bo);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 6013f498ea1e..55c2dc48e567 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -231,7 +231,7 @@ static int add_queue_mes(struct device_queue_manager *dqm, 
struct queue *q,
queue_input.wptr_addr = (uint64_t)q->properties.write_ptr;
 
if (q->wptr_bo) {
-   wptr_addr_off = (uint64_t)q->properties.write_ptr - 
(uint64_t)q->wptr_bo->kfd_bo->va;
+   wptr_addr_off = (uint64_t)q->properties.write_ptr & (PAGE_SIZE 
- 1);
queue_input.wptr_mc_addr = 
((uint64_t)q->wptr_bo->tbo.resource->start << PAGE_SHIFT) + wptr_addr_off;
}
 
-- 
2.34.1



[PATCH] amd/amdkfd: Fix a memory limit issue

2022-11-14 Thread Eric Huang
This resolves a regression where an application fails to
allocate VRAM because no free memory is reported. The reason
is that we added a vram_pin_size check to the memory limit,
but the application pins that memory for Peerdirect, and KFD
should not count it against the memory limit. Removing
vram_pin_size from the check resolves the issue.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index db772942f7a6..fb1bb593312e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -172,9 +172,7 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device 
*adev,
(kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
 kfd_mem_limit.max_ttm_mem_limit) ||
(adev && adev->kfd.vram_used + vram_needed >
-adev->gmc.real_vram_size -
-atomic64_read(>vram_pin_size) -
-reserved_for_pt)) {
+adev->gmc.real_vram_size - reserved_for_pt)) {
ret = -ENOMEM;
goto release;
}
-- 
2.34.1



Re: [PATCH] drm/amdkfd: bump KFD version for unified ctx save/restore memory

2022-07-12 Thread Eric Huang

The patch has been pushed. I will do that for future patches.

Thanks,
Eric

On 2022-07-12 09:57, Deucher, Alexander wrote:


[AMD Official Use Only - General]


Can you please include a link to the proposed userspace in the commit 
message when you commit this?


Alex

*From:* amd-gfx  on behalf of 
Eric Huang 

*Sent:* Monday, July 11, 2022 2:41 PM
*To:* amd-gfx@lists.freedesktop.org 
*Cc:* Huang, JinHuiEric ; Kuehling, Felix 

*Subject:* [PATCH] drm/amdkfd: bump KFD version for unified ctx 
save/restore memory

To expose the unified memory for ctx save/restore area feature
availability to libhsakmt.

Signed-off-by: Eric Huang 
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h 
b/include/uapi/linux/kfd_ioctl.h

index 7a423855a86e..afd8ff29c74f 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -36,9 +36,10 @@
  * - 1.8 - CRIU - Support for SDMA transfers with GTT BOs
  * - 1.9 - Add available memory ioctl
  * - 1.10 - Add SMI profiler event log
+ * - 1.11 - Add unified memory for ctx save/restore area
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 10
+#define KFD_IOCTL_MINOR_VERSION 11

 struct kfd_ioctl_get_version_args {
 __u32 major_version;    /* from KFD */
--
2.25.1



[PATCH 3/3] libhsakmt: allocate unified memory for ctx save restore area

2022-07-11 Thread Eric Huang
To improve performance on queue preemption, allocate ctx s/r
 area in VRAM instead of system memory, and migrate it back
 to system memory when VRAM is full.

Signed-off-by: Eric Huang 
Change-Id: If775782027188dbe84b6868260e429373675434c
---
 include/hsakmttypes.h |   1 +
 src/queues.c  | 109 --
 2 files changed, 95 insertions(+), 15 deletions(-)

diff --git a/include/hsakmttypes.h b/include/hsakmttypes.h
index 690e001..65f23de 100644
--- a/include/hsakmttypes.h
+++ b/include/hsakmttypes.h
@@ -1331,6 +1331,7 @@ typedef enum _HSA_SVM_FLAGS {
HSA_SVM_FLAG_GPU_RO  = 0x0008, // GPUs only read, allows 
replication
HSA_SVM_FLAG_GPU_EXEC= 0x0010, // Allow execution on GPU
HSA_SVM_FLAG_GPU_READ_MOSTLY = 0x0020, // GPUs mostly read, may 
allow similar optimizations as RO, but writes fault
+   HSA_SVM_FLAG_GPU_ALWAYS_MAPPED = 0x0040, // Keep GPU memory mapping 
always valid as if XNACK is disabled
 } HSA_SVM_FLAGS;
 
 typedef enum _HSA_SVM_ATTR_TYPE {
diff --git a/src/queues.c b/src/queues.c
index d38ea0c..5702c95 100644
--- a/src/queues.c
+++ b/src/queues.c
@@ -68,6 +68,7 @@ struct queue {
uint32_t eop_buffer_size;
uint32_t gfxv;
bool use_ats;
+   bool unified_ctx_save_restore;
/* This queue structure is allocated from GPU with page aligned size
 * but only small bytes are used. We use the extra space in the end for
 * cu_mask bits array.
@@ -384,13 +385,49 @@ static void free_exec_aligned_memory(void *addr, uint32_t 
size, uint32_t align,
munmap(addr, size);
 }
 
+static HSAKMT_STATUS register_svm_range(void *mem, uint32_t size,
+   uint32_t gpuNode, uint32_t prefetchNode,
+   uint32_t preferredNode, bool alwaysMapped)
+{
+   HSA_SVM_ATTRIBUTE *attrs;
+   HSAuint64 s_attr;
+   HSAuint32 nattr;
+   HSAuint32 flags;
+
+   flags = HSA_SVM_FLAG_HOST_ACCESS;
+
+   if (alwaysMapped) {
+   CHECK_KFD_MINOR_VERSION(11);
+   flags |= HSA_SVM_FLAG_GPU_ALWAYS_MAPPED;
+   }
+
+   nattr = 5;
+   s_attr = sizeof(*attrs) * nattr;
+   attrs = (HSA_SVM_ATTRIBUTE *)alloca(s_attr);
+
+   attrs[0].type = HSA_SVM_ATTR_PREFETCH_LOC;
+   attrs[0].value = prefetchNode;
+   attrs[1].type = HSA_SVM_ATTR_PREFERRED_LOC;
+   attrs[1].value = preferredNode;
+   attrs[2].type = HSA_SVM_ATTR_CLR_FLAGS;
+   attrs[2].value = ~flags;
+   attrs[3].type = HSA_SVM_ATTR_SET_FLAGS;
+   attrs[3].value = flags;
+   attrs[4].type = HSA_SVM_ATTR_ACCESS;
+   attrs[4].value = gpuNode;
+
+   return hsaKmtSVMSetAttr(mem, size, nattr, attrs);
+}
+
 static void free_queue(struct queue *q)
 {
if (q->eop_buffer)
free_exec_aligned_memory(q->eop_buffer,
 q->eop_buffer_size,
 PAGE_SIZE, q->use_ats);
-   if (q->ctx_save_restore)
+   if (q->unified_ctx_save_restore)
+   free(q->ctx_save_restore);
+   else if (q->ctx_save_restore)
free_exec_aligned_memory(q->ctx_save_restore,
 q->ctx_save_restore_size,
 PAGE_SIZE, q->use_ats);
@@ -398,6 +435,20 @@ static void free_queue(struct queue *q)
free_exec_aligned_memory((void *)q, sizeof(*q), PAGE_SIZE, q->use_ats);
 }
 
+static inline void fill_cwsr_header(struct queue *q, void *addr,
+   HsaEvent *Event, volatile HSAint64 *ErrPayload)
+{
+   HsaUserContextSaveAreaHeader *header =
+   (HsaUserContextSaveAreaHeader *)addr;
+
+   header->ErrorEventId = 0;
+   if (Event)
+   header->ErrorEventId = Event->EventId;
+   header->ErrorReason = ErrPayload;
+   header->DebugOffset = q->ctx_save_restore_size;
+   header->DebugSize = q->debug_memory_size;
+}
+
 static int handle_concrete_asic(struct queue *q,
struct kfd_ioctl_create_queue_args *args,
uint32_t NodeId,
@@ -425,7 +476,8 @@ static int handle_concrete_asic(struct queue *q,
 
if (ret) {
uint32_t total_mem_alloc_size = 0;
-   HsaUserContextSaveAreaHeader *header;
+   HsaNodeProperties node;
+   bool svm_api;
 
args->ctx_save_restore_size = q->ctx_save_restore_size;
args->ctl_stack_size = q->ctl_stack_size;
@@ -435,22 +487,49 @@ static int handle_concrete_asic(struct queue *q,
 */
total_mem_alloc_size = q->ctx_save_restore_size +
   q->debug_memory_size;
-   q->ctx_save_restore =
-   allocate_exec_aligned_memory(total_mem_allo

[PATCH 2/3] libhsakmt: add new flag for svm

2022-07-11 Thread Eric Huang
Add a new option for always keeping the gpu mapping,
and bump the KFD version for the unified save/restore
memory feature.

Signed-off-by: Eric Huang 
Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492
---
 include/linux/kfd_ioctl.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h
index ba8de4b..4898451 100644
--- a/include/linux/kfd_ioctl.h
+++ b/include/linux/kfd_ioctl.h
@@ -35,9 +35,11 @@
  * - 1.7 - Checkpoint Restore (CRIU) API
  * - 1.8 - CRIU - Support for SDMA transfers with GTT BOs
  * - 1.9 - Add available_memory ioctl
+ * - 1.10 - Add SMI profiler event log
+ * - 1.11 - Add unified memory for ctx save/restore area
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 9
+#define KFD_IOCTL_MINOR_VERSION 11
 
 /*
  * Debug revision change log
@@ -1080,6 +1082,8 @@ struct kfd_ioctl_cross_memory_copy_args {
 #define KFD_IOCTL_SVM_FLAG_GPU_EXEC0x0010
 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */
 #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED   0x0040
 
 /**
  * kfd_ioctl_svm_op - SVM ioctl operations
-- 
2.25.1



[PATCH] drm/amdkfd: bump KFD version for unified ctx save/restore memory

2022-07-11 Thread Eric Huang
To expose the unified memory for ctx save/restore area feature
availability to libhsakmt.

Signed-off-by: Eric Huang 
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 7a423855a86e..afd8ff29c74f 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -36,9 +36,10 @@
  * - 1.8 - CRIU - Support for SDMA transfers with GTT BOs
  * - 1.9 - Add available memory ioctl
  * - 1.10 - Add SMI profiler event log
+ * - 1.11 - Add unified memory for ctx save/restore area
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 10
+#define KFD_IOCTL_MINOR_VERSION 11
 
 struct kfd_ioctl_get_version_args {
__u32 major_version;/* from KFD */
-- 
2.25.1



[PATCH 5/5] libhsakmt: allocate unified memory for ctx save restore area

2022-06-30 Thread Eric Huang
To improve performance on queue preemption, allocate ctx s/r
 area in VRAM instead of system memory, and migrate it back
 to system memory when VRAM is full.

Signed-off-by: Eric Huang 
Change-Id: If775782027188dbe84b6868260e429373675434c
---
 include/hsakmttypes.h |   1 +
 src/queues.c  | 103 --
 2 files changed, 90 insertions(+), 14 deletions(-)

diff --git a/include/hsakmttypes.h b/include/hsakmttypes.h
index 9063f85..2c1c7cc 100644
--- a/include/hsakmttypes.h
+++ b/include/hsakmttypes.h
@@ -1329,6 +1329,7 @@ typedef enum _HSA_SVM_FLAGS {
HSA_SVM_FLAG_GPU_RO  = 0x0008, // GPUs only read, allows 
replication
HSA_SVM_FLAG_GPU_EXEC= 0x0010, // Allow execution on GPU
HSA_SVM_FLAG_GPU_READ_MOSTLY = 0x0020, // GPUs mostly read, may 
allow similar optimizations as RO, but writes fault
+   HSA_SVM_FLAG_GPU_ALWAYS_MAPPED = 0x0040, // Keep GPU memory mapping 
always valid as if XNACK is disabled
 } HSA_SVM_FLAGS;
 
 typedef enum _HSA_SVM_ATTR_TYPE {
diff --git a/src/queues.c b/src/queues.c
index c83dd93..d5109f9 100644
--- a/src/queues.c
+++ b/src/queues.c
@@ -68,6 +68,7 @@ struct queue {
uint32_t eop_buffer_size;
uint32_t gfxv;
bool use_ats;
+   bool unified_ctx_save_restore;
/* This queue structure is allocated from GPU with page aligned size
 * but only small bytes are used. We use the extra space in the end for
 * cu_mask bits array.
@@ -383,13 +384,47 @@ static void free_exec_aligned_memory(void *addr, uint32_t 
size, uint32_t align,
munmap(addr, size);
 }
 
+static HSAKMT_STATUS register_svm_range(void *mem, uint32_t size,
+   uint32_t gpuNode, uint32_t prefetchNode,
+   uint32_t preferredNode, bool alwaysMapped)
+{
+   HSA_SVM_ATTRIBUTE *attrs;
+   HSAuint64 s_attr;
+   HSAuint32 nattr;
+   HSAuint32 flags;
+
+   flags = HSA_SVM_FLAG_HOST_ACCESS;
+
+   if (alwaysMapped)
+   flags |= HSA_SVM_FLAG_GPU_ALWAYS_MAPPED;
+
+   nattr = 5;
+   s_attr = sizeof(*attrs) * nattr;
+   attrs = (HSA_SVM_ATTRIBUTE *)alloca(s_attr);
+
+   attrs[0].type = HSA_SVM_ATTR_PREFETCH_LOC;
+   attrs[0].value = prefetchNode;
+   attrs[1].type = HSA_SVM_ATTR_PREFERRED_LOC;
+   attrs[1].value = preferredNode;
+   attrs[2].type = HSA_SVM_ATTR_CLR_FLAGS;
+   attrs[2].value = ~flags;
+   attrs[3].type = HSA_SVM_ATTR_SET_FLAGS;
+   attrs[3].value = flags;
+   attrs[4].type = HSA_SVM_ATTR_ACCESS;
+   attrs[4].value = gpuNode;
+
+   return hsaKmtSVMSetAttr(mem, size, nattr, attrs);
+}
+
 static void free_queue(struct queue *q)
 {
if (q->eop_buffer)
free_exec_aligned_memory(q->eop_buffer,
 q->eop_buffer_size,
 PAGE_SIZE, q->use_ats);
-   if (q->ctx_save_restore)
+   if (q->unified_ctx_save_restore)
+   free(q->ctx_save_restore);
+   else if (q->ctx_save_restore)
free_exec_aligned_memory(q->ctx_save_restore,
 q->ctx_save_restore_size,
 PAGE_SIZE, q->use_ats);
@@ -425,6 +460,8 @@ static int handle_concrete_asic(struct queue *q,
if (ret) {
uint32_t total_mem_alloc_size = 0;
HsaUserContextSaveAreaHeader *header;
+   HsaNodeProperties node;
+   bool svm_api;
 
args->ctx_save_restore_size = q->ctx_save_restore_size;
args->ctl_stack_size = q->ctl_stack_size;
@@ -434,22 +471,60 @@ static int handle_concrete_asic(struct queue *q,
 */
total_mem_alloc_size = q->ctx_save_restore_size +
   q->debug_memory_size;
-   q->ctx_save_restore =
-   allocate_exec_aligned_memory(total_mem_alloc_size,
-q->use_ats, NodeId, false, false);
 
-   if (!q->ctx_save_restore)
-   return HSAKMT_STATUS_NO_MEMORY;
+   if (hsaKmtGetNodeProperties(NodeId, ))
+   svm_api = false;
+   else
+   svm_api = node.Capability.ui32.SVMAPISupported;
 
-   args->ctx_save_restore_address = (uintptr_t)q->ctx_save_restore;
+   /* Allocate unified memory for context save restore
+* area on dGPU.
+*/
+   if (!q->use_ats && svm_api) {
+   uint32_t size = PAGE_ALIGN_UP(total_mem_alloc_size);
+   void *addr;
+   HSAKMT_STATUS r = HSAKMT_STATUS_ERROR;
+
+   if (posix_memalign(&addr, GPU_HUGE_PAGE_SIZE, size))
+   

[PATCH 4/5] libhsakmt: add new flags for svm

2022-06-30 Thread Eric Huang
Add a new option for always keeping the gpu mapping.

Signed-off-by: Eric Huang 
Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492
---
 include/linux/kfd_ioctl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h
index 8a0ed49..5c45f58 100644
--- a/include/linux/kfd_ioctl.h
+++ b/include/linux/kfd_ioctl.h
@@ -1069,6 +1069,8 @@ struct kfd_ioctl_cross_memory_copy_args {
 #define KFD_IOCTL_SVM_FLAG_GPU_EXEC0x0010
 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */
 #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED   0x0040
 
 /**
  * kfd_ioctl_svm_op - SVM ioctl operations
-- 
2.25.1



[PATCH 1/5] drm/amdkfd: add new flag for svm

2022-06-30 Thread Eric Huang
Add a new option for always keeping the gpu mapping.

Signed-off-by: Eric Huang 
---
 include/uapi/linux/kfd_ioctl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index fd49dde4d5f4..eba04ebfd9a8 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1076,6 +1076,8 @@ struct kfd_ioctl_cross_memory_copy_args {
 #define KFD_IOCTL_SVM_FLAG_GPU_EXEC0x0010
 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */
 #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED   0x0040
 
 /**
  * kfd_ioctl_svm_op - SVM ioctl operations
-- 
2.25.1



[PATCH 3/5] drm/amdkfd: optimize svm range evict

2022-06-30 Thread Eric Huang
Avoid unnecessary queue eviction when the range
is not mapped to a gpu.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 586bef4fcc8a..1f1f8f2dfa28 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1775,8 +1775,12 @@ svm_range_evict(struct svm_range *prange, struct 
mm_struct *mm,
if (!p->xnack_enabled ||
(prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
int evicted_ranges;
+   bool mapped = prange->mapped_to_gpu;
 
	list_for_each_entry(pchild, &prange->child_list, child_list) {
+   if (!pchild->mapped_to_gpu)
+   continue;
+   mapped = true;
		mutex_lock_nested(&pchild->lock, 1);
if (pchild->start <= last && pchild->last >= start) {
pr_debug("increment pchild invalid [0x%lx 
0x%lx]\n",
@@ -1786,6 +1790,9 @@ svm_range_evict(struct svm_range *prange, struct 
mm_struct *mm,
		mutex_unlock(&pchild->lock);
}
 
+   if (!mapped)
+   return r;
+
if (prange->start <= last && prange->last >= start)
		atomic_inc(&prange->invalid);
 
-- 
2.25.1



[PATCH 2/5] drm/amdkfd: change svm range evict

2022-06-30 Thread Eric Huang
Always evict queues when the flag
KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED is set, as if XNACK were off.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 4bf2f75f853b..586bef4fcc8a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1772,7 +1772,8 @@ svm_range_evict(struct svm_range *prange, struct 
mm_struct *mm,
pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 0x%lx]\n",
 svms, prange->start, prange->last, start, last);
 
-   if (!p->xnack_enabled) {
+   if (!p->xnack_enabled ||
+   (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
int evicted_ranges;
 
	list_for_each_entry(pchild, &prange->child_list, child_list) {
@@ -3321,7 +3322,8 @@ svm_range_set_attr(struct kfd_process *p, struct 
mm_struct *mm,
if (r)
goto out_unlock_range;
 
-   if (migrated && !p->xnack_enabled) {
+   if (migrated && (!p->xnack_enabled ||
+   (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED))) {
pr_debug("restore_work will update mappings of GPUs\n");
mutex_unlock(>migrate_mutex);
continue;
-- 
2.25.1



[PATCH 0/5] Unified memory for CWSR save restore area

2022-06-30 Thread Eric Huang
amdkfd changes:

Eric Huang (3):
  drm/amdkfd: add new flag for svm
  drm/amdkfd: change svm range evict
  drm/amdkfd: optimize svm range evict

 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 13 +++--
 include/uapi/linux/kfd_ioctl.h   |  2 ++
 2 files changed, 13 insertions(+), 2 deletions(-)

libhsakmt(thunk) changes:
which are based on https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface

Eric Huang (2):
  libhsakmt: add new flags for svm
  libhsakmt: allocate unified memory for ctx save restore area

 include/hsakmttypes.h |   1 +
 include/linux/kfd_ioctl.h |   2 +
 src/queues.c  | 109 +-
 3 files changed, 98 insertions(+), 14 deletions(-)

-- 
2.25.1



Re: [PATCH 2/2] drm/amdkfd: change svm range evict

2022-06-30 Thread Eric Huang



On 2022-06-29 19:29, Felix Kuehling wrote:

On 2022-06-29 18:53, Eric Huang wrote:



On 2022-06-29 18:20, Felix Kuehling wrote:

On 2022-06-28 17:43, Eric Huang wrote:

Two changes:
1. Reduce unnecessary evict/unmap when the range is not mapped to a gpu.
2. Always evict when the flag is set to always_mapped.

Signed-off-by: Eric Huang 
---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

index 4bf2f75f853b..76e817687ef9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, 
struct mm_struct *mm,

  struct kfd_process *p;
  int r = 0;
  +    if (!prange->mapped_to_gpu)
+    return 0;


This feels like an unrelated optimization that should be in a 
separate patch.


But I'm not sure this is correct, because it doesn't consider child 
ranges. svm_range_unmap_from_gpus already contains this check, so 
ranges should not be unmapped unnecessarily either way. Is there any 
other benefit to this change that I'm missing?

I will send another patch separately that considers child ranges.


I think this should only be done in the XNACK-off case. For XNACK-on 
it's already handled in the svm_range_unmap_from_gpus.

Yes, and it is also done when KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED is set.



The benefit is that it reduces unnecessary queue evictions when 
allocating context save memory, which is not mapped to the GPU.


That sounds wrong. The context save area should never be unmapped from 
GPU. That's the whole point of setting the 
KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag. I guess this is happening 
while migrating the context save area to VRAM for the first time, even 
before it's mapped to GPU?
Yes. It happens the first time, when registering the svm range and migrating 
to VRAM are done together; at that moment the range is not mapped to the GPU.


I think there may be another eviction, when the CWSR header is 
initialized by the CPU. That would also migrate it back to system 
memory. To avoid that, you should probably register the context save 
area only after the header has been initialized.

Yes. I am using this way. Please look at patch 4/4.


I think avoiding an eviction when memory is migrated at first 
registration is worthwhile, not just for CWSR.



It is for efficiency reasons. On the other hand, without this 
optimization KFDCWSRTest.InterruptRestore fails with a queue preemption 
error.


What do you mean by "queue preemption error"? Does HWS hang?
HWS doesn't hang immediately, so there is no fence-timeout error 
("The cp might be in an unrecoverable state due to an unsuccessful queues 
preemption"). The error is "HIQ MQD's queue_doorbell_id0 is not 0, Queue 
preemption time out" after checking the MQD manager, which means HWS 
abandons the unmap-queue request without returning a timeout error to the 
driver. After this error, the following test fails at queue creation 
because HWS hangs.



I think the reason is that the extra queue evictions make HWS too busy 
to preempt existing queues. There is one unmap_queue packet sent to HWS 
in the current code, and there will be three unmap_queue packets with 
unified memory allocation.


When queues of a process are already evicted, they should not get 
evicted again. That's handled by the qpd->evicted counter. There 
should never be multiple unmap_queues packets in flight at the same 
time. If you're seeing three unmap_queues, you should also see queues 
restored three times.
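
Felix's point about the qpd->evicted counter can be sketched as a small
reference count. This is an illustrative stand-in, not the real KFD
structures: only the first eviction actually sends an unmap_queues packet
to HWS, and only the last restore actually restores the queues.

```c
#include <assert.h>

/* Hypothetical sketch of the qpd->evicted nesting count described above.
 * Field and function names are illustrative only. */
struct qpd_sketch {
	int evicted;        /* nesting count of outstanding evictions */
	int unmap_packets;  /* unmap_queues packets we would send to HWS */
	int restore_events; /* how many times queues were really restored */
};

static void evict_queues(struct qpd_sketch *qpd)
{
	if (qpd->evicted++ == 0)
		qpd->unmap_packets++; /* only the 0 -> 1 transition preempts */
}

static void restore_queues(struct qpd_sketch *qpd)
{
	if (--qpd->evicted == 0)
		qpd->restore_events++; /* only the 1 -> 0 transition restores */
}
```

So even if three eviction sources fire back to back, only one unmap_queues
packet should ever be outstanding.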


HWS should never be too busy to evict queues. If you're seeing 
preemptions fail, you may have found a bug.
The restore delay worker will behave differently in terms of timing. It 
could restore queues before the next unmap_queues, so the situation is 
too complicated to debug in a multiple-queue evict/restore environment. 
The error definitely means there is a bug; from the driver's point of 
view there is nothing wrong even with the extra queue eviction, so I try 
to avoid the extra queue eviction and keep it as before. The bottom line 
is to make sure the unified svm range for the context save area doesn't 
cause any failure in kfdtest, so I can theoretically assume the extra 
queue eviction/restoring causes the HWS failure.


Regards,
Eric


Regards,
  Felix



So this optimization will keep only one unmap_queue as before.

Regards,
Eric


Regards,
  Felix



+
  p = container_of(svms, struct kfd_process, svms);
    pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 
0x%lx]\n",

   svms, prange->start, prange->last, start, last);
  -    if (!p->xnack_enabled) {
+    if (!p->xnack_enabled ||
+    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
  int evicted_ranges;
  list_for_each_entry(pchild, &prange->child_list, child_list) {
@@ -3321,7 +3325,9 @@ svm_range_set

Re: [PATCH 2/2] drm/amdkfd: change svm range evict

2022-06-29 Thread Eric Huang




On 2022-06-29 18:20, Felix Kuehling wrote:

On 2022-06-28 17:43, Eric Huang wrote:

Two changes:
1. reducing unnecessary evict/unmap when range is not mapped to gpu.
2. adding always evict when flags is set to always_mapped.

Signed-off-by: Eric Huang 
---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

index 4bf2f75f853b..76e817687ef9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, 
struct mm_struct *mm,

  struct kfd_process *p;
  int r = 0;
  +    if (!prange->mapped_to_gpu)
+    return 0;


This feels like an unrelated optimization that should be in a separate 
patch.


But I'm not sure this is correct, because it doesn't consider child 
ranges. svm_range_unmap_from_gpus already contains this check, so 
ranges should not be unmapped unnecessarily either way. Is there any 
other benefit to this change that I'm missing?
I will send another patch separately that considers child ranges. The 
benefit is that it reduces unnecessary queue evictions when allocating 
context save memory, which is not mapped to the GPU. It is for efficiency 
reasons. On the other hand, without this optimization 
KFDCWSRTest.InterruptRestore fails with a queue preemption error. I think 
the reason is that the extra queue evictions make HWS too busy to preempt 
existing queues. There is one unmap_queue packet sent to HWS in the current 
code, and there will be three unmap_queue packets with unified memory 
allocation. So this optimization keeps only one unmap_queue as before.


Regards,
Eric


Regards,
  Felix



+
  p = container_of(svms, struct kfd_process, svms);
    pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 
0x%lx]\n",

   svms, prange->start, prange->last, start, last);
  -    if (!p->xnack_enabled) {
+    if (!p->xnack_enabled ||
+    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
  int evicted_ranges;
    list_for_each_entry(pchild, &prange->child_list, child_list) {
@@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, 
struct mm_struct *mm,

  if (r)
  goto out_unlock_range;
  -    if (migrated && !p->xnack_enabled) {
+    if (migrated && (!p->xnack_enabled ||
+    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
+    prange->mapped_to_gpu) {
  pr_debug("restore_work will update mappings of GPUs\n");
  mutex_unlock(&prange->migrate_mutex);
  continue;




[PATCH 4/4] libhsakmt: allocate unified memory for ctx save restore area

2022-06-28 Thread Eric Huang
To improve performance on queue preemption, allocate ctx s/r
 area in VRAM instead of system memory, and migrate it back
 to system memory when VRAM is full.

Signed-off-by: Eric Huang 
Change-Id: If775782027188dbe84b6868260e429373675434c
---
 include/hsakmttypes.h |   1 +
 src/queues.c  | 109 --
 2 files changed, 96 insertions(+), 14 deletions(-)

diff --git a/include/hsakmttypes.h b/include/hsakmttypes.h
index 9063f85..2c1c7cc 100644
--- a/include/hsakmttypes.h
+++ b/include/hsakmttypes.h
@@ -1329,6 +1329,7 @@ typedef enum _HSA_SVM_FLAGS {
HSA_SVM_FLAG_GPU_RO  = 0x0008, // GPUs only read, allows 
replication
HSA_SVM_FLAG_GPU_EXEC= 0x0010, // Allow execution on GPU
HSA_SVM_FLAG_GPU_READ_MOSTLY = 0x0020, // GPUs mostly read, may 
allow similar optimizations as RO, but writes fault
+   HSA_SVM_FLAG_GPU_ALWAYS_MAPPED = 0x0040, // Keep GPU memory mapping 
always valid as if XNACK is disabled
 } HSA_SVM_FLAGS;
 
 typedef enum _HSA_SVM_ATTR_TYPE {
diff --git a/src/queues.c b/src/queues.c
index c83dd93..e65103d 100644
--- a/src/queues.c
+++ b/src/queues.c
@@ -68,6 +68,7 @@ struct queue {
uint32_t eop_buffer_size;
uint32_t gfxv;
bool use_ats;
+   bool unified_ctx_save_restore;
/* This queue structure is allocated from GPU with page aligned size
 * but only small bytes are used. We use the extra space in the end for
 * cu_mask bits array.
@@ -383,13 +384,50 @@ static void free_exec_aligned_memory(void *addr, uint32_t 
size, uint32_t align,
munmap(addr, size);
 }
 
+static HSAKMT_STATUS register_exec_svm_range(void *mem, uint32_t size,
+   uint32_t gpuNode, uint32_t prefetchNode,
+   uint32_t preferredNode, bool alwaysMapped)
+{
+   HSA_SVM_ATTRIBUTE *attrs;
+   HSAuint64 s_attr;
+   HSAuint32 nattr;
+   HSAuint32 flags;
+
+   flags = HSA_SVM_FLAG_HOST_ACCESS |
+   HSA_SVM_FLAG_GPU_EXEC;
+
+   if (alwaysMapped)
+   flags |= HSA_SVM_FLAG_GPU_ALWAYS_MAPPED;
+
+   nattr = 5;
+   s_attr = sizeof(*attrs) * nattr;
+   attrs = (HSA_SVM_ATTRIBUTE *)alloca(s_attr);
+
+   attrs[0].type = HSA_SVM_ATTR_PREFETCH_LOC;
+   attrs[0].value = prefetchNode;
+   attrs[1].type = HSA_SVM_ATTR_PREFERRED_LOC;
+   attrs[1].value = preferredNode;
+   attrs[2].type = HSA_SVM_ATTR_CLR_FLAGS;
+   attrs[2].value = flags;
+   attrs[3].type = HSA_SVM_ATTR_SET_FLAGS;
+   attrs[3].value = flags;
+   attrs[4].type = HSA_SVM_ATTR_ACCESS;
+   attrs[4].value = gpuNode;
+
+   return hsaKmtSVMSetAttr(mem, size, nattr, attrs);
+}
+
 static void free_queue(struct queue *q)
 {
if (q->eop_buffer)
free_exec_aligned_memory(q->eop_buffer,
 q->eop_buffer_size,
 PAGE_SIZE, q->use_ats);
-   if (q->ctx_save_restore)
+   if (q->unified_ctx_save_restore)
+   munmap(q->ctx_save_restore,
+   ALIGN_UP(q->ctx_save_restore_size + q->debug_memory_size,
+   PAGE_SIZE));
+   else if (q->ctx_save_restore)
free_exec_aligned_memory(q->ctx_save_restore,
 q->ctx_save_restore_size,
 PAGE_SIZE, q->use_ats);
@@ -425,6 +463,8 @@ static int handle_concrete_asic(struct queue *q,
if (ret) {
uint32_t total_mem_alloc_size = 0;
HsaUserContextSaveAreaHeader *header;
+   HsaNodeProperties node;
+   bool svm_api;
 
args->ctx_save_restore_size = q->ctx_save_restore_size;
args->ctl_stack_size = q->ctl_stack_size;
@@ -434,22 +474,63 @@ static int handle_concrete_asic(struct queue *q,
 */
total_mem_alloc_size = q->ctx_save_restore_size +
   q->debug_memory_size;
-   q->ctx_save_restore =
-   allocate_exec_aligned_memory(total_mem_alloc_size,
-q->use_ats, NodeId, false, false);
 
-   if (!q->ctx_save_restore)
-   return HSAKMT_STATUS_NO_MEMORY;
+   if (hsaKmtGetNodeProperties(NodeId, &node))
+   svm_api = false;
+   else
+   svm_api = node.Capability.ui32.SVMAPISupported;
 
-   args->ctx_save_restore_address = (uintptr_t)q->ctx_save_restore;
+   /* Allocate unified memory for context save restore
+* area on dGPU.
+*/
+   if (!q->use_ats && svm_api) {
+   uint32_t size = ALIGN_UP(total_mem_alloc_size, 
PAGE
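
The register_exec_svm_range() helper above builds a five-entry
HSA_SVM_ATTRIBUTE array and hands it to hsaKmtSVMSetAttr(). A minimal
self-contained sketch of that attribute-building pattern follows; the
types and enum values are stand-ins so it compiles on its own, and
HSA_SVM_FLAG_HOST_ACCESS = 0x0001 is an assumption (the other flag values
are taken from the hsakmttypes.h hunk above).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the thunk types so the sketch is self-contained. */
typedef struct { uint32_t type; uint64_t value; } HSA_SVM_ATTRIBUTE;

enum {
	HSA_SVM_ATTR_PREFETCH_LOC,
	HSA_SVM_ATTR_PREFERRED_LOC,
	HSA_SVM_ATTR_CLR_FLAGS,
	HSA_SVM_ATTR_SET_FLAGS,
	HSA_SVM_ATTR_ACCESS,
};

#define HSA_SVM_FLAG_HOST_ACCESS       0x0001 /* assumed value */
#define HSA_SVM_FLAG_GPU_EXEC          0x0010
#define HSA_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040

/* Mirrors the attribute construction in register_exec_svm_range(). */
static uint32_t build_exec_svm_attrs(HSA_SVM_ATTRIBUTE *attrs,
				     uint32_t gpuNode, uint32_t prefetchNode,
				     uint32_t preferredNode, int alwaysMapped)
{
	uint32_t flags = HSA_SVM_FLAG_HOST_ACCESS | HSA_SVM_FLAG_GPU_EXEC;

	if (alwaysMapped)
		flags |= HSA_SVM_FLAG_GPU_ALWAYS_MAPPED;

	attrs[0] = (HSA_SVM_ATTRIBUTE){ HSA_SVM_ATTR_PREFETCH_LOC, prefetchNode };
	attrs[1] = (HSA_SVM_ATTRIBUTE){ HSA_SVM_ATTR_PREFERRED_LOC, preferredNode };
	attrs[2] = (HSA_SVM_ATTRIBUTE){ HSA_SVM_ATTR_CLR_FLAGS, flags };
	attrs[3] = (HSA_SVM_ATTRIBUTE){ HSA_SVM_ATTR_SET_FLAGS, flags };
	attrs[4] = (HSA_SVM_ATTRIBUTE){ HSA_SVM_ATTR_ACCESS, gpuNode };
	return 5; /* nattr, passed to hsaKmtSVMSetAttr() in the real code */
}
```

In the patch the returned count and array feed straight into
hsaKmtSVMSetAttr(mem, size, nattr, attrs).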

[PATCH 3/4] libhsakmt: add new flags for svm

2022-06-28 Thread Eric Huang
It is to add a new option for always keeping the gpu mapping.

Signed-off-by: Eric Huang 
Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492
---
 include/linux/kfd_ioctl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h
index 8a0ed49..5c45f58 100644
--- a/include/linux/kfd_ioctl.h
+++ b/include/linux/kfd_ioctl.h
@@ -1069,6 +1069,8 @@ struct kfd_ioctl_cross_memory_copy_args {
 #define KFD_IOCTL_SVM_FLAG_GPU_EXEC0x0010
 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */
 #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED   0x0040
 
 /**
  * kfd_ioctl_svm_op - SVM ioctl operations
-- 
2.25.1



[PATCH 0/4] Unified memory for CWSR save restore area

2022-06-28 Thread Eric Huang
amdkfd changes:

Eric Huang (2):
  drm/amdkfd: add new flag for svm
  drm/amdkfd: change svm range evict

 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 --
 include/uapi/linux/kfd_ioctl.h   |  2 ++
 2 files changed, 10 insertions(+), 2 deletions(-)

libhsakmt(thunk) changes:
which are based on https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface

Eric Huang (2):
  libhsakmt: add new flags for svm
  libhsakmt: allocate unified memory for ctx save restore area

 include/hsakmttypes.h |   1 +
 include/linux/kfd_ioctl.h |   2 +
 src/queues.c  | 109 +-
 3 files changed, 98 insertions(+), 14 deletions(-)

-- 
2.25.1





[PATCH 2/2] drm/amdkfd: change svm range evict

2022-06-28 Thread Eric Huang
Two changes:
1. reducing unnecessary evict/unmap when range is not mapped to gpu.
2. adding always evict when flags is set to always_mapped.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 4bf2f75f853b..76e817687ef9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, struct 
mm_struct *mm,
struct kfd_process *p;
int r = 0;
 
+   if (!prange->mapped_to_gpu)
+   return 0;
+
p = container_of(svms, struct kfd_process, svms);
 
pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 0x%lx]\n",
 svms, prange->start, prange->last, start, last);
 
-   if (!p->xnack_enabled) {
+   if (!p->xnack_enabled ||
+   (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
int evicted_ranges;
 
		list_for_each_entry(pchild, &prange->child_list, child_list) {
@@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, struct 
mm_struct *mm,
if (r)
goto out_unlock_range;
 
-   if (migrated && !p->xnack_enabled) {
+   if (migrated && (!p->xnack_enabled ||
+   (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
+   prange->mapped_to_gpu) {
pr_debug("restore_work will update mappings of GPUs\n");
			mutex_unlock(&prange->migrate_mutex);
continue;
-- 
2.25.1



[PATCH 1/2] drm/amdkfd: add new flag for svm

2022-06-28 Thread Eric Huang
It is to add a new option for always keeping the gpu mapping.

Signed-off-by: Eric Huang 
---
 include/uapi/linux/kfd_ioctl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index fd49dde4d5f4..eba04ebfd9a8 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1076,6 +1076,8 @@ struct kfd_ioctl_cross_memory_copy_args {
 #define KFD_IOCTL_SVM_FLAG_GPU_EXEC0x0010
 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */
 #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED   0x0040
 
 /**
  * kfd_ioctl_svm_op - SVM ioctl operations
-- 
2.25.1



Re: [PATCH 1/3] drm/amdkfd: add new flags for svm

2022-06-28 Thread Eric Huang

Thank you, Felix.

I will send all libhsakmt changes and amdkfd changes to amd-gfx.

Regards,
Eric

On 2022-06-28 16:44, Felix Kuehling wrote:

Am 2022-06-27 um 12:01 schrieb Eric Huang:
No. There is only an internal link for now, because it is under review. 
Once it is submitted, the external link will be in the gerrit git for 
libhsakmt.


Hi Eric,

For anything that requires ioctl API changes, the user mode and kernel 
mode changes need to be reviewed together in public. You can either 
post the libhsakmt change by email to amd-gfx, or you can push your 
libhsakmt development branch to a personal branch on github and 
include a link to that in the kernel commit description.


Alex, some background about this series: We are looking into using 
unified memory for CWSR context save space. This allows us to get 
lower preemption latency when VRAM is available, but migrate it to 
system memory when more VRAM is needed for application allocations. 
Because we cannot preempt in the trap handler, and we want to 
guarantee finite time for preemption and trap handler execution, we 
need to prevent page faults on any memory accessed by the trap 
handler. The KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag is meant to 
guarantee that.
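
The flag's effect on the eviction path (svm_range_evict in patch 2/2) can
be summarized as: treat a range carrying
KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED the same as the XNACK-off case. A
hedged, self-contained sketch of just that condition, using the flag value
from the patch:

```c
#include <assert.h>

#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040

/* Sketch of the condition added in svm_range_evict(): queues are
 * preempted and the restore deferred to a worker either when XNACK is
 * off for the process, or when the range must always stay mapped. */
static int evict_like_xnack_off(int xnack_enabled, unsigned int range_flags)
{
	return !xnack_enabled ||
	       (range_flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED);
}
```

With XNACK on and the flag clear, the range can instead be handled through
retry faults without evicting queues.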


I think the KFD_IOCTL_SVM_FLAG_CUSTOM is not necessary. I've responded 
to Eric with an alternative idea.


Regards,
  Felix




Regards,
Eric

On 2022-06-27 11:58, Alex Deucher wrote:
On Mon, Jun 27, 2022 at 11:36 AM Eric Huang 
 wrote:

http://gerrit-git.amd.com/c/compute/ec/libhsakmt/+/697296

Got an external link?

Alex


Regards,
Eric

On 2022-06-27 11:33, Alex Deucher wrote:
On Fri, Jun 24, 2022 at 12:03 PM Eric Huang 
 wrote:

It is to add new options for always keeping gpu mapping
and custom coarse-grain allocation instead of fine 
grain as default.

Signed-off-by: Eric Huang 

Can you provide a link to the proposed userspace for this?

Alex


---
   include/uapi/linux/kfd_ioctl.h | 4 
   1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h

index fd49dde4d5f4..9dbf215675a0 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1076,6 +1076,10 @@ struct kfd_ioctl_cross_memory_copy_args {
   #define KFD_IOCTL_SVM_FLAG_GPU_EXEC    0x0010
   /* GPUs mostly read, may allow similar optimizations as RO, 
but writes fault */

   #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED 0x0040
+/* Allow setting custom flags instead of defaults */
+#define KFD_IOCTL_SVM_FLAG_CUSTOM  0x8000

   /**
    * kfd_ioctl_svm_op - SVM ioctl operations
--
2.25.1







Re: [PATCH 1/3] drm/amdkfd: add new flags for svm

2022-06-27 Thread Eric Huang
No. There is only an internal link for now, because it is under review. 
Once it is submitted, the external link will be in the gerrit git for libhsakmt.


Regards,
Eric

On 2022-06-27 11:58, Alex Deucher wrote:

On Mon, Jun 27, 2022 at 11:36 AM Eric Huang  wrote:

http://gerrit-git.amd.com/c/compute/ec/libhsakmt/+/697296

Got an external link?

Alex


Regards,
Eric

On 2022-06-27 11:33, Alex Deucher wrote:

On Fri, Jun 24, 2022 at 12:03 PM Eric Huang  wrote:

It is to add new options for always keeping gpu mapping
and custom coarse-grain allocation instead of fine
grain as default.

Signed-off-by: Eric Huang 

Can you provide a link to the proposed userspace for this?

Alex


---
   include/uapi/linux/kfd_ioctl.h | 4 
   1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index fd49dde4d5f4..9dbf215675a0 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1076,6 +1076,10 @@ struct kfd_ioctl_cross_memory_copy_args {
   #define KFD_IOCTL_SVM_FLAG_GPU_EXEC0x0010
   /* GPUs mostly read, may allow similar optimizations as RO, but writes fault 
*/
   #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED   0x0040
+/* Allow setting custom flags instead of defaults */
+#define KFD_IOCTL_SVM_FLAG_CUSTOM  0x8000

   /**
* kfd_ioctl_svm_op - SVM ioctl operations
--
2.25.1





Re: [PATCH 1/3] drm/amdkfd: add new flags for svm

2022-06-27 Thread Eric Huang

http://gerrit-git.amd.com/c/compute/ec/libhsakmt/+/697296

Regards,
Eric

On 2022-06-27 11:33, Alex Deucher wrote:

On Fri, Jun 24, 2022 at 12:03 PM Eric Huang  wrote:

It is to add new options for always keeping gpu mapping
and custom coarse-grain allocation instead of fine
grain as default.

Signed-off-by: Eric Huang 

Can you provide a link to the proposed userspace for this?

Alex


---
  include/uapi/linux/kfd_ioctl.h | 4 
  1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index fd49dde4d5f4..9dbf215675a0 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1076,6 +1076,10 @@ struct kfd_ioctl_cross_memory_copy_args {
  #define KFD_IOCTL_SVM_FLAG_GPU_EXEC0x0010
  /* GPUs mostly read, may allow similar optimizations as RO, but writes fault 
*/
  #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED   0x0040
+/* Allow setting custom flags instead of defaults */
+#define KFD_IOCTL_SVM_FLAG_CUSTOM  0x8000

  /**
   * kfd_ioctl_svm_op - SVM ioctl operations
--
2.25.1





[PATCH 3/3] drm/amdkfd: add custom svm range flags setting

2022-06-24 Thread Eric Huang
It is to give the user a chance to change the default
flags setting, such as from fine grain to coarse grain.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 353306037959..caadd18c447a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -722,7 +722,10 @@ svm_range_apply_attrs(struct kfd_process *p, struct 
svm_range *prange,
break;
case KFD_IOCTL_SVM_ATTR_SET_FLAGS:
*update_mapping = true;
-   prange->flags |= attrs[i].value;
+   if (attrs[i].value & KFD_IOCTL_SVM_FLAG_CUSTOM)
+   prange->flags = attrs[i].value;
+   else
+   prange->flags |= attrs[i].value;
break;
case KFD_IOCTL_SVM_ATTR_CLR_FLAGS:
*update_mapping = true;
-- 
2.25.1
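
The SET_FLAGS handling in the patch above distinguishes a wholesale
overwrite from the default OR-merge. A self-contained sketch of that
branch, using the KFD_IOCTL_SVM_FLAG_CUSTOM value as shown in patch 1/3:

```c
#include <assert.h>

#define KFD_IOCTL_SVM_FLAG_CUSTOM 0x8000 /* value as shown in patch 1/3 */

/* Mirror of the KFD_IOCTL_SVM_ATTR_SET_FLAGS branch: a value carrying
 * KFD_IOCTL_SVM_FLAG_CUSTOM replaces the range flags wholesale, while
 * any other value is OR-ed into the existing defaults. */
static void apply_set_flags(unsigned int *range_flags, unsigned int value)
{
	if (value & KFD_IOCTL_SVM_FLAG_CUSTOM)
		*range_flags = value;
	else
		*range_flags |= value;
}
```

This is what lets userspace drop a default (e.g. switch fine grain to
coarse grain) rather than only add flags on top of it.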



[PATCH 2/3] drm/amdkfd: change svm range evict

2022-06-24 Thread Eric Huang
Two changes:
1. reducing unnecessary evict/unmap when range is not mapped to gpu.
2. adding always evict when flags is set to always_mapped.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 4bf2f75f853b..353306037959 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, struct 
mm_struct *mm,
struct kfd_process *p;
int r = 0;
 
+   if (prange->mapped_to_gpu)
+   return 0;
+
p = container_of(svms, struct kfd_process, svms);
 
pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 0x%lx]\n",
 svms, prange->start, prange->last, start, last);
 
-   if (!p->xnack_enabled) {
+   if (!p->xnack_enabled ||
+   (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
int evicted_ranges;
 
		list_for_each_entry(pchild, &prange->child_list, child_list) {
@@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, struct 
mm_struct *mm,
if (r)
goto out_unlock_range;
 
-   if (migrated && !p->xnack_enabled) {
+   if (migrated && (!p->xnack_enabled ||
+   (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
+   prange->mapped_to_gpu) {
pr_debug("restore_work will update mappings of GPUs\n");
			mutex_unlock(&prange->migrate_mutex);
continue;
-- 
2.25.1



[PATCH 1/3] drm/amdkfd: add new flags for svm

2022-06-24 Thread Eric Huang
It is to add new options for always keeping gpu mapping
and custom coarse-grain allocation instead of fine
grain as default.

Signed-off-by: Eric Huang 
---
 include/uapi/linux/kfd_ioctl.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index fd49dde4d5f4..9dbf215675a0 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1076,6 +1076,10 @@ struct kfd_ioctl_cross_memory_copy_args {
 #define KFD_IOCTL_SVM_FLAG_GPU_EXEC0x0010
 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */
 #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY 0x0020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED   0x0040
+/* Allow setting custom flags instead of defaults */
+#define KFD_IOCTL_SVM_FLAG_CUSTOM  0x8000
 
 /**
  * kfd_ioctl_svm_op - SVM ioctl operations
-- 
2.25.1



Re: [PATCH 1/1] Revert "drm/amdkfd: Add queue to MES if it becomes active"

2022-06-17 Thread Eric Huang

Reviewed-by: Eric Huang 

On 2022-06-17 15:26, Philip Yang wrote:

This reverts commit 8b9aa1fa82baf4e8b6a2daa3aa4d69b728df727e.
As it breaks pqm_set_gws.
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 67ae5b6385a2..e1797657b04c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -866,10 +866,8 @@ static int update_queue(struct device_queue_manager *dqm, 
struct queue *q,
 * dqm->active_queue_count to determine whether a new runlist must be
 * uploaded.
 */
-   if (q->properties.is_active) {
-   add_queue = true;
-   if (!prev_active)
-   increment_queue_count(dqm, &pdd->qpd, q);
+   if (q->properties.is_active && !prev_active) {
+   increment_queue_count(dqm, &pdd->qpd, q);
} else if (!q->properties.is_active && prev_active) {
decrement_queue_count(dqm, &pdd->qpd, q);
} else if (q->gws && !q->properties.is_gws) {




Re: [PATCH 1/2] drm/amdkfd: Add queue to MES if it becomes active

2022-06-16 Thread Eric Huang

Does it break the case of q->gws with q->properties.is_active == true?

Regards,
Eric

On 2022-06-15 17:56, Philip Yang wrote:

We remove the user queue from MES scheduler to update queue properties.
If the queue becomes active after updating, add the user queue to MES
scheduler, to be able to handle command packet submission.

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e1797657b04c..67ae5b6385a2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -866,8 +866,10 @@ static int update_queue(struct device_queue_manager *dqm, 
struct queue *q,
 * dqm->active_queue_count to determine whether a new runlist must be
 * uploaded.
 */
-   if (q->properties.is_active && !prev_active) {
-   increment_queue_count(dqm, &pdd->qpd, q);
+   if (q->properties.is_active) {
+   add_queue = true;
+   if (!prev_active)
+   increment_queue_count(dqm, &pdd->qpd, q);
} else if (!q->properties.is_active && prev_active) {
decrement_queue_count(dqm, &pdd->qpd, q);
} else if (q->gws && !q->properties.is_gws) {




[PATCH 1/2] drm/amdkfd: port cwsr trap handler from dkms branch

2022-05-17 Thread Eric Huang
Most of changes are for debugger feature, and it is
to simplify trap handler support for new asics in the
future.

Signed-off-by: Eric Huang 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2527 +
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  325 ++-
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |  244 +-
 3 files changed, 1596 insertions(+), 1500 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 475f89700c74..8cbdc7f519c6 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -166,7 +166,7 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
0x807c847c, 0x806eff6e,
0x0400, 0xbf0a757c,
0xbf85ffef, 0xbf9c,
-   0xbf8200cd, 0xbef8007e,
+   0xbf8200ce, 0xbef8007e,
0x8679ff7f, 0x,
0x8779ff79, 0x0004,
0xbefa0080, 0xbefb00ff,
@@ -212,304 +212,310 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
0x761e, 0xe0524100,
0x761e0100, 0xe0524200,
0x761e0200, 0xe0524300,
-   0x761e0300, 0xb8f22a05,
-   0x80728172, 0x8e728a72,
-   0xb8f61605, 0x80768176,
-   0x8e768676, 0x80727672,
-   0x80f2c072, 0xb8f31605,
-   0x80738173, 0x8e738473,
-   0x8e7a8273, 0xbefa00ff,
-   0x0100, 0xbefc0073,
-   0xc031003c, 0x0072,
-   0x80f2c072, 0xbf8c007f,
-   0x80fc907c, 0xbe802d00,
-   0xbe822d02, 0xbe842d04,
-   0xbe862d06, 0xbe882d08,
-   0xbe8a2d0a, 0xbe8c2d0c,
-   0xbe8e2d0e, 0xbf06807c,
-   0xbf84fff1, 0xb8f22a05,
-   0x80728172, 0x8e728a72,
-   0xb8f61605, 0x80768176,
-   0x8e768676, 0x80727672,
-   0xbefa0084, 0xbefa00ff,
-   0x0100, 0xc0211cfc,
+   0x761e0300, 0xbf8c0f70,
+   0xb8f22a05, 0x80728172,
+   0x8e728a72, 0xb8f61605,
+   0x80768176, 0x8e768676,
+   0x80727672, 0x80f2c072,
+   0xb8f31605, 0x80738173,
+   0x8e738473, 0x8e7a8273,
+   0xbefa00ff, 0x0100,
+   0xbefc0073, 0xc031003c,
+   0x0072, 0x80f2c072,
+   0xbf8c007f, 0x80fc907c,
+   0xbe802d00, 0xbe822d02,
+   0xbe842d04, 0xbe862d06,
+   0xbe882d08, 0xbe8a2d0a,
+   0xbe8c2d0c, 0xbe8e2d0e,
+   0xbf06807c, 0xbf84fff1,
+   0xb8f22a05, 0x80728172,
+   0x8e728a72, 0xb8f61605,
+   0x80768176, 0x8e768676,
+   0x80727672, 0xbefa0084,
+   0xbefa00ff, 0x0100,
+   0xc0211cfc, 0x0072,
+   0x80728472, 0xc0211c3c,
0x0072, 0x80728472,
-   0xc0211c3c, 0x0072,
-   0x80728472, 0xc0211c7c,
+   0xc0211c7c, 0x0072,
+   0x80728472, 0xc0211bbc,
0x0072, 0x80728472,
-   0xc0211bbc, 0x0072,
-   0x80728472, 0xc0211bfc,
+   0xc0211bfc, 0x0072,
+   0x80728472, 0xc0211d3c,
0x0072, 0x80728472,
-   0xc0211d3c, 0x0072,
-   0x80728472, 0xc0211d7c,
+   0xc0211d7c, 0x0072,
+   0x80728472, 0xc0211a3c,
0x0072, 0x80728472,
-   0xc0211a3c, 0x0072,
-   0x80728472, 0xc0211a7c,
+   0xc0211a7c, 0x0072,
+   0x80728472, 0xc0211dfc,
0x0072, 0x80728472,
-   0xc0211dfc, 0x0072,
-   0x80728472, 0xc0211b3c,
+   0xc0211b3c, 0x0072,
+   0x80728472, 0xc0211b7c,
0x0072, 0x80728472,
-   0xc0211b7c, 0x0072,
-   0x80728472, 0xbf8c007f,
-   0xbefc0073, 0xbefe006e,
-   0xbeff006f, 0x867375ff,
-   0x03ff, 0xb9734803,
-   0x867375ff, 0xf800,
-   0x8f738b73, 0xb973a2c3,
-   0xb977f801, 0x8673ff71,
-   0xf000, 0x8f739c73,
-   0x8e739073, 0xbef60080,
-   0x87767376, 0x8673ff71,
-   0x0800, 0x8f739b73,
-   0x8e738f73, 0x87767376,
-   0x8673ff74, 0x0080,
-   0x8f739773, 0xb976f807,
-   0x8671ff71, 0x,
-   0x86fe7e7e, 0x86ea6a6a,
-   0x8f768374, 0xb976e0c2,
-   0xbf82, 0xb9740002,
-   0xbf8a, 0x95807370,
-   0xbf81, 0x,
+   0xbf8c007f, 0xbefc0073,
+   0xbefe006e, 0xbeff006f,
+   0x867375ff, 0x03ff,
+   0xb9734803, 0x867375ff,
+   0xf800, 0x8f738b73,
+   0xb973a2c3, 0xb977f801,
+   0x8673ff71, 0xf000,
+   0x8f739c73, 0x8e739073,
+   0xbef60080, 0x87767376,
+   0x8673ff71, 0x0800,
+   0x8f739b73, 0x8e738f73,
+   0x87767376, 0x8673ff74,
+   0x0080, 0x8f739773,
+   0xb976f807, 0x8671ff71,
+   0x, 0x86fe7e7e,
+   0x86ea6a6a, 0x8f768374,
+   0xb976e0c2, 0xbf82,
+   0xb9740002, 0xbf8a,
+   0x95807370, 0xbf81,
 };
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf820248,
-   0xb8f8f802, 0x89788678,
-   0xb8eef801, 0x866eff6e,
-   0x0800, 0xbf840003,
+   0xbf820001, 0xbf820254,
+   0xb8f8f802, 0x8978ff78,
+   0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
-   0xbf840016, 0xb8fbf803

[PATCH 1/2] drm/amdkfd: port cwsr trap handler from dkms branch

2022-05-16 Thread Eric Huang
It is to simplify trap handler support for new asics in
the future.

Signed-off-by: Eric Huang 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2527 +
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  325 ++-
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |  244 +-
 3 files changed, 1596 insertions(+), 1500 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 475f89700c74..8cbdc7f519c6 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -166,7 +166,7 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
0x807c847c, 0x806eff6e,
0x0400, 0xbf0a757c,
0xbf85ffef, 0xbf9c,
-   0xbf8200cd, 0xbef8007e,
+   0xbf8200ce, 0xbef8007e,
0x8679ff7f, 0x,
0x8779ff79, 0x0004,
0xbefa0080, 0xbefb00ff,
@@ -212,304 +212,310 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
0x761e, 0xe0524100,
0x761e0100, 0xe0524200,
0x761e0200, 0xe0524300,
-   0x761e0300, 0xb8f22a05,
-   0x80728172, 0x8e728a72,
-   0xb8f61605, 0x80768176,
-   0x8e768676, 0x80727672,
-   0x80f2c072, 0xb8f31605,
-   0x80738173, 0x8e738473,
-   0x8e7a8273, 0xbefa00ff,
-   0x0100, 0xbefc0073,
-   0xc031003c, 0x0072,
-   0x80f2c072, 0xbf8c007f,
-   0x80fc907c, 0xbe802d00,
-   0xbe822d02, 0xbe842d04,
-   0xbe862d06, 0xbe882d08,
-   0xbe8a2d0a, 0xbe8c2d0c,
-   0xbe8e2d0e, 0xbf06807c,
-   0xbf84fff1, 0xb8f22a05,
-   0x80728172, 0x8e728a72,
-   0xb8f61605, 0x80768176,
-   0x8e768676, 0x80727672,
-   0xbefa0084, 0xbefa00ff,
-   0x0100, 0xc0211cfc,
+   0x761e0300, 0xbf8c0f70,
+   0xb8f22a05, 0x80728172,
+   0x8e728a72, 0xb8f61605,
+   0x80768176, 0x8e768676,
+   0x80727672, 0x80f2c072,
+   0xb8f31605, 0x80738173,
+   0x8e738473, 0x8e7a8273,
+   0xbefa00ff, 0x0100,
+   0xbefc0073, 0xc031003c,
+   0x0072, 0x80f2c072,
+   0xbf8c007f, 0x80fc907c,
+   0xbe802d00, 0xbe822d02,
+   0xbe842d04, 0xbe862d06,
+   0xbe882d08, 0xbe8a2d0a,
+   0xbe8c2d0c, 0xbe8e2d0e,
+   0xbf06807c, 0xbf84fff1,
+   0xb8f22a05, 0x80728172,
+   0x8e728a72, 0xb8f61605,
+   0x80768176, 0x8e768676,
+   0x80727672, 0xbefa0084,
+   0xbefa00ff, 0x0100,
+   0xc0211cfc, 0x0072,
+   0x80728472, 0xc0211c3c,
0x0072, 0x80728472,
-   0xc0211c3c, 0x0072,
-   0x80728472, 0xc0211c7c,
+   0xc0211c7c, 0x0072,
+   0x80728472, 0xc0211bbc,
0x0072, 0x80728472,
-   0xc0211bbc, 0x0072,
-   0x80728472, 0xc0211bfc,
+   0xc0211bfc, 0x0072,
+   0x80728472, 0xc0211d3c,
0x0072, 0x80728472,
-   0xc0211d3c, 0x0072,
-   0x80728472, 0xc0211d7c,
+   0xc0211d7c, 0x0072,
+   0x80728472, 0xc0211a3c,
0x0072, 0x80728472,
-   0xc0211a3c, 0x0072,
-   0x80728472, 0xc0211a7c,
+   0xc0211a7c, 0x0072,
+   0x80728472, 0xc0211dfc,
0x0072, 0x80728472,
-   0xc0211dfc, 0x0072,
-   0x80728472, 0xc0211b3c,
+   0xc0211b3c, 0x0072,
+   0x80728472, 0xc0211b7c,
0x0072, 0x80728472,
-   0xc0211b7c, 0x0072,
-   0x80728472, 0xbf8c007f,
-   0xbefc0073, 0xbefe006e,
-   0xbeff006f, 0x867375ff,
-   0x03ff, 0xb9734803,
-   0x867375ff, 0xf800,
-   0x8f738b73, 0xb973a2c3,
-   0xb977f801, 0x8673ff71,
-   0xf000, 0x8f739c73,
-   0x8e739073, 0xbef60080,
-   0x87767376, 0x8673ff71,
-   0x0800, 0x8f739b73,
-   0x8e738f73, 0x87767376,
-   0x8673ff74, 0x0080,
-   0x8f739773, 0xb976f807,
-   0x8671ff71, 0x,
-   0x86fe7e7e, 0x86ea6a6a,
-   0x8f768374, 0xb976e0c2,
-   0xbf82, 0xb9740002,
-   0xbf8a, 0x95807370,
-   0xbf81, 0x,
+   0xbf8c007f, 0xbefc0073,
+   0xbefe006e, 0xbeff006f,
+   0x867375ff, 0x03ff,
+   0xb9734803, 0x867375ff,
+   0xf800, 0x8f738b73,
+   0xb973a2c3, 0xb977f801,
+   0x8673ff71, 0xf000,
+   0x8f739c73, 0x8e739073,
+   0xbef60080, 0x87767376,
+   0x8673ff71, 0x0800,
+   0x8f739b73, 0x8e738f73,
+   0x87767376, 0x8673ff74,
+   0x0080, 0x8f739773,
+   0xb976f807, 0x8671ff71,
+   0x, 0x86fe7e7e,
+   0x86ea6a6a, 0x8f768374,
+   0xb976e0c2, 0xbf82,
+   0xb9740002, 0xbf8a,
+   0x95807370, 0xbf81,
 };
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf820248,
-   0xb8f8f802, 0x89788678,
-   0xb8eef801, 0x866eff6e,
-   0x0800, 0xbf840003,
+   0xbf820001, 0xbf820254,
+   0xb8f8f802, 0x8978ff78,
+   0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
-   0xbf840016, 0xb8fbf803,
+   0xbf840009, 0x866eff6d

[PATCH 2/2] drm/amdkfd: Add gfx11 trap handler

2022-05-16 Thread Eric Huang
From: Jay Cornwall 

Based on gfx10 with following changes:

- GPR_ALLOC.VGPR_SIZE field moved (and size corrected in gfx10)
- s_sendmsg_rtn_b64 replaces some s_sendmsg/s_getreg
- Buffer instructions no longer have direct-to-LDS modifier

Signed-off-by: Jay Cornwall 
Reviewed-by: Laurent Morichetti 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 463 +-
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  69 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   6 +-
 3 files changed, 507 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 8cbdc7f519c6..60a81649cf12 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -776,7 +776,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xe0704100, 0x705d0100,
0xe0704200, 0x705d0200,
0xe0704300, 0x705d0300,
-   0xb9702a05, 0x80708170,
+   0xb9703a05, 0x80708170,
0xbf0d9973, 0xbf850002,
0x8f708970, 0xbf820001,
0x8f708a70, 0xb97a1e06,
@@ -855,7 +855,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x877aff6d, 0x8000,
0xbf840040, 0x8f7b867b,
0x8f7b827b, 0xbef6037b,
-   0xb9702a05, 0x80708170,
+   0xb9703a05, 0x80708170,
0xbf0d9973, 0xbf850002,
0x8f708970, 0xbf820001,
0x8f708a70, 0xb97a1e06,
@@ -891,7 +891,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbef003ff, 0x0200,
0xbeff0380, 0xbf820003,
0xbef003ff, 0x0400,
-   0xbeff03c1, 0xb97b2a05,
+   0xbeff03c1, 0xb97b3a05,
0x807b817b, 0x8f7b827b,
0x907c9973, 0x877c817c,
0xbf06817c, 0xbf850017,
@@ -939,7 +939,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xb96f4306, 0x876fc16f,
0xbf840029, 0x8f6f866f,
0x8f6f826f, 0xbef6036f,
-   0xb9782a05, 0x80788178,
+   0xb9783a05, 0x80788178,
0xbf0d9972, 0xbf850002,
0x8f788978, 0xbf820001,
0x8f788a78, 0xb96e1e06,
@@ -962,7 +962,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x907c9972, 0x877c817c,
0xbf06817c, 0xbf850002,
0xbeff0380, 0xbf820001,
-   0xbeff03c1, 0xb96f2a05,
+   0xbeff03c1, 0xb96f3a05,
0x806f816f, 0x8f6f826f,
0x907c9972, 0x877c817c,
0xbf06817c, 0xbf850024,
@@ -1010,7 +1010,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x6e5d0100, 0xe0304200,
0x6e5d0200, 0xe0304300,
0x6e5d0300, 0xbf8c3f70,
-   0xb9782a05, 0x80788178,
+   0xb9783a05, 0x80788178,
0xbf0d9972, 0xbf850002,
0x8f788978, 0xbf820001,
0x8f788a78, 0xb96e1e06,
@@ -1037,7 +1037,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbe8c310c, 0xbe8e310e,
0xbf06807c, 0xbf84fff0,
0xba80f801, 0x,
-   0xbf8a, 0xb9782a05,
+   0xbf8a, 0xb9783a05,
0x80788178, 0xbf0d9972,
0xbf850002, 0x8f788978,
0xbf820001, 0x8f788a78,
@@ -2261,7 +2261,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbf8a, 0x877aff6d,
0x8000, 0xbf840040,
0x8f7b867b, 0x8f7b827b,
-   0xbef6037b, 0xb9702a05,
+   0xbef6037b, 0xb9703a05,
0x80708170, 0xbf0d9973,
0xbf850002, 0x8f708970,
0xbf820001, 0x8f708a70,
@@ -2298,7 +2298,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x0200, 0xbeff0380,
0xbf820003, 0xbef003ff,
0x0400, 0xbeff03c1,
-   0xb97b2a05, 0x807b817b,
+   0xb97b3a05, 0x807b817b,
0x8f7b827b, 0x907c9973,
0x877c817c, 0xbf06817c,
0xbf850017, 0xbef603ff,
@@ -2345,7 +2345,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbeff03c1, 0xb96f4306,
0x876fc16f, 0xbf840029,
0x8f6f866f, 0x8f6f826f,
-   0xbef6036f, 0xb9782a05,
+   0xbef6036f, 0xb9783a05,
0x80788178, 0xbf0d9972,
0xbf850002, 0x8f788978,
0xbf820001, 0x8f788a78,
@@ -2369,7 +2369,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x877c817c, 0xbf06817c,
0xbf850002, 0xbeff0380,
0xbf820001, 0xbeff03c1,
-   0xb96f2a05, 0x806f816f,
+   0xb96f3a05, 0x806f816f,
0x8f6f826f, 0x907c9972,
0x877c817c, 0xbf06817c,
0xbf850024, 0xbef603ff,
@@ -2416,7 +2416,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xe0304100, 0x6e5d0100,
0xe0304200, 0x6e5d0200,
0xe0304300, 0x6e5d0300,
-   0xbf8c3f70, 0xb9782a05,
+   0xbf8c3f70, 0xb9783a05,
0x80788178, 0xbf0d9972,
0xbf850002, 0x8f788978,
0xbf820001, 0x8f788a78,
@@ -2444,7 +2444,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbe8e310e, 0xbf06807c,
0xbf84fff0, 0xba80f801,
0x, 0xbf8a,
-   0xb9782a05, 0x80788178,
+   0xb9783a05, 0x80788178,
0xbf0d9972, 0xbf850002,
0x8f788978, 0xbf820001,

Re: [PATCH] drm/amdkfd: only allow heavy-weight TLB flush on some ASICs for SVM too

2022-04-14 Thread Eric Huang




On 2022-04-14 04:19, Lang Yu wrote:

The idea is from commit a50fe7078035 ("drm/amdkfd: Only apply heavy-weight
TLB flush on Aldebaran") and commit f61c40c0757a ("drm/amdkfd: enable
heavy-weight TLB flush on Arcturus"). Otherwise, we will run into problems
on some ASICs when running SVM applications.

Signed-off-by: Lang Yu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8 
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 8 
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +++-
  3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 91f82a9ccdaf..459f59e3d0ed 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1128,14 +1128,6 @@ static int kfd_ioctl_free_memory_of_gpu(struct file 
*filep,
return ret;
  }
  
-static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
-{
-   return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
-   (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
-   dev->adev->sdma.instance[0].fw_version >= 18) ||
-   KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0);
-}
-
  static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
struct kfd_process *p, void *data)
  {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 8a43def1f638..aff6f598ff2c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1328,6 +1328,14 @@ void kfd_signal_poison_consumed_event(struct kfd_dev 
*dev, u32 pasid);
  
  void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type);
  
+static inline bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
+{
+   return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+  (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
+  dev->adev->sdma.instance[0].fw_version >= 18) ||
+  KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0);
+}
+
Moving the function kfd_flush_tlb_after_unmap is a cosmetic change and not 
related to the topic. You could separate that into another patch.


Regards,
Eric

  bool kfd_is_locked(void);
  
  /* Compute profile */

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 459fa07a3bcc..5afe216cf099 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1229,7 +1229,9 @@ svm_range_unmap_from_gpus(struct svm_range *prange, 
unsigned long start,
if (r)
break;
}
-   kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT);
+
+   if (kfd_flush_tlb_after_unmap(pdd->dev))
+   kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT);
}
  
  	return r;




Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Vega20

2022-02-07 Thread Eric Huang

Hi Guchun,

SDMA FW team confirms MI50/VG20 doesn't have the same bug as MI100, 
which causes an asic hang issue when running the RVS test. If this change makes 
KFDMemoryTest fail, please file a Jira and assign it to me.


Thanks,
Eric

On 2022-02-07 08:01, Chen, Guchun wrote:

[Public]

Hi Eric,

Are you sure that there is no FW requirement for this patch on Vega20? 
KFDMemory test failed by this commit.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Tuesday, January 25, 2022 4:08 AM
To: Huang, JinHuiEric 
Cc: amd-gfx list 
Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Vega20

On Fri, Jan 21, 2022 at 11:17 AM Eric Huang  wrote:

It is to meet the requirement for memory allocation optimization on
MI50.

Signed-off-by: Eric Huang 

Assuming there is no firmware version requirement, the patch is:
Acked-by: Alex Deucher 


---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 5b8ae0795c0a..d708f1a502cf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1582,7 +1582,8 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
 static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
 return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
(KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
-   dev->adev->sdma.instance[0].fw_version >= 18);
+   dev->adev->sdma.instance[0].fw_version >= 18) ||
+   KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0);
  }

  static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
--
2.25.1





[PATCH] drm/amdkfd: enable heavy-weight TLB flush on Vega20

2022-01-21 Thread Eric Huang
It is to meet the requirement for memory allocation
optimization on MI50.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 5b8ae0795c0a..d708f1a502cf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1582,7 +1582,8 @@ static int kfd_ioctl_free_memory_of_gpu(struct file 
*filep,
 static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
   (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
-   dev->adev->sdma.instance[0].fw_version >= 18);
+   dev->adev->sdma.instance[0].fw_version >= 18) ||
+   KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0);
 }
 
 static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
-- 
2.25.1



Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

2022-01-19 Thread Eric Huang




On 2022-01-19 09:50, Russell, Kent wrote:

[AMD Official Use Only]


-Original Message-
From: Kuehling, Felix 
Sent: Tuesday, January 18, 2022 7:16 PM
To: Russell, Kent ; Huang, JinHuiEric
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

On 2022-01-18 7:08 p.m., Russell, Kent wrote:

One question inline





From: amd-gfx  on behalf of Felix Kuehling 
Sent: Tuesday, January 18, 2022 6:36 PM
To: Huang, JinHuiEric ; amd-gfx@lists.freedesktop.org 
Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

On 2022-01-18 5:45 p.m., Eric Huang wrote:

SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang 

Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 --
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 --
  2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

index a64cbbd943ba..acb4fd973e60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
true);
	ret = unreserve_bo_and_vms(&ctx, false, false);

- /* Only apply no TLB flush on Aldebaran to
-  * workaround regressions on other Asics.
-  */
- if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
- *table_freed = true;
-
goto out;

  out_unreserve:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

index b570c0454ce9..485d4c52c7de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1596,6 +1596,12 @@ static int

kfd_ioctl_free_memory_of_gpu(struct file *filep,

return ret;
  }

+static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
+ return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)

Do we need to add a check for sdma ver >=8 here?

What's the significance of version 8 for Aldebaran? This code was
working on Aldebaran without a version check before. Did we ever
publicly release an SDMA firmware older than version 8 that for Aldebaran?

We released v5 for Aldebaran SDMA in ROCm 4.5. If I remember the ticket 
correctly, the same fix for Arcturus was required for Aldebaran and was part of 
SDMA v8. But Eric is obviously watching the ticket more closely than I, so I'll 
defer to him there.
Yes. Aldebaran has the same bug as Arcturus in SDMA, but the bug doesn't 
cause a GPU hang on Aldebaran. As Felix said, heavy-weight TLB flush has 
been working on Aldebaran since it was enabled, so we don't need to 
check the version for it.


Regards,
Eric


  Kent


Regards,
   Felix



||

+(KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
+ dev->adev->sdma.instance[0].fw_version >= 18);
+}
+
  static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
struct kfd_process *p, void

*data)

  {
@@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct

file *filep,

}

/* Flush TLBs after waiting for the page table updates to

complete */

- if (table_freed) {
+ if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
for (i = 0; i < args->n_devices; i++) {
peer = kfd_device_by_id(devices_arr[i]);
if (WARN_ON_ONCE(!peer))
@@ -1806,7 +1812,7 @@ static int

kfd_ioctl_unmap_memory_from_gpu(struct file *filep,

}
	mutex_unlock(&p->mutex);

- if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+ if (kfd_flush_tlb_after_unmap(dev)) {
err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
(struct kgd_mem *) mem, true);
if (err) {




[PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

2022-01-18 Thread Eric Huang
SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 --
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 --
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a64cbbd943ba..acb4fd973e60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
true);
	ret = unreserve_bo_and_vms(&ctx, false, false);
 
-   /* Only apply no TLB flush on Aldebaran to
-* workaround regressions on other Asics.
-*/
-   if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
-   *table_freed = true;
-
goto out;
 
 out_unreserve:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b570c0454ce9..485d4c52c7de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1596,6 +1596,12 @@ static int kfd_ioctl_free_memory_of_gpu(struct file 
*filep,
return ret;
 }
 
+static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
+   return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+  (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
+   dev->adev->sdma.instance[0].fw_version >= 18);
+}
+
 static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
struct kfd_process *p, void *data)
 {
@@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
}
 
/* Flush TLBs after waiting for the page table updates to complete */
-   if (table_freed) {
+   if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
for (i = 0; i < args->n_devices; i++) {
peer = kfd_device_by_id(devices_arr[i]);
if (WARN_ON_ONCE(!peer))
@@ -1806,7 +1812,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
}
	mutex_unlock(&p->mutex);
 
-   if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+   if (kfd_flush_tlb_after_unmap(dev)) {
err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
(struct kgd_mem *) mem, true);
if (err) {
-- 
2.25.1



[PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

2022-01-18 Thread Eric Huang
SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +++-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a64cbbd943ba..f1fed0fc31d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,10 +1892,13 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
true);
	ret = unreserve_bo_and_vms(&ctx, false, false);
 
-   /* Only apply no TLB flush on Aldebaran to
-* workaround regressions on other Asics.
+   /* Only apply no TLB flush on Aldebaran and Arcturus
+* to workaround regressions on other Asics.
 */
-   if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
+   if (table_freed &&
+   (adev->asic_type != CHIP_ALDEBARAN) &&
+   (adev->asic_type != CHIP_ARCTURUS ||
+adev->sdma.instance[0].fw_version < 18))
*table_freed = true;
 
goto out;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b570c0454ce9..0e4a76dca809 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1806,7 +1806,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
}
	mutex_unlock(&p->mutex);
 
-   if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+   if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+   (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
+dev->adev->sdma.instance[0].fw_version >= 18)) {
err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
(struct kgd_mem *) mem, true);
if (err) {
-- 
2.25.1



Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

2022-01-18 Thread Eric Huang
I understand Alex's concern. I think we usually only check the version 
when a feature is available only in a specific version, and other or 
newer versions don't have it.


In the case of a FW fix, we assume the driver and FWs have to be in sync. 
If we kept driver backward compatibility for older FWs, there would be a 
lot of redundant code for FW version checks. So this patch and the SDMA 
fix will be pushed into the ROCm 5.1 release branch at the same time.


Regards,
Eric

On 2022-01-18 14:35, Alex Deucher wrote:

On Tue, Jan 18, 2022 at 2:27 PM Russell, Kent  wrote:

[AMD Official Use Only]

I think what he means is that if we are using SDMA v17, this will cause issues, 
won't it? Should we check that SDMA version is >=18 before enabling it? Or am I 
misunderstanding the fix?

Yes, that was my concern.

Alex


  Kent


-Original Message-
From: amd-gfx  On Behalf Of Eric Huang
Sent: Tuesday, January 18, 2022 2:09 PM
To: Alex Deucher 
Cc: amd-gfx list 
Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

The SDMA fix is generic and not in a specific version of FW, so we don't
have to check.

Thanks,
Eric

On 2022-01-18 11:35, Alex Deucher wrote:

On Tue, Jan 18, 2022 at 11:16 AM Eric Huang  wrote:

SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.

Do we need to check for a specific fw version?

Alex


Signed-off-by: Eric Huang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +---
   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++-
   2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

index a64cbbd943ba..7b24a920c12e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,10 +1892,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
  true);
	ret = unreserve_bo_and_vms(&ctx, false, false);

-   /* Only apply no TLB flush on Aldebaran to
-* workaround regressions on other Asics.
+   /* Only apply no TLB flush on Aldebaran and Arcturus
+* to workaround regressions on other Asics.
   */
-   if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
+   if (table_freed &&
+   (adev->asic_type != CHIP_ALDEBARAN) &&
+   (adev->asic_type != CHIP_ARCTURUS))
  *table_freed = true;

  goto out;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

index b570c0454ce9..ef4d676cc71c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1806,7 +1806,8 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file

*filep,

  }
	mutex_unlock(&p->mutex);

-   if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+   if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+   KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1)) {
  err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
  (struct kgd_mem *) mem, true);
  if (err) {
--
2.25.1




