Re: [PATCH] drm/amdgpu: Fix a potential sdma invalid access

2021-04-06 Thread Qu Huang

Hi Christian,

On 2021/4/3 16:49, Christian König wrote:

Hi Qu,

On 03.04.21 at 07:08, Qu Huang wrote:

Hi Christian,

On 2021/4/3 0:25, Christian König wrote:

Hi Qu,

On 02.04.21 at 05:18, Qu Huang wrote:

Before dma_resv_lock(bo->base.resv, NULL) in
amdgpu_bo_release_notify(),
the bo->base.resv lock may be held by ttm_mem_evict_first(),


That can't happen, since by the time bo_release_notify is called the BO has
no more references and is therefore deleted.

And we never evict a deleted BO, we just wait for it to become idle.


Yes, when the BO reference counter drops to zero we enter
ttm_bo_release(), but the release notification (the call to
amdgpu_bo_release_notify()) happens first; only afterwards does it test
whether the reservation object's fences have been signaled, mark the BO
as deleted, and remove the BO from the LRU list.

So when ttm_bo_release() and ttm_mem_evict_first() run concurrently,
the BO has not yet been removed from the LRU list and is not marked as
deleted, and this can happen.


Not sure which code base you are on, but I don't see how this can happen.

ttm_mem_evict_first() calls ttm_bo_get_unless_zero() and
ttm_bo_release() is only called when the BO reference count becomes zero.

So ttm_mem_evict_first() will see that this BO is about to be destroyed
and skips it.



Yes, you are right. My TTM version is from ROCm 3.3, where
ttm_mem_evict_first() does not call ttm_bo_get_unless_zero(); I checked
and the ROCm 4.0 TTM does not have this issue. This was an oversight on
my part.
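
For reference, the relevant part of the newer ttm_mem_evict_first() looks
roughly like this (a simplified sketch with the other placement and locking
checks trimmed, not the exact ROCm 4.0 source):

list_for_each_entry(bo, &man->lru[i], lru) {
	/* A BO whose refcount has already dropped to zero is in the
	 * middle of ttm_bo_release(); taking a reference fails and
	 * the BO is skipped instead of being chosen for eviction. */
	if (!ttm_bo_get_unless_zero(bo)) {
		if (locked)
			dma_resv_unlock(bo->base.resv);
		continue;
	}
	break;
}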



As a test, when we use a CPU memset instead of the SDMA fill in
amdgpu_bo_release_notify(), the result is a page fault:

PID: 5490   TASK: 8e8136e04100  CPU: 4   COMMAND: "gemmPerf"
  #0 [8e79eaa17970] machine_kexec at b2863784
  #1 [8e79eaa179d0] __crash_kexec at b291ce92
  #2 [8e79eaa17aa0] crash_kexec at b291cf80
  #3 [8e79eaa17ab8] oops_end at b2f6c768
  #4 [8e79eaa17ae0] no_context at b2f5aaa6
  #5 [8e79eaa17b30] __bad_area_nosemaphore at b2f5ab3d
  #6 [8e79eaa17b80] bad_area_nosemaphore at b2f5acae
  #7 [8e79eaa17b90] __do_page_fault at b2f6f6c0
  #8 [8e79eaa17c00] do_page_fault at b2f6f925
  #9 [8e79eaa17c30] page_fault at b2f6b758
 [exception RIP: memset+31]
 RIP: b2b8668f  RSP: 8e79eaa17ce8  RFLAGS: 00010a17
 RAX: bebebebebebebebe  RBX: 8e747bff10c0  RCX: 060b0020
 RDX:   RSI: 00be  RDI: ab807f00
 RBP: 8e79eaa17d10   R8: 8e79eaa14000   R9: ab7c8000
 R10: bcba  R11: 01ba  R12: 8e79ebaa4050
 R13: ab7c8000  R14: 00022600  R15: 8e8136e04100
 ORIG_RAX:   CS: 0010  SS: 0018
#10 [8e79eaa17ce8] amdgpu_bo_release_notify at c092f2d1
[amdgpu]
#11 [8e79eaa17d18] ttm_bo_release at c08f39dd [amdttm]
#12 [8e79eaa17d58] amdttm_bo_put at c08f3c8c [amdttm]
#13 [8e79eaa17d68] amdttm_bo_vm_close at c08f7ac9 [amdttm]
#14 [8e79eaa17d80] remove_vma at b29ef115
#15 [8e79eaa17da0] exit_mmap at b29f2c64
#16 [8e79eaa17e58] mmput at b28940c7
#17 [8e79eaa17e78] do_exit at b289dc95
#18 [8e79eaa17f10] do_group_exit at b289e4cf
#19 [8e79eaa17f40] sys_exit_group at b289e544
#20 [8e79eaa17f50] system_call_fastpath at b2f74ddb


Well that might be perfectly expected. VRAM is not necessarily CPU
accessible.


As a test, I used a CPU memset instead of the SDMA fill. This is my code:
void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
{
	struct amdgpu_bo *abo;
	uint64_t num_pages;
	struct drm_mm_node *mm_node;
	struct amdgpu_device *adev;
	void __iomem *kaddr;

	if (!amdgpu_bo_is_amdgpu_bo(bo))
		return;

	abo = ttm_to_amdgpu_bo(bo);
	num_pages = abo->tbo.num_pages;
	mm_node = abo->tbo.mem.mm_node;
	adev = amdgpu_ttm_adev(abo->tbo.bdev);
	kaddr = adev->mman.aper_base_kaddr;

	if (abo->kfd_bo)
		amdgpu_amdkfd_unreserve_memory_limit(abo);

	if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node ||
	    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
		return;

	dma_resv_lock(amdkcl_ttm_resvp(bo), NULL);
	while (num_pages && mm_node) {
		/* Poison each VRAM node through the CPU aperture mapping. */
		void *ptr = kaddr + (mm_node->start << PAGE_SHIFT);

		memset_io(ptr, AMDGPU_POISON & 0xff,
			  mm_node->size << PAGE_SHIFT);
		num_pages -= mm_node->size;
		++mm_node;
	}
	dma_resv_unlock(amdkcl_ttm_resvp(bo));
}
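
That said, your point that VRAM is not necessarily CPU accessible applies
to this test as well: it only works for nodes inside the BAR-visible part
of VRAM. A guard along these lines would be needed at the top of the loop
body (a sketch only, assuming the adev->gmc.visible_vram_size field as in
upstream amdgpu):

		/* Sketch: stop once a node falls outside the CPU-visible
		 * (BAR) part of VRAM; such pages cannot be reached through
		 * aper_base_kaddr and would still need the SDMA fill. */
		if (((mm_node->start + mm_node->size) << PAGE_SHIFT) >
		    adev->gmc.visible_vram_size)
			break;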





I was using the old version through an oversight; sorry for the trouble.


Regards,
Qu.


Regards,
Christian.



Regards,
Qu.



Regards,
Christian.


and the VRAM memory may be evicted, with its mem region replaced
by a GTT mem region. amdgpu_bo_release_notify() will then
hold the bo->base.resv lock, and SDMA will get an invalid address in
amdgpu_fill_buffer(), resulting in a VMFAULT or memory corruption.

Re: [PATCH] drm/amdgpu: Fix a potential sdma invalid access

2021-04-02 Thread Qu Huang

Hi Christian,

On 2021/4/3 0:25, Christian König wrote:

Hi Qu,

On 02.04.21 at 05:18, Qu Huang wrote:

Before dma_resv_lock(bo->base.resv, NULL) in amdgpu_bo_release_notify(),
the bo->base.resv lock may be held by ttm_mem_evict_first(),


That can't happen, since by the time bo_release_notify is called the BO has
no more references and is therefore deleted.

And we never evict a deleted BO, we just wait for it to become idle.


Yes, when the BO reference counter drops to zero we enter
ttm_bo_release(), but the release notification (the call to
amdgpu_bo_release_notify()) happens first; only afterwards does it test
whether the reservation object's fences have been signaled, mark the BO
as deleted, and remove the BO from the LRU list.

So when ttm_bo_release() and ttm_mem_evict_first() run concurrently,
the BO has not yet been removed from the LRU list and is not marked as
deleted, and this can happen.

As a test, when we use a CPU memset instead of the SDMA fill in
amdgpu_bo_release_notify(), the result is a page fault:

PID: 5490   TASK: 8e8136e04100  CPU: 4   COMMAND: "gemmPerf"
  #0 [8e79eaa17970] machine_kexec at b2863784
  #1 [8e79eaa179d0] __crash_kexec at b291ce92
  #2 [8e79eaa17aa0] crash_kexec at b291cf80
  #3 [8e79eaa17ab8] oops_end at b2f6c768
  #4 [8e79eaa17ae0] no_context at b2f5aaa6
  #5 [8e79eaa17b30] __bad_area_nosemaphore at b2f5ab3d
  #6 [8e79eaa17b80] bad_area_nosemaphore at b2f5acae
  #7 [8e79eaa17b90] __do_page_fault at b2f6f6c0
  #8 [8e79eaa17c00] do_page_fault at b2f6f925
  #9 [8e79eaa17c30] page_fault at b2f6b758
 [exception RIP: memset+31]
 RIP: b2b8668f  RSP: 8e79eaa17ce8  RFLAGS: 00010a17
 RAX: bebebebebebebebe  RBX: 8e747bff10c0  RCX: 060b0020
 RDX:   RSI: 00be  RDI: ab807f00
 RBP: 8e79eaa17d10   R8: 8e79eaa14000   R9: ab7c8000
 R10: bcba  R11: 01ba  R12: 8e79ebaa4050
 R13: ab7c8000  R14: 00022600  R15: 8e8136e04100
 ORIG_RAX:   CS: 0010  SS: 0018
#10 [8e79eaa17ce8] amdgpu_bo_release_notify at c092f2d1 [amdgpu]
#11 [8e79eaa17d18] ttm_bo_release at c08f39dd [amdttm]
#12 [8e79eaa17d58] amdttm_bo_put at c08f3c8c [amdttm]
#13 [8e79eaa17d68] amdttm_bo_vm_close at c08f7ac9 [amdttm]
#14 [8e79eaa17d80] remove_vma at b29ef115
#15 [8e79eaa17da0] exit_mmap at b29f2c64
#16 [8e79eaa17e58] mmput at b28940c7
#17 [8e79eaa17e78] do_exit at b289dc95
#18 [8e79eaa17f10] do_group_exit at b289e4cf
#19 [8e79eaa17f40] sys_exit_group at b289e544
#20 [8e79eaa17f50] system_call_fastpath at b2f74ddb

Regards,
Qu.



Regards,
Christian.


and the VRAM memory may be evicted, with its mem region replaced
by a GTT mem region. amdgpu_bo_release_notify() will then
hold the bo->base.resv lock, and SDMA will get an invalid
address in amdgpu_fill_buffer(), resulting in a VMFAULT
or memory corruption.

To avoid it, we have to hold bo->base.resv lock first, and
check whether the mem.mem_type is TTM_PL_VRAM.

Signed-off-by: Qu Huang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 4b29b82..8018574 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1300,12 +1300,16 @@ void amdgpu_bo_release_notify(struct
ttm_buffer_object *bo)
	if (bo->base.resv == &bo->base._resv)
  amdgpu_amdkfd_remove_fence_on_pt_pd_bos(abo);

-    if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node ||
-    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
+    if (!(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
  return;

  dma_resv_lock(bo->base.resv, NULL);

+    if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node) {
+    dma_resv_unlock(bo->base.resv);
+    return;
+    }
+
	r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, &fence);
  if (!WARN_ON(r)) {
  amdgpu_bo_fence(abo, fence, false);
--
1.8.3.1





[PATCH] drm/amdgpu: Fix a potential sdma invalid access

2021-04-01 Thread Qu Huang
Before dma_resv_lock(bo->base.resv, NULL) in amdgpu_bo_release_notify(),
the bo->base.resv lock may be held by ttm_mem_evict_first(),
and the VRAM memory may be evicted, with its mem region replaced
by a GTT mem region. amdgpu_bo_release_notify() will then
hold the bo->base.resv lock, and SDMA will get an invalid
address in amdgpu_fill_buffer(), resulting in a VMFAULT
or memory corruption.
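
Spelled out, the interleaving being guarded against looks roughly like
this (a sketch of one possible ordering, reconstructed from the
description above):

  CPU A: ttm_bo_release()                CPU B: ttm_mem_evict_first()
  -----------------------                ----------------------------
                                         picks the BO from the LRU
                                         dma_resv_lock(bo->base.resv)
                                         evicts the BO: VRAM -> GTT
                                         dma_resv_unlock(bo->base.resv)
  amdgpu_bo_release_notify(bo)
    dma_resv_lock(bo->base.resv)
    amdgpu_fill_buffer(abo, ...)   <-- bo->mem now describes GTT, so the
                                       old VRAM offset handed to SDMA is
                                       invalid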

To avoid it, we have to hold bo->base.resv lock first, and
check whether the mem.mem_type is TTM_PL_VRAM.

Signed-off-by: Qu Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 4b29b82..8018574 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1300,12 +1300,16 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object 
*bo)
	if (bo->base.resv == &bo->base._resv)
amdgpu_amdkfd_remove_fence_on_pt_pd_bos(abo);

-   if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node ||
-   !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
+   if (!(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
return;

dma_resv_lock(bo->base.resv, NULL);

+   if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node) {
+   dma_resv_unlock(bo->base.resv);
+   return;
+   }
+
	r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, &fence);
if (!WARN_ON(r)) {
amdgpu_bo_fence(abo, fence, false);
--
1.8.3.1



Re: [PATCH] drm/amdkfd: dqm fence memory corruption

2021-03-26 Thread Qu Huang

On 2021/1/28 5:50, Felix Kuehling wrote:

On 2021-01-27 at 7:33 a.m., Qu Huang wrote:

The amdgpu driver uses a 4-byte data type for the DQM fence memory,
and transmits the GPU address of the fence memory to the microcode
through the query-status PM4 message. However, the query-status
PM4 message definition and the microcode processing both treat the
fence as 8 bytes. The fence memory allocation is only 4 bytes, but
the microcode writes 8 bytes, so there is memory corruption.
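
To make the overrun concrete, here is a small, purely illustrative
user-space program (not driver code) showing the same pattern: a 4-byte
slot is handed to a writer that stores 8 bytes, clobbering whatever the
allocator placed right behind it:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	uint8_t gart[8] = { 0 };   /* two adjacent 4-byte sub-allocations */
	uint64_t cp_value = 0x00000001ffffffffULL;
	uint32_t neighbour;

	/* The fence slot is the first 4 bytes, but the writer (standing in
	 * for the CP answering the query-status packet) stores 8 bytes at
	 * the fence address: */
	memcpy(&gart[0], &cp_value, sizeof(cp_value));

	/* The 4 bytes following the fence slot are now clobbered
	 * (prints 0x00000001 on a little-endian machine): */
	memcpy(&neighbour, &gart[4], sizeof(neighbour));
	printf("neighbouring allocation now: 0x%08x\n",
	       (unsigned int)neighbour);
	return 0;
}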


Thank you for pointing out that discrepancy. That's a good catch!

I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer.
We should probably also fix up the query_status and
amdkfd_fence_wait_timeout function interfaces to use 64-bit fence
values everywhere, to be consistent.

Regards,
   Felix
Hi Felix, Thanks for your advice, please check v2 at 
https://lore.kernel.org/patchwork/patch/1372584/

Thanks,
Qu.





Signed-off-by: Qu Huang 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e686ce2..8b38d0c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
pr_debug("Allocating fence memory\n");
  
  	/* allocate fence memory on the gart */

-   retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
+   retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
			&dqm->fence_mem);
  
  	if (retval)




[PATCH] drm/amdkfd: Fix cat debugfs hang_hws file causes system crash bug

2021-03-21 Thread Qu Huang
Here is the system crash log:
[ 1272.884438] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 1272.88] IP: [<  (null)>]   (null)
[ 1272.884447] PGD 825b09067 PUD 8267c8067 PMD 0
[ 1272.884452] Oops: 0010 [#1] SMP
[ 1272.884509] CPU: 13 PID: 3485 Comm: cat Kdump: loaded Tainted: G
[ 1272.884515] task: 9a38dbd4d140 ti: 9a37cd3b8000 task.ti:
9a37cd3b8000
[ 1272.884517] RIP: 0010:[<>]  [<  (null)>]
(null)
[ 1272.884520] RSP: 0018:9a37cd3bbe68  EFLAGS: 00010203
[ 1272.884522] RAX:  RBX:  RCX:
00014d5f
[ 1272.884524] RDX: fff4 RSI: 0001 RDI:
9a38aca4d200
[ 1272.884526] RBP: 9a37cd3bbed0 R08: 9a38dcd5f1a0 R09:
9a31ffc07300
[ 1272.884527] R10: 9a31ffc07300 R11: addd5e9d R12:
9a38b4e0fb00
[ 1272.884529] R13: 0001 R14: 9a37cd3bbf18 R15:
9a38aca4d200
[ 1272.884532] FS:  7feccaa67740() GS:9a38dcd4()
knlGS:
[ 1272.884534] CS:  0010 DS:  ES:  CR0: 80050033
[ 1272.884536] CR2:  CR3: 0008267c CR4:
003407e0
[ 1272.884537] Call Trace:
[ 1272.884544]  [] ? seq_read+0x130/0x440
[ 1272.884548]  [] vfs_read+0x9f/0x170
[ 1272.884552]  [] SyS_read+0x7f/0xf0
[ 1272.884557]  [] system_call_fastpath+0x22/0x27
[ 1272.884558] Code:  Bad RIP value.
[ 1272.884562] RIP  [<  (null)>]   (null)
[ 1272.884564]  RSP 
[ 1272.884566] CR2: 0000
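
The likely failure path, for context (a sketch that assumes
kfd_debugfs_open() hands the file's debugfs private data to single_open()
as the seq_file show callback, as mainline kfd_debugfs.c does; the exact
code here may differ):

/* hang_hws is created with a NULL private-data pointer ... */
debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
		    NULL, &kfd_debugfs_hang_hws_fops);

/* ... so the open handler installs NULL as the show callback ... */
single_open(file, NULL /* show */, NULL);

/* ... and "cat hang_hws" -> seq_read() -> show(m, v) jumps through a NULL
 * function pointer, matching the "IP: (null)" oops above.  The change
 * below gives the file a real read handler instead. */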

Signed-off-by: Qu Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
index 511712c..673d5e3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
@@ -33,6 +33,11 @@ static int kfd_debugfs_open(struct inode *inode, struct file 
*file)

return single_open(file, show, NULL);
 }
+static int kfd_debugfs_hang_hws_read(struct seq_file *m, void *data)
+{
+   seq_printf(m, "echo gpu_id > hang_hws\n");
+   return 0;
+}

 static ssize_t kfd_debugfs_hang_hws_write(struct file *file,
const char __user *user_buf, size_t size, loff_t *ppos)
@@ -94,7 +99,7 @@ void kfd_debugfs_init(void)
debugfs_create_file("rls", S_IFREG | 0444, debugfs_root,
kfd_debugfs_rls_by_device, _debugfs_fops);
debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
-   NULL, _debugfs_hang_hws_fops);
+   kfd_debugfs_hang_hws_read, 
_debugfs_hang_hws_fops);
 }

 void kfd_debugfs_fini(void)
--
1.8.3.1



[PATCH v2] drm/amdkfd: dqm fence memory corruption

2021-01-28 Thread Qu Huang
The amdgpu driver uses a 4-byte data type for the DQM fence memory,
and transmits the GPU address of the fence memory to the microcode
through the query-status PM4 message. However, the query-status
PM4 message definition and the microcode processing both treat the
fence as 8 bytes. The fence memory allocation is only 4 bytes, but
the microcode writes 8 bytes, so there is memory corruption.

Changes since v1:
  * Change dqm->fence_addr to a u64 pointer to fix this issue;
    also fix up the query_status and amdkfd_fence_wait_timeout
    interfaces to use a 64-bit fence value, to keep them consistent.

Signed-off-by: Qu Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c| 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c| 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 8 
 7 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index b258a3d..159add0f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -155,7 +155,7 @@ static int dbgdev_diq_submit_ib(struct kfd_dbgdev *dbgdev,

/* Wait till CP writes sync code: */
status = amdkfd_fence_wait_timeout(
-   (unsigned int *) rm_state,
+   rm_state,
QUEUESTATE__ACTIVE, 1500);

kfd_gtt_sa_free(dbgdev->dev, mem_obj);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e686ce2..4598a9a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1167,7 +1167,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
if (retval)
goto fail_allocate_vidmem;

-   dqm->fence_addr = dqm->fence_mem->cpu_ptr;
+   dqm->fence_addr = (uint64_t *)dqm->fence_mem->cpu_ptr;
dqm->fence_gpu_addr = dqm->fence_mem->gpu_addr;

init_interrupts(dqm);
@@ -1340,8 +1340,8 @@ static int create_queue_cpsch(struct device_queue_manager 
*dqm, struct queue *q,
return retval;
 }

-int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
-   unsigned int fence_value,
+int amdkfd_fence_wait_timeout(uint64_t *fence_addr,
+   uint64_t fence_value,
unsigned int timeout_ms)
 {
unsigned long end_jiffies = msecs_to_jiffies(timeout_ms) + jiffies;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 16262e5..16b23dd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -192,7 +192,7 @@ struct device_queue_manager {
	uint16_t		vmid_pasid[VMID_NUM];
	uint64_t		pipelines_addr;
	uint64_t		fence_gpu_addr;
-	unsigned int		*fence_addr;
+	uint64_t		*fence_addr;
	struct kfd_mem_obj	*fence_mem;
	bool			active_runlist;
	int			sched_policy;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 5d541e0..f71a7fa 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -347,7 +347,7 @@ int pm_send_runlist(struct packet_manager *pm, struct 
list_head *dqm_queues)
 }

 int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address,
-   uint32_t fence_value)
+   uint64_t fence_value)
 {
uint32_t *buffer, size;
int retval = 0;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index dfaf771..e3ba0cd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -283,7 +283,7 @@ static int pm_unmap_queues_v9(struct packet_manager *pm, 
uint32_t *buffer,
 }

 static int pm_query_status_v9(struct packet_manager *pm, uint32_t *buffer,
-   uint64_t fence_address, uint32_t fence_value)
+   uint64_t fence_address, uint64_t fence_value)
 {
struct pm4_mes_query_status *packet;

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c
index a852e0d..08442e7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi

[PATCH] drm/amdkfd: dqm fence memory corruption

2021-01-27 Thread Qu Huang
The amdgpu driver uses a 4-byte data type for the DQM fence memory,
and transmits the GPU address of the fence memory to the microcode
through the query-status PM4 message. However, the query-status
PM4 message definition and the microcode processing both treat the
fence as 8 bytes. The fence memory allocation is only 4 bytes, but
the microcode writes 8 bytes, so there is memory corruption.

Signed-off-by: Qu Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e686ce2..8b38d0c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
pr_debug("Allocating fence memory\n");
 
/* allocate fence memory on the gart */
-   retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
+   retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
			&dqm->fence_mem);
 
if (retval)
-- 
1.8.3.1