Re: [PATCH] Find bo_va before create it when map bo into compute VM

2023-10-12 Thread Chen, Xiaogang



On 10/12/2023 10:20 AM, Felix Kuehling wrote:

On 2023-10-11 19:36, Xiaogang.Chen wrote:

From: Xiaogang Chen 


Since you are the author of this updated patch, you should also add 
your Signed-off-by below.

ok, thanks.





This is needed to correctly handle BOs imported into compute VM from 
gfx.
Both kfd and gfx should use same bo_va when map the Bos into same VM, 
otherwise
we may trigger kernel general protection when iterate mappings from 
bo_va.


Signed-off-by: Felix Kuehling 
Acked-by: Christian König 
Reviewed-by: Ramesh Errabolu 
Reviewed-By: Xiaogang Chen 
Tested-By: Xiaogang Chen 


You lost any mention of reference counting from the commit headline 
and description. I think that's still an important concept that should 
be mentioned. I'd change the headline to something like this:


drm/amdgpu: Correctly use bo_va->ref_count in compute VMs


ok, will update commit headline and message and resend.

Regards

Xiaogang


Regards,
  Felix



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

index a15e59abe70a..c1ec93cc50ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -832,6 +832,7 @@ static int kfd_mem_attach(struct amdgpu_device 
*adev, struct kgd_mem *mem,

  uint64_t va = mem->va;
  struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
  struct amdgpu_bo *bo[2] = {NULL, NULL};
+    struct amdgpu_bo_va *bo_va;
  bool same_hive = false;
  int i, ret;
  @@ -919,7 +920,13 @@ static int kfd_mem_attach(struct amdgpu_device 
*adev, struct kgd_mem *mem,

  pr_debug("Unable to reserve BO during memory attach");
  goto unwind;
  }
-    attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+    bo_va = amdgpu_vm_bo_find(vm, bo[i]);
+    if (!bo_va)
+    bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+    else
+    ++bo_va->ref_count;
+    attachment[i]->bo_va = bo_va;
+
  amdgpu_bo_unreserve(bo[i]);
  if (unlikely(!attachment[i]->bo_va)) {
  ret = -ENOMEM;
@@ -943,7 +950,8 @@ static int kfd_mem_attach(struct amdgpu_device 
*adev, struct kgd_mem *mem,

  continue;
  if (attachment[i]->bo_va) {
  amdgpu_bo_reserve(bo[i], true);
-    amdgpu_vm_bo_del(adev, attachment[i]->bo_va);
+    if (--attachment[i]->bo_va->ref_count == 0)
+    amdgpu_vm_bo_del(adev, attachment[i]->bo_va);
  amdgpu_bo_unreserve(bo[i]);
  list_del([i]->list);
  }


Re: [PATCH] Find bo_va before create it when map bo into compute VM

2023-10-12 Thread Felix Kuehling

On 2023-10-11 19:36, Xiaogang.Chen wrote:

From: Xiaogang Chen 


Since you are the author of this updated patch, you should also add your 
Signed-off-by below.





This is needed to correctly handle BOs imported into compute VM from gfx.
Both kfd and gfx should use same bo_va when map the Bos into same VM, otherwise
we may trigger kernel general protection when iterate mappings from bo_va.

Signed-off-by: Felix Kuehling 
Acked-by: Christian König 
Reviewed-by: Ramesh Errabolu 
Reviewed-By: Xiaogang Chen 
Tested-By: Xiaogang Chen 


You lost any mention of reference counting from the commit headline and 
description. I think that's still an important concept that should be 
mentioned. I'd change the headline to something like this:


drm/amdgpu: Correctly use bo_va->ref_count in compute VMs

Regards,
  Felix



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a15e59abe70a..c1ec93cc50ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -832,6 +832,7 @@ static int kfd_mem_attach(struct amdgpu_device *adev, 
struct kgd_mem *mem,
uint64_t va = mem->va;
struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
struct amdgpu_bo *bo[2] = {NULL, NULL};
+   struct amdgpu_bo_va *bo_va;
bool same_hive = false;
int i, ret;
  
@@ -919,7 +920,13 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,

pr_debug("Unable to reserve BO during memory attach");
goto unwind;
}
-   attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+   bo_va = amdgpu_vm_bo_find(vm, bo[i]);
+   if (!bo_va)
+   bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+   else
+   ++bo_va->ref_count;
+   attachment[i]->bo_va = bo_va;
+
amdgpu_bo_unreserve(bo[i]);
if (unlikely(!attachment[i]->bo_va)) {
ret = -ENOMEM;
@@ -943,7 +950,8 @@ static int kfd_mem_attach(struct amdgpu_device *adev, 
struct kgd_mem *mem,
continue;
if (attachment[i]->bo_va) {
amdgpu_bo_reserve(bo[i], true);
-   amdgpu_vm_bo_del(adev, attachment[i]->bo_va);
+   if (--attachment[i]->bo_va->ref_count == 0)
+   amdgpu_vm_bo_del(adev, attachment[i]->bo_va);
amdgpu_bo_unreserve(bo[i]);
list_del([i]->list);
}


Re: [PATCH] Find bo_va before create it when map bo into compute VM

2023-10-12 Thread Chen, Xiaogang



On 10/12/2023 2:35 AM, Christian König wrote:
The subject line somehow got messed up. There should be an drm/amdgpu: 
or drm/amdkfd: prefix.


yes, will resend it.

Regards

Xiaogang.



Regards,
Christian.

Am 12.10.23 um 01:36 schrieb Xiaogang.Chen:

From: Xiaogang Chen 

This is needed to correctly handle BOs imported into compute VM from 
gfx.
Both kfd and gfx should use same bo_va when map the Bos into same VM, 
otherwise
we may trigger kernel general protection when iterate mappings from 
bo_va.


Signed-off-by: Felix Kuehling 
Acked-by: Christian König 
Reviewed-by: Ramesh Errabolu 
Reviewed-By: Xiaogang Chen 
Tested-By: Xiaogang Chen 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

index a15e59abe70a..c1ec93cc50ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -832,6 +832,7 @@ static int kfd_mem_attach(struct amdgpu_device 
*adev, struct kgd_mem *mem,

  uint64_t va = mem->va;
  struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
  struct amdgpu_bo *bo[2] = {NULL, NULL};
+    struct amdgpu_bo_va *bo_va;
  bool same_hive = false;
  int i, ret;
  @@ -919,7 +920,13 @@ static int kfd_mem_attach(struct amdgpu_device 
*adev, struct kgd_mem *mem,

  pr_debug("Unable to reserve BO during memory attach");
  goto unwind;
  }
-    attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+    bo_va = amdgpu_vm_bo_find(vm, bo[i]);
+    if (!bo_va)
+    bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+    else
+    ++bo_va->ref_count;
+    attachment[i]->bo_va = bo_va;
+
  amdgpu_bo_unreserve(bo[i]);
  if (unlikely(!attachment[i]->bo_va)) {
  ret = -ENOMEM;
@@ -943,7 +950,8 @@ static int kfd_mem_attach(struct amdgpu_device 
*adev, struct kgd_mem *mem,

  continue;
  if (attachment[i]->bo_va) {
  amdgpu_bo_reserve(bo[i], true);
-    amdgpu_vm_bo_del(adev, attachment[i]->bo_va);
+    if (--attachment[i]->bo_va->ref_count == 0)
+    amdgpu_vm_bo_del(adev, attachment[i]->bo_va);
  amdgpu_bo_unreserve(bo[i]);
  list_del([i]->list);
  }




Re: [PATCH] Find bo_va before create it when map bo into compute VM

2023-10-12 Thread Christian König
The subject line somehow got messed up. There should be an drm/amdgpu: 
or drm/amdkfd: prefix.


Regards,
Christian.

Am 12.10.23 um 01:36 schrieb Xiaogang.Chen:

From: Xiaogang Chen 

This is needed to correctly handle BOs imported into compute VM from gfx.
Both kfd and gfx should use same bo_va when map the Bos into same VM, otherwise
we may trigger kernel general protection when iterate mappings from bo_va.

Signed-off-by: Felix Kuehling 
Acked-by: Christian König 
Reviewed-by: Ramesh Errabolu 
Reviewed-By: Xiaogang Chen 
Tested-By: Xiaogang Chen 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a15e59abe70a..c1ec93cc50ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -832,6 +832,7 @@ static int kfd_mem_attach(struct amdgpu_device *adev, 
struct kgd_mem *mem,
uint64_t va = mem->va;
struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
struct amdgpu_bo *bo[2] = {NULL, NULL};
+   struct amdgpu_bo_va *bo_va;
bool same_hive = false;
int i, ret;
  
@@ -919,7 +920,13 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,

pr_debug("Unable to reserve BO during memory attach");
goto unwind;
}
-   attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+   bo_va = amdgpu_vm_bo_find(vm, bo[i]);
+   if (!bo_va)
+   bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+   else
+   ++bo_va->ref_count;
+   attachment[i]->bo_va = bo_va;
+
amdgpu_bo_unreserve(bo[i]);
if (unlikely(!attachment[i]->bo_va)) {
ret = -ENOMEM;
@@ -943,7 +950,8 @@ static int kfd_mem_attach(struct amdgpu_device *adev, 
struct kgd_mem *mem,
continue;
if (attachment[i]->bo_va) {
amdgpu_bo_reserve(bo[i], true);
-   amdgpu_vm_bo_del(adev, attachment[i]->bo_va);
+   if (--attachment[i]->bo_va->ref_count == 0)
+   amdgpu_vm_bo_del(adev, attachment[i]->bo_va);
amdgpu_bo_unreserve(bo[i]);
list_del([i]->list);
}




[PATCH] Find bo_va before create it when map bo into compute VM

2023-10-11 Thread Xiaogang . Chen
From: Xiaogang Chen 

This is needed to correctly handle BOs imported into compute VM from gfx.
Both kfd and gfx should use same bo_va when map the Bos into same VM, otherwise
we may trigger kernel general protection when iterate mappings from bo_va.

Signed-off-by: Felix Kuehling 
Acked-by: Christian König 
Reviewed-by: Ramesh Errabolu 
Reviewed-By: Xiaogang Chen 
Tested-By: Xiaogang Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a15e59abe70a..c1ec93cc50ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -832,6 +832,7 @@ static int kfd_mem_attach(struct amdgpu_device *adev, 
struct kgd_mem *mem,
uint64_t va = mem->va;
struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
struct amdgpu_bo *bo[2] = {NULL, NULL};
+   struct amdgpu_bo_va *bo_va;
bool same_hive = false;
int i, ret;
 
@@ -919,7 +920,13 @@ static int kfd_mem_attach(struct amdgpu_device *adev, 
struct kgd_mem *mem,
pr_debug("Unable to reserve BO during memory attach");
goto unwind;
}
-   attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+   bo_va = amdgpu_vm_bo_find(vm, bo[i]);
+   if (!bo_va)
+   bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+   else
+   ++bo_va->ref_count;
+   attachment[i]->bo_va = bo_va;
+
amdgpu_bo_unreserve(bo[i]);
if (unlikely(!attachment[i]->bo_va)) {
ret = -ENOMEM;
@@ -943,7 +950,8 @@ static int kfd_mem_attach(struct amdgpu_device *adev, 
struct kgd_mem *mem,
continue;
if (attachment[i]->bo_va) {
amdgpu_bo_reserve(bo[i], true);
-   amdgpu_vm_bo_del(adev, attachment[i]->bo_va);
+   if (--attachment[i]->bo_va->ref_count == 0)
+   amdgpu_vm_bo_del(adev, attachment[i]->bo_va);
amdgpu_bo_unreserve(bo[i]);
list_del([i]->list);
}
-- 
2.25.1