Re: [PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven

2024-02-28 Thread Felix Kuehling



On 2024-02-28 01:41, Christian König wrote:

Am 28.02.24 um 06:04 schrieb Jesse.Zhang:

fix the issue when run clinfo:
"amdgpu: Failed to create process VM object".

when amdgpu initialized, seq64 do mampping and update bo mapping in 
vm page table.
But when clifo run. It also initializes a vm for a process device 
through the function kfd_process_device_init_vm
and ensure the root PD is clean through the function 
amdgpu_vm_pt_is_root_clean.

So they have a conflict, and clinfo  always failed.


Big NAK for this, you removed the check but didn't solved the problem 
in any way.


When Raven still needs the ats feature than it is intentional that 
this fails.


I agree. I think we should just remove all the pte_supports_ats stuff 
from the amdgpu_vm code. We no longer use IOMMUv2. So there is no point 
setting invalid PTEs to fail over to ATS any more. As far as I can see, 
this will require changes in amdgpu_vm_clear_freed, amdgpu_vm_init, 
amdgpu_vm_make_compute. Then you can remove amdgpu_vm.pte_support_ats 
from the struct and remove amdgpu_vm_pt_is_root_clean.


Regards,
  Felix




Regards,
Christian.



Signed-off-by: Jesse Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 --
  1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index ed4a8c5d26d7..0bc0bc75be15 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2361,12 +2361,6 @@ int amdgpu_vm_make_compute(struct 
amdgpu_device *adev, struct amdgpu_vm *vm)

   * changing any other state, in case it fails.
   */
  if (pte_support_ats != vm->pte_support_ats) {
-    /* Sanity checks */
-    if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
-    r = -EINVAL;
-    goto unreserve_bo;
-    }
-
  vm->pte_support_ats = pte_support_ats;
  r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo),
 false);




Re: [PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven

2024-02-27 Thread Christian König

Am 28.02.24 um 06:04 schrieb Jesse.Zhang:

fix the issue when run clinfo:
"amdgpu: Failed to create process VM object".

when amdgpu initialized, seq64 do mampping and update bo mapping in vm page 
table.
But when clifo run. It also initializes a vm for a process device through the 
function kfd_process_device_init_vm
and ensure the root PD is clean through the function amdgpu_vm_pt_is_root_clean.
So they have a conflict, and clinfo  always failed.


Big NAK for this, you removed the check but didn't solved the problem in 
any way.


When Raven still needs the ats feature than it is intentional that this 
fails.


Regards,
Christian.



Signed-off-by: Jesse Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 --
  1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ed4a8c5d26d7..0bc0bc75be15 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2361,12 +2361,6 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, 
struct amdgpu_vm *vm)
 * changing any other state, in case it fails.
 */
if (pte_support_ats != vm->pte_support_ats) {
-   /* Sanity checks */
-   if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
-   r = -EINVAL;
-   goto unreserve_bo;
-   }
-
vm->pte_support_ats = pte_support_ats;
r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo),
   false);




[PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven

2024-02-27 Thread Jesse . Zhang
fix the issue when run clinfo:
"amdgpu: Failed to create process VM object".

when amdgpu initialized, seq64 do mampping and update bo mapping in vm page 
table.
But when clifo run. It also initializes a vm for a process device through the 
function kfd_process_device_init_vm
and ensure the root PD is clean through the function amdgpu_vm_pt_is_root_clean.
So they have a conflict, and clinfo  always failed.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ed4a8c5d26d7..0bc0bc75be15 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2361,12 +2361,6 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, 
struct amdgpu_vm *vm)
 * changing any other state, in case it fails.
 */
if (pte_support_ats != vm->pte_support_ats) {
-   /* Sanity checks */
-   if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
-   r = -EINVAL;
-   goto unreserve_bo;
-   }
-
vm->pte_support_ats = pte_support_ats;
r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo),
   false);
-- 
2.34.1