Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo

2018-08-29 Thread Zhang, Jerry (Junwei)

On 08/29/2018 04:53 PM, Christian König wrote:

Am 29.08.2018 um 04:52 schrieb Zhang, Jerry (Junwei):

On 08/28/2018 08:40 PM, Emily Deng wrote:

Fix the VMC page fault when the running sequence is as below:
1.amdgpu_gem_create_ioctl
2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called
amdgpu_vm_bo_base_init, so won't called
list_add_tail(>bo_list, >va). Even the bo was evicted,
it won't set the bo_base->moved.


IMO, the evicted bo should be created previously.
On BO creation we will add it to the bo->va as below:

amdgpu_gem_create_ioctl
  drm_gem_handle_create
amdgpu_gem_object_open


And here is the problem. Between creating the BO and opening it in the client 
the BO can be evicted.


Thanks to explain that.

It's the key point, falling in Murphy's law as well.

Jerry



That's what Emily's patch is handling here.

Christian.


amdgpu_vm_bo_add
amdgpu_vm_bo_base_init
  list_add_tail(>bo_list, >va)

Then it could be set moved in bo invalidate when evicting.

could you provide a bit more backgroud about the issue?
looks a per vm bo is evicted and a new same bo created.

Jerry


3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called
list_move_tail(>vm_status, >evicted), but not set the
bo_base->moved.
4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is
not set true, the function amdgpu_vm_bo_insert_map will call
list_move(_va->base.vm_status, >moved)
5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the
moved list, not in the evict list. So VMC page fault occurs.

Signed-off-by: Emily Deng 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 1f4b8df..015e20e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base 
*base,
   * is validated on next vm use to avoid fault.
   * */
  list_move_tail(>vm_status, >evicted);
+base->moved = true;
  }

  /**


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo

2018-08-29 Thread Christian König

Am 29.08.2018 um 04:52 schrieb Zhang, Jerry (Junwei):

On 08/28/2018 08:40 PM, Emily Deng wrote:

Fix the VMC page fault when the running sequence is as below:
1.amdgpu_gem_create_ioctl
2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called
amdgpu_vm_bo_base_init, so won't called
list_add_tail(>bo_list, >va). Even the bo was evicted,
it won't set the bo_base->moved.


IMO, the evicted bo should be created previously.
On BO creation we will add it to the bo->va as below:

amdgpu_gem_create_ioctl
  drm_gem_handle_create
    amdgpu_gem_object_open


And here is the problem. Between creating the BO and opening it in the 
client the BO can be evicted.


That's what Emily's patch is handling here.

Christian.


amdgpu_vm_bo_add
    amdgpu_vm_bo_base_init
  list_add_tail(>bo_list, >va)

Then it could be set moved in bo invalidate when evicting.

could you provide a bit more backgroud about the issue?
looks a per vm bo is evicted and a new same bo created.

Jerry


3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called
list_move_tail(>vm_status, >evicted), but not set the
bo_base->moved.
4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is
not set true, the function amdgpu_vm_bo_insert_map will call
list_move(_va->base.vm_status, >moved)
5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the
moved list, not in the evict list. So VMC page fault occurs.

Signed-off-by: Emily Deng 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index 1f4b8df..015e20e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,

   * is validated on next vm use to avoid fault.
   * */
  list_move_tail(>vm_status, >evicted);
+    base->moved = true;
  }

  /**


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo

2018-08-28 Thread Zhang, Jerry (Junwei)

On 08/28/2018 08:40 PM, Emily Deng wrote:

Fix the VMC page fault when the running sequence is as below:
1.amdgpu_gem_create_ioctl
2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called
amdgpu_vm_bo_base_init, so won't called
list_add_tail(>bo_list, >va). Even the bo was evicted,
it won't set the bo_base->moved.


IMO, the evicted bo should be created previously.
On BO creation we will add it to the bo->va as below:

amdgpu_gem_create_ioctl
  drm_gem_handle_create
amdgpu_gem_object_open
  amdgpu_vm_bo_add
amdgpu_vm_bo_base_init
  list_add_tail(>bo_list, >va)

Then it could be set moved in bo invalidate when evicting.

could you provide a bit more backgroud about the issue?
looks a per vm bo is evicted and a new same bo created.

Jerry


3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called
list_move_tail(>vm_status, >evicted), but not set the
bo_base->moved.
4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is
not set true, the function amdgpu_vm_bo_insert_map will call
list_move(_va->base.vm_status, >moved)
5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the
moved list, not in the evict list. So VMC page fault occurs.

Signed-off-by: Emily Deng 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 1f4b8df..015e20e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base 
*base,
 * is validated on next vm use to avoid fault.
 * */
list_move_tail(>vm_status, >evicted);
+   base->moved = true;
  }

  /**


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: Need to set moved to true when evict bo

2018-08-28 Thread Deng, Emily
>-Original Message-
>From: amd-gfx  On Behalf Of
>zhoucm1
>Sent: Wednesday, August 29, 2018 10:12 AM
>To: Koenig, Christian ; Deng, Emily
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo
>
>
>
>On 2018年08月28日 20:47, Christian König wrote:
>> Am 28.08.2018 um 14:40 schrieb Emily Deng:
>>> Fix the VMC page fault when the running sequence is as below:
>>> 1.amdgpu_gem_create_ioctl
>>> 2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called
>>> amdgpu_vm_bo_base_init, so won't called list_add_tail(>bo_list,
>>> >va). Even the bo was evicted, it won't set the bo_base->moved.
>>> 3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called
>>> list_move_tail(>vm_status, >evicted), but not set the
>>> bo_base->moved.
>>> 4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base-
>>moved is
>>> not set true, the function amdgpu_vm_bo_insert_map will call
>>> list_move(_va->base.vm_status, >moved) 5.amdgpu_cs_ioctl won't
>>> validate the swapout bo, as it is only in the moved list, not in the
>>> evict list. So VMC page fault occurs.
>>>
>>> Signed-off-by: Emily Deng 
>>
>> Good catch, patch is Reviewed-by: Christian König
>> 
>Really good debug, Emily, you can add my Reviewed-by: Chunming Zhou
> as well if you still don't push it yet.
David, thanks you for helping me to find out the root cause, already pushed the 
patch.
>Regards,
>David Zhou
>>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 1f4b8df..015e20e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct
>>> amdgpu_vm_bo_base *base,
>>>    * is validated on next vm use to avoid fault.
>>>    * */
>>>   list_move_tail(>vm_status, >evicted);
>>> +    base->moved = true;
>>>   }
>>>     /**
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo

2018-08-28 Thread zhoucm1



On 2018年08月28日 20:47, Christian König wrote:

Am 28.08.2018 um 14:40 schrieb Emily Deng:

Fix the VMC page fault when the running sequence is as below:
1.amdgpu_gem_create_ioctl
2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called
amdgpu_vm_bo_base_init, so won't called
list_add_tail(>bo_list, >va). Even the bo was evicted,
it won't set the bo_base->moved.
3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called
list_move_tail(>vm_status, >evicted), but not set the
bo_base->moved.
4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is
not set true, the function amdgpu_vm_bo_insert_map will call
list_move(_va->base.vm_status, >moved)
5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the
moved list, not in the evict list. So VMC page fault occurs.

Signed-off-by: Emily Deng 


Good catch, patch is Reviewed-by: Christian König 

Really good debug, Emily, you can add my Reviewed-by: Chunming Zhou 
 as well if you still don't push it yet.


Regards,
David Zhou



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index 1f4b8df..015e20e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,

   * is validated on next vm use to avoid fault.
   * */
  list_move_tail(>vm_status, >evicted);
+    base->moved = true;
  }
    /**


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo

2018-08-28 Thread Christian König

Am 28.08.2018 um 14:40 schrieb Emily Deng:

Fix the VMC page fault when the running sequence is as below:
1.amdgpu_gem_create_ioctl
2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called
amdgpu_vm_bo_base_init, so won't called
list_add_tail(>bo_list, >va). Even the bo was evicted,
it won't set the bo_base->moved.
3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called
list_move_tail(>vm_status, >evicted), but not set the
bo_base->moved.
4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is
not set true, the function amdgpu_vm_bo_insert_map will call
list_move(_va->base.vm_status, >moved)
5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the
moved list, not in the evict list. So VMC page fault occurs.

Signed-off-by: Emily Deng 


Good catch, patch is Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 1f4b8df..015e20e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base 
*base,
 * is validated on next vm use to avoid fault.
 * */
list_move_tail(>vm_status, >evicted);
+   base->moved = true;
  }
  
  /**


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: Need to set moved to true when evict bo

2018-08-28 Thread Emily Deng
Fix the VMC page fault when the running sequence is as below:
1.amdgpu_gem_create_ioctl
2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called
amdgpu_vm_bo_base_init, so won't called
list_add_tail(>bo_list, >va). Even the bo was evicted,
it won't set the bo_base->moved.
3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called
list_move_tail(>vm_status, >evicted), but not set the
bo_base->moved.
4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is
not set true, the function amdgpu_vm_bo_insert_map will call
list_move(_va->base.vm_status, >moved)
5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the
moved list, not in the evict list. So VMC page fault occurs.

Signed-off-by: Emily Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 1f4b8df..015e20e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base 
*base,
 * is validated on next vm use to avoid fault.
 * */
list_move_tail(>vm_status, >evicted);
+   base->moved = true;
 }
 
 /**
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx