Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset

2019-10-30 Thread Grodzovsky, Andrey
Reviewed-by: Andrey Grodzovsky 

Andrey

On 10/30/19 6:20 AM, Koenig, Christian wrote:
> Am 30.10.19 um 10:13 schrieb S, Shirish:
>> [Why]
>>
>> doing kthread_park()/unpark() from drm_sched_entity_fini
>> while GPU reset is in progress defeats all the purpose of
>> drm_sched_stop->kthread_park.
>> If drm_sched_entity_fini->kthread_unpark() happens AFTER
>> drm_sched_stop->kthread_park nothing prevents from another
>> (third) thread to keep submitting job to HW which will be
>> picked up by the unparked scheduler thread and try to submit
>> to HW but fail because the HW ring is deactivated.
>>
>> [How]
>> grab the reset lock before calling drm_sched_entity_fini()
>>
>> Signed-off-by: Shirish S 
>> Suggested-by: Christian König 
> Patch itself is Reviewed-by: Christian König 
>
> Does that also fix the problems you have been seeing?
>
> Thanks,
> Christian.
>
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 -
>>1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> index 6614d8a..2cdaf3b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr 
>> *mgr)
>>  continue;
>>  }
>>
>> -for (i = 0; i < num_entities; i++)
>> +for (i = 0; i < num_entities; i++) {
>> +mutex_lock(>adev->lock_reset);
>>  drm_sched_entity_fini(>entities[0][i].entity);
>> +mutex_unlock(>adev->lock_reset);
>> +}
>>  }
>>}
>>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset

2019-10-30 Thread Grodzovsky, Andrey

On 10/30/19 6:22 AM, S, Shirish wrote:
> On 10/30/2019 3:50 PM, Koenig, Christian wrote:
>> Am 30.10.19 um 10:13 schrieb S, Shirish:
>>> [Why]
>>>
>>> doing kthread_park()/unpark() from drm_sched_entity_fini
>>> while GPU reset is in progress defeats all the purpose of
>>> drm_sched_stop->kthread_park.
>>> If drm_sched_entity_fini->kthread_unpark() happens AFTER
>>> drm_sched_stop->kthread_park nothing prevents from another
>>> (third) thread to keep submitting job to HW which will be
>>> picked up by the unparked scheduler thread and try to submit
>>> to HW but fail because the HW ring is deactivated.
>>>
>>> [How]
>>> grab the reset lock before calling drm_sched_entity_fini()
>>>
>>> Signed-off-by: Shirish S 
>>> Suggested-by: Christian König 
>> Patch itself is Reviewed-by: Christian König 
>>
>> Does that also fix the problems you have been seeing?
> Yes Christian.
>
> Regards,
>
> Shirish S


Missed that one, why don't we fix it within scheduler code - the race is 
within scheduler ?

Andrey


>
>> Thanks,
>> Christian.
>>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 -
>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> index 6614d8a..2cdaf3b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr 
>>> *mgr)
>>> continue;
>>> }
>>> 
>>> -   for (i = 0; i < num_entities; i++)
>>> +   for (i = 0; i < num_entities; i++) {
>>> +   mutex_lock(>adev->lock_reset);
>>> 
>>> drm_sched_entity_fini(>entities[0][i].entity);
>>> +   mutex_unlock(>adev->lock_reset);
>>> +   }
>>> }
>>> }
>>> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset

2019-10-30 Thread S, Shirish

On 10/30/2019 3:50 PM, Koenig, Christian wrote:
> Am 30.10.19 um 10:13 schrieb S, Shirish:
>> [Why]
>>
>> doing kthread_park()/unpark() from drm_sched_entity_fini
>> while GPU reset is in progress defeats all the purpose of
>> drm_sched_stop->kthread_park.
>> If drm_sched_entity_fini->kthread_unpark() happens AFTER
>> drm_sched_stop->kthread_park nothing prevents from another
>> (third) thread to keep submitting job to HW which will be
>> picked up by the unparked scheduler thread and try to submit
>> to HW but fail because the HW ring is deactivated.
>>
>> [How]
>> grab the reset lock before calling drm_sched_entity_fini()
>>
>> Signed-off-by: Shirish S 
>> Suggested-by: Christian König 
> Patch itself is Reviewed-by: Christian König 
>
> Does that also fix the problems you have been seeing?

Yes Christian.

Regards,

Shirish S

>
> Thanks,
> Christian.
>
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 -
>>1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> index 6614d8a..2cdaf3b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr 
>> *mgr)
>>  continue;
>>  }
>>
>> -for (i = 0; i < num_entities; i++)
>> +for (i = 0; i < num_entities; i++) {
>> +mutex_lock(>adev->lock_reset);
>>  drm_sched_entity_fini(>entities[0][i].entity);
>> +mutex_unlock(>adev->lock_reset);
>> +}
>>  }
>>}
>>

-- 
Regards,
Shirish S

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset

2019-10-30 Thread Koenig, Christian
Am 30.10.19 um 10:13 schrieb S, Shirish:
> [Why]
>
> doing kthread_park()/unpark() from drm_sched_entity_fini
> while GPU reset is in progress defeats all the purpose of
> drm_sched_stop->kthread_park.
> If drm_sched_entity_fini->kthread_unpark() happens AFTER
> drm_sched_stop->kthread_park nothing prevents from another
> (third) thread to keep submitting job to HW which will be
> picked up by the unparked scheduler thread and try to submit
> to HW but fail because the HW ring is deactivated.
>
> [How]
> grab the reset lock before calling drm_sched_entity_fini()
>
> Signed-off-by: Shirish S 
> Suggested-by: Christian König 

Patch itself is Reviewed-by: Christian König 

Does that also fix the problems you have been seeing?

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 6614d8a..2cdaf3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr 
> *mgr)
>   continue;
>   }
>   
> - for (i = 0; i < num_entities; i++)
> + for (i = 0; i < num_entities; i++) {
> + mutex_lock(>adev->lock_reset);
>   drm_sched_entity_fini(>entities[0][i].entity);
> + mutex_unlock(>adev->lock_reset);
> + }
>   }
>   }
>   

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: dont schedule jobs while in reset

2019-10-30 Thread S, Shirish
[Why]

doing kthread_park()/unpark() from drm_sched_entity_fini
while GPU reset is in progress defeats all the purpose of
drm_sched_stop->kthread_park.
If drm_sched_entity_fini->kthread_unpark() happens AFTER
drm_sched_stop->kthread_park nothing prevents from another
(third) thread to keep submitting job to HW which will be
picked up by the unparked scheduler thread and try to submit
to HW but fail because the HW ring is deactivated.

[How]
grab the reset lock before calling drm_sched_entity_fini()

Signed-off-by: Shirish S 
Suggested-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 6614d8a..2cdaf3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
continue;
}
 
-   for (i = 0; i < num_entities; i++)
+   for (i = 0; i < num_entities; i++) {
+   mutex_lock(>adev->lock_reset);
drm_sched_entity_fini(>entities[0][i].entity);
+   mutex_unlock(>adev->lock_reset);
+   }
}
 }
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx