Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset

2019-10-30 Thread Grodzovsky, Andrey
Reviewed-by: Andrey Grodzovsky 

Andrey

On 10/30/19 6:20 AM, Koenig, Christian wrote:
> Am 30.10.19 um 10:13 schrieb S, Shirish:
>> [Why]
>>
>> doing kthread_park()/unpark() from drm_sched_entity_fini
>> while GPU reset is in progress defeats all the purpose of
>> drm_sched_stop->kthread_park.
>> If drm_sched_entity_fini->kthread_unpark() happens AFTER
>> drm_sched_stop->kthread_park nothing prevents from another
>> (third) thread to keep submitting job to HW which will be
>> picked up by the unparked scheduler thread and try to submit
>> to HW but fail because the HW ring is deactivated.
>>
>> [How]
>> grab the reset lock before calling drm_sched_entity_fini()
>>
>> Signed-off-by: Shirish S 
>> Suggested-by: Christian König 
> Patch itself is Reviewed-by: Christian König 
>
> Does that also fix the problems you have been seeing?
>
> Thanks,
> Christian.
>
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 -
>>1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> index 6614d8a..2cdaf3b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr 
>> *mgr)
>>  continue;
>>  }
>>
>> -for (i = 0; i < num_entities; i++)
>> +for (i = 0; i < num_entities; i++) {
>> +mutex_lock(>adev->lock_reset);
>>  drm_sched_entity_fini(>entities[0][i].entity);
>> +mutex_unlock(>adev->lock_reset);
>> +}
>>  }
>>}
>>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset

2019-10-30 Thread Grodzovsky, Andrey

On 10/30/19 6:22 AM, S, Shirish wrote:
> On 10/30/2019 3:50 PM, Koenig, Christian wrote:
>> Am 30.10.19 um 10:13 schrieb S, Shirish:
>>> [Why]
>>>
>>> doing kthread_park()/unpark() from drm_sched_entity_fini
>>> while GPU reset is in progress defeats all the purpose of
>>> drm_sched_stop->kthread_park.
>>> If drm_sched_entity_fini->kthread_unpark() happens AFTER
>>> drm_sched_stop->kthread_park nothing prevents from another
>>> (third) thread to keep submitting job to HW which will be
>>> picked up by the unparked scheduler thread and try to submit
>>> to HW but fail because the HW ring is deactivated.
>>>
>>> [How]
>>> grab the reset lock before calling drm_sched_entity_fini()
>>>
>>> Signed-off-by: Shirish S 
>>> Suggested-by: Christian König 
>> Patch itself is Reviewed-by: Christian König 
>>
>> Does that also fix the problems you have been seeing?
> Yes Christian.
>
> Regards,
>
> Shirish S


Missed that one, why don't we fix it within scheduler code - the race is 
within scheduler ?

Andrey


>
>> Thanks,
>> Christian.
>>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 -
>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> index 6614d8a..2cdaf3b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr 
>>> *mgr)
>>> continue;
>>> }
>>> 
>>> -   for (i = 0; i < num_entities; i++)
>>> +   for (i = 0; i < num_entities; i++) {
>>> +   mutex_lock(>adev->lock_reset);
>>> 
>>> drm_sched_entity_fini(>entities[0][i].entity);
>>> +   mutex_unlock(>adev->lock_reset);
>>> +   }
>>> }
>>> }
>>> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset

2019-10-30 Thread S, Shirish

On 10/30/2019 3:50 PM, Koenig, Christian wrote:
> Am 30.10.19 um 10:13 schrieb S, Shirish:
>> [Why]
>>
>> doing kthread_park()/unpark() from drm_sched_entity_fini
>> while GPU reset is in progress defeats all the purpose of
>> drm_sched_stop->kthread_park.
>> If drm_sched_entity_fini->kthread_unpark() happens AFTER
>> drm_sched_stop->kthread_park nothing prevents from another
>> (third) thread to keep submitting job to HW which will be
>> picked up by the unparked scheduler thread and try to submit
>> to HW but fail because the HW ring is deactivated.
>>
>> [How]
>> grab the reset lock before calling drm_sched_entity_fini()
>>
>> Signed-off-by: Shirish S 
>> Suggested-by: Christian König 
> Patch itself is Reviewed-by: Christian König 
>
> Does that also fix the problems you have been seeing?

Yes Christian.

Regards,

Shirish S

>
> Thanks,
> Christian.
>
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 -
>>1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> index 6614d8a..2cdaf3b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr 
>> *mgr)
>>  continue;
>>  }
>>
>> -for (i = 0; i < num_entities; i++)
>> +for (i = 0; i < num_entities; i++) {
>> +mutex_lock(>adev->lock_reset);
>>  drm_sched_entity_fini(>entities[0][i].entity);
>> +mutex_unlock(>adev->lock_reset);
>> +}
>>  }
>>}
>>

-- 
Regards,
Shirish S

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset

2019-10-30 Thread Koenig, Christian
Am 30.10.19 um 10:13 schrieb S, Shirish:
> [Why]
>
> doing kthread_park()/unpark() from drm_sched_entity_fini
> while GPU reset is in progress defeats all the purpose of
> drm_sched_stop->kthread_park.
> If drm_sched_entity_fini->kthread_unpark() happens AFTER
> drm_sched_stop->kthread_park nothing prevents from another
> (third) thread to keep submitting job to HW which will be
> picked up by the unparked scheduler thread and try to submit
> to HW but fail because the HW ring is deactivated.
>
> [How]
> grab the reset lock before calling drm_sched_entity_fini()
>
> Signed-off-by: Shirish S 
> Suggested-by: Christian König 

Patch itself is Reviewed-by: Christian König 

Does that also fix the problems you have been seeing?

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 6614d8a..2cdaf3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr 
> *mgr)
>   continue;
>   }
>   
> - for (i = 0; i < num_entities; i++)
> + for (i = 0; i < num_entities; i++) {
> + mutex_lock(>adev->lock_reset);
>   drm_sched_entity_fini(>entities[0][i].entity);
> + mutex_unlock(>adev->lock_reset);
> + }
>   }
>   }
>   

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx