Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset
Reviewed-by: Andrey Grodzovsky Andrey On 10/30/19 6:20 AM, Koenig, Christian wrote: > Am 30.10.19 um 10:13 schrieb S, Shirish: >> [Why] >> >> doing kthread_park()/unpark() from drm_sched_entity_fini >> while GPU reset is in progress defeats all the purpose of >> drm_sched_stop->kthread_park. >> If drm_sched_entity_fini->kthread_unpark() happens AFTER >> drm_sched_stop->kthread_park nothing prevents from another >> (third) thread to keep submitting job to HW which will be >> picked up by the unparked scheduler thread and try to submit >> to HW but fail because the HW ring is deactivated. >> >> [How] >> grab the reset lock before calling drm_sched_entity_fini() >> >> Signed-off-by: Shirish S >> Suggested-by: Christian König > Patch itself is Reviewed-by: Christian König > > Does that also fix the problems you have been seeing? > > Thanks, > Christian. > >> --- >>drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 - >>1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> index 6614d8a..2cdaf3b 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr >> *mgr) >> continue; >> } >> >> -for (i = 0; i < num_entities; i++) >> +for (i = 0; i < num_entities; i++) { >> +mutex_lock(&ctx->adev->lock_reset); >> drm_sched_entity_fini(&ctx->entities[0][i].entity); >> +mutex_unlock(&ctx->adev->lock_reset); >> +} >> } >>} >> ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset
On 10/30/19 6:22 AM, S, Shirish wrote: > On 10/30/2019 3:50 PM, Koenig, Christian wrote: >> Am 30.10.19 um 10:13 schrieb S, Shirish: >>> [Why] >>> >>> doing kthread_park()/unpark() from drm_sched_entity_fini >>> while GPU reset is in progress defeats all the purpose of >>> drm_sched_stop->kthread_park. >>> If drm_sched_entity_fini->kthread_unpark() happens AFTER >>> drm_sched_stop->kthread_park nothing prevents from another >>> (third) thread to keep submitting job to HW which will be >>> picked up by the unparked scheduler thread and try to submit >>> to HW but fail because the HW ring is deactivated. >>> >>> [How] >>> grab the reset lock before calling drm_sched_entity_fini() >>> >>> Signed-off-by: Shirish S >>> Suggested-by: Christian König >> Patch itself is Reviewed-by: Christian König >> >> Does that also fix the problems you have been seeing? > Yes Christian. > > Regards, > > Shirish S Missed that one, why don't we fix it within scheduler code - the race is within scheduler ? Andrey > >> Thanks, >> Christian. >> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 - >>> 1 file changed, 4 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >>> index 6614d8a..2cdaf3b 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >>> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr >>> *mgr) >>> continue; >>> } >>> >>> - for (i = 0; i < num_entities; i++) >>> + for (i = 0; i < num_entities; i++) { >>> + mutex_lock(&ctx->adev->lock_reset); >>> >>> drm_sched_entity_fini(&ctx->entities[0][i].entity); >>> + mutex_unlock(&ctx->adev->lock_reset); >>> + } >>> } >>> } >>> ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset
On 10/30/2019 3:50 PM, Koenig, Christian wrote: > Am 30.10.19 um 10:13 schrieb S, Shirish: >> [Why] >> >> doing kthread_park()/unpark() from drm_sched_entity_fini >> while GPU reset is in progress defeats all the purpose of >> drm_sched_stop->kthread_park. >> If drm_sched_entity_fini->kthread_unpark() happens AFTER >> drm_sched_stop->kthread_park nothing prevents from another >> (third) thread to keep submitting job to HW which will be >> picked up by the unparked scheduler thread and try to submit >> to HW but fail because the HW ring is deactivated. >> >> [How] >> grab the reset lock before calling drm_sched_entity_fini() >> >> Signed-off-by: Shirish S >> Suggested-by: Christian König > Patch itself is Reviewed-by: Christian König > > Does that also fix the problems you have been seeing? Yes Christian. Regards, Shirish S > > Thanks, > Christian. > >> --- >>drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 - >>1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> index 6614d8a..2cdaf3b 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr >> *mgr) >> continue; >> } >> >> -for (i = 0; i < num_entities; i++) >> +for (i = 0; i < num_entities; i++) { >> +mutex_lock(&ctx->adev->lock_reset); >> drm_sched_entity_fini(&ctx->entities[0][i].entity); >> +mutex_unlock(&ctx->adev->lock_reset); >> +} >> } >>} >> -- Regards, Shirish S ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset
Am 30.10.19 um 10:13 schrieb S, Shirish: > [Why] > > doing kthread_park()/unpark() from drm_sched_entity_fini > while GPU reset is in progress defeats all the purpose of > drm_sched_stop->kthread_park. > If drm_sched_entity_fini->kthread_unpark() happens AFTER > drm_sched_stop->kthread_park nothing prevents from another > (third) thread to keep submitting job to HW which will be > picked up by the unparked scheduler thread and try to submit > to HW but fail because the HW ring is deactivated. > > [How] > grab the reset lock before calling drm_sched_entity_fini() > > Signed-off-by: Shirish S > Suggested-by: Christian König Patch itself is Reviewed-by: Christian König Does that also fix the problems you have been seeing? Thanks, Christian. > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > index 6614d8a..2cdaf3b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr > *mgr) > continue; > } > > - for (i = 0; i < num_entities; i++) > + for (i = 0; i < num_entities; i++) { > + mutex_lock(&ctx->adev->lock_reset); > drm_sched_entity_fini(&ctx->entities[0][i].entity); > + mutex_unlock(&ctx->adev->lock_reset); > + } > } > } > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx