Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset
Reviewed-by: Andrey Grodzovsky Andrey On 10/30/19 6:20 AM, Koenig, Christian wrote: > Am 30.10.19 um 10:13 schrieb S, Shirish: >> [Why] >> >> doing kthread_park()/unpark() from drm_sched_entity_fini >> while GPU reset is in progress defeats all the purpose of >> drm_sched_stop->kthread_park. >> If drm_sched_entity_fini->kthread_unpark() happens AFTER >> drm_sched_stop->kthread_park nothing prevents from another >> (third) thread to keep submitting job to HW which will be >> picked up by the unparked scheduler thread and try to submit >> to HW but fail because the HW ring is deactivated. >> >> [How] >> grab the reset lock before calling drm_sched_entity_fini() >> >> Signed-off-by: Shirish S >> Suggested-by: Christian König > Patch itself is Reviewed-by: Christian König > > Does that also fix the problems you have been seeing? > > Thanks, > Christian. > >> --- >>drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 - >>1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> index 6614d8a..2cdaf3b 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr >> *mgr) >> continue; >> } >> >> -for (i = 0; i < num_entities; i++) >> +for (i = 0; i < num_entities; i++) { >> +mutex_lock(>adev->lock_reset); >> drm_sched_entity_fini(>entities[0][i].entity); >> +mutex_unlock(>adev->lock_reset); >> +} >> } >>} >> ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset
On 10/30/19 6:22 AM, S, Shirish wrote: > On 10/30/2019 3:50 PM, Koenig, Christian wrote: >> Am 30.10.19 um 10:13 schrieb S, Shirish: >>> [Why] >>> >>> doing kthread_park()/unpark() from drm_sched_entity_fini >>> while GPU reset is in progress defeats all the purpose of >>> drm_sched_stop->kthread_park. >>> If drm_sched_entity_fini->kthread_unpark() happens AFTER >>> drm_sched_stop->kthread_park nothing prevents from another >>> (third) thread to keep submitting job to HW which will be >>> picked up by the unparked scheduler thread and try to submit >>> to HW but fail because the HW ring is deactivated. >>> >>> [How] >>> grab the reset lock before calling drm_sched_entity_fini() >>> >>> Signed-off-by: Shirish S >>> Suggested-by: Christian König >> Patch itself is Reviewed-by: Christian König >> >> Does that also fix the problems you have been seeing? > Yes Christian. > > Regards, > > Shirish S Missed that one, why don't we fix it within scheduler code - the race is within scheduler ? Andrey > >> Thanks, >> Christian. >> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 - >>> 1 file changed, 4 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >>> index 6614d8a..2cdaf3b 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >>> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr >>> *mgr) >>> continue; >>> } >>> >>> - for (i = 0; i < num_entities; i++) >>> + for (i = 0; i < num_entities; i++) { >>> + mutex_lock(>adev->lock_reset); >>> >>> drm_sched_entity_fini(>entities[0][i].entity); >>> + mutex_unlock(>adev->lock_reset); >>> + } >>> } >>> } >>> ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset
On 10/30/2019 3:50 PM, Koenig, Christian wrote: > Am 30.10.19 um 10:13 schrieb S, Shirish: >> [Why] >> >> doing kthread_park()/unpark() from drm_sched_entity_fini >> while GPU reset is in progress defeats all the purpose of >> drm_sched_stop->kthread_park. >> If drm_sched_entity_fini->kthread_unpark() happens AFTER >> drm_sched_stop->kthread_park nothing prevents from another >> (third) thread to keep submitting job to HW which will be >> picked up by the unparked scheduler thread and try to submit >> to HW but fail because the HW ring is deactivated. >> >> [How] >> grab the reset lock before calling drm_sched_entity_fini() >> >> Signed-off-by: Shirish S >> Suggested-by: Christian König > Patch itself is Reviewed-by: Christian König > > Does that also fix the problems you have been seeing? Yes Christian. Regards, Shirish S > > Thanks, > Christian. > >> --- >>drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 - >>1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> index 6614d8a..2cdaf3b 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr >> *mgr) >> continue; >> } >> >> -for (i = 0; i < num_entities; i++) >> +for (i = 0; i < num_entities; i++) { >> +mutex_lock(>adev->lock_reset); >> drm_sched_entity_fini(>entities[0][i].entity); >> +mutex_unlock(>adev->lock_reset); >> +} >> } >>} >> -- Regards, Shirish S ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset
Am 30.10.19 um 10:13 schrieb S, Shirish: > [Why] > > doing kthread_park()/unpark() from drm_sched_entity_fini > while GPU reset is in progress defeats all the purpose of > drm_sched_stop->kthread_park. > If drm_sched_entity_fini->kthread_unpark() happens AFTER > drm_sched_stop->kthread_park nothing prevents from another > (third) thread to keep submitting job to HW which will be > picked up by the unparked scheduler thread and try to submit > to HW but fail because the HW ring is deactivated. > > [How] > grab the reset lock before calling drm_sched_entity_fini() > > Signed-off-by: Shirish S > Suggested-by: Christian König Patch itself is Reviewed-by: Christian König Does that also fix the problems you have been seeing? Thanks, Christian. > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > index 6614d8a..2cdaf3b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr > *mgr) > continue; > } > > - for (i = 0; i < num_entities; i++) > + for (i = 0; i < num_entities; i++) { > + mutex_lock(>adev->lock_reset); > drm_sched_entity_fini(>entities[0][i].entity); > + mutex_unlock(>adev->lock_reset); > + } > } > } > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: dont schedule jobs while in reset
[Why] doing kthread_park()/unpark() from drm_sched_entity_fini while GPU reset is in progress defeats all the purpose of drm_sched_stop->kthread_park. If drm_sched_entity_fini->kthread_unpark() happens AFTER drm_sched_stop->kthread_park nothing prevents from another (third) thread to keep submitting job to HW which will be picked up by the unparked scheduler thread and try to submit to HW but fail because the HW ring is deactivated. [How] grab the reset lock before calling drm_sched_entity_fini() Signed-off-by: Shirish S Suggested-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c index 6614d8a..2cdaf3b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr) continue; } - for (i = 0; i < num_entities; i++) + for (i = 0; i < num_entities; i++) { + mutex_lock(>adev->lock_reset); drm_sched_entity_fini(>entities[0][i].entity); + mutex_unlock(>adev->lock_reset); + } } } -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx