Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
> 3. When this process is killed, drm_sched_entity_cleanup will be called and will register drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't submitted at all.

I think that point is the misunderstanding: killing the process won't stop jobB from being submitted.

—
Sincerely Yours,
Pixel

On 2018/4/26, 12:23 PM, "amd-gfx on behalf of zhoucm1" wrote:

NAK on it.

First of all, without this patch, does it cause any issue?

Second, entity->last_scheduled represents the last submitted job. Your change breaks that meaning and doesn't work, e.g.:

1. The mirror list has jobA and jobB; assuming they belong to the same entity, entity->last_scheduled is jobB->finished.
2. When you do recovery, jobA is re-submitted first; if last_scheduled isn't updated, entity->last_scheduled is still jobB->finished.
3. When this process is killed, drm_sched_entity_cleanup will be called and will register drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't submitted at all.

So the change isn't necessary at all.

Regards,
David Zhou

On 2018-04-26 11:47, Ding, Pixel wrote:
> Hi Monk,
>
> Please review it. Thanks.
>
> —
> Sincerely Yours,
> Pixel
>
> On 2018/4/25, 4:39 PM, "Pixel Ding" wrote:
>
> The current sequence in the scheduler thread is:
> 1. update last sched fence
> 2. job begin (add to mirror list)
> 3. job finish (remove from mirror list)
> 4. back to 1
>
> Since we update the last sched fence prior to joining the mirror list, the jobs on the mirror list have already passed the last sched fence. TDR just re-runs the jobs on the mirror list, so we should not update the last sched fence in TDR.
>
> Signed-off-by: Pixel Ding
> ---
>  drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 088ff2b..1f1dd70 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
>  		fence = sched->ops->run_job(s_job);
>  		atomic_inc(&sched->hw_rq_count);
>
> -		dma_fence_put(s_job->entity->last_scheduled);
> -		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
> -
>  		if (fence) {
>  			s_fence->parent = dma_fence_get(fence);
>  			r = dma_fence_add_callback(fence, &s_fence->cb,
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
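For readers following the commit message's numbered sequence: a condensed sketch of where steps 1 and 2 happen in the scheduler thread, paraphrased from the 4.17-era gpu_scheduler.c rather than quoted verbatim. The *_sketch helper name and the elisions are mine; dependency handling, locking and the hardware-fence completion callback are left out.

/* Step 1: popping the job already moves entity->last_scheduled forward. */
static struct drm_sched_job *
drm_sched_entity_pop_job(struct drm_sched_entity *entity)
{
	struct drm_sched_job *sched_job;

	sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
	if (!sched_job)
		return NULL;

	/* ... dependency and guilty handling elided ... */

	dma_fence_put(entity->last_scheduled);
	entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished);

	spsc_queue_pop(&entity->job_queue);
	return sched_job;
}

/* Steps 2 and 3: one iteration of the scheduler thread. The job joins the
 * mirror list only after last_scheduled was updated above, so every job
 * that drm_sched_job_recovery() finds on the mirror list has already
 * passed the last scheduled fence.
 */
static void drm_sched_run_one_job_sketch(struct drm_gpu_scheduler *sched,
					 struct drm_sched_entity *entity)
{
	struct drm_sched_job *sched_job;

	sched_job = drm_sched_entity_pop_job(entity);	/* step 1 */
	if (!sched_job)
		return;

	atomic_inc(&sched->hw_rq_count);
	drm_sched_job_begin(sched_job);			/* step 2: add to mirror list */

	sched->ops->run_job(sched_job);			/* completion-callback wiring elided */
	/* step 3: drm_sched_process_job() removes the job from the mirror
	 * list once the hardware fence signals, then the loop starts over. */
}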
Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
That change is indeed fixing a problem. When drm_sched_job_recovery() is called, s_job->entity should already be NULL. And even when the process is killed we should still either re-submit the jobs or set the error.

BTW: there is another bug in that function: guilty_context needs to be kept alive over the course of the loop. Going to submit a patch for that.

Regards,
Christian.

On 26.04.2018 at 06:23, zhoucm1 wrote:

NAK on it.

First of all, without this patch, does it cause any issue?

Second, entity->last_scheduled represents the last submitted job. Your change breaks that meaning and doesn't work, e.g.:

1. The mirror list has jobA and jobB; assuming they belong to the same entity, entity->last_scheduled is jobB->finished.
2. When you do recovery, jobA is re-submitted first; if last_scheduled isn't updated, entity->last_scheduled is still jobB->finished.
3. When this process is killed, drm_sched_entity_cleanup will be called and will register drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't submitted at all.

So the change isn't necessary at all.

Regards,
David Zhou

On 2018-04-26 11:47, Ding, Pixel wrote:

Hi Monk,

Please review it. Thanks.

—
Sincerely Yours,
Pixel

On 2018/4/25, 4:39 PM, "Pixel Ding" wrote:

The current sequence in the scheduler thread is:
1. update last sched fence
2. job begin (add to mirror list)
3. job finish (remove from mirror list)
4. back to 1

Since we update the last sched fence prior to joining the mirror list, the jobs on the mirror list have already passed the last sched fence. TDR just re-runs the jobs on the mirror list, so we should not update the last sched fence in TDR.

Signed-off-by: Pixel Ding
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 088ff2b..1f1dd70 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
 		fence = sched->ops->run_job(s_job);
 		atomic_inc(&sched->hw_rq_count);

-		dma_fence_put(s_job->entity->last_scheduled);
-		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
-
 		if (fence) {
 			s_fence->parent = dma_fence_get(fence);
 			r = dma_fence_add_callback(fence, &s_fence->cb,
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
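To put the guilty_context remark in context: a hedged sketch of what keeping it alive over the loop could look like, assuming the issue is that guilty_context is declared inside the mirror-list walk and so loses its value between iterations. This is a guess at the follow-up fix, not the actual patch; the re-submission and fence-callback handling from the hunk above are elided.

void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *s_job, *tmp;
	bool found_guilty = false;
	uint64_t guilty_context = 0;	/* hoisted out of the loop body so the
					 * value survives across iterations */

	spin_lock(&sched->job_list_lock);
	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
		struct drm_sched_fence *s_fence = s_job->s_fence;

		if (!found_guilty && atomic_read(&s_job->karma) > sched->hang_limit) {
			found_guilty = true;
			guilty_context = s_fence->scheduled.context;
		}

		/* cancel the remaining jobs that belong to the guilty context */
		if (found_guilty && s_fence->scheduled.context == guilty_context)
			dma_fence_set_error(&s_fence->finished, -ECANCELED);

		/* ... re-submission via sched->ops->run_job(s_job) and the
		 * fence callback handling from the hunk above elided ... */
	}
	spin_unlock(&sched->job_list_lock);
}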
Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
NAK on it.

First of all, without this patch, does it cause any issue?

Second, entity->last_scheduled represents the last submitted job. Your change breaks that meaning and doesn't work, e.g.:

1. The mirror list has jobA and jobB; assuming they belong to the same entity, entity->last_scheduled is jobB->finished.
2. When you do recovery, jobA is re-submitted first; if last_scheduled isn't updated, entity->last_scheduled is still jobB->finished.
3. When this process is killed, drm_sched_entity_cleanup will be called and will register drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't submitted at all. (A sketch of this path follows below.)

So the change isn't necessary at all.

Regards,
David Zhou

On 2018-04-26 11:47, Ding, Pixel wrote:

Hi Monk,

Please review it. Thanks.

—
Sincerely Yours,
Pixel

On 2018/4/25, 4:39 PM, "Pixel Ding" wrote:

The current sequence in the scheduler thread is:
1. update last sched fence
2. job begin (add to mirror list)
3. job finish (remove from mirror list)
4. back to 1

Since we update the last sched fence prior to joining the mirror list, the jobs on the mirror list have already passed the last sched fence. TDR just re-runs the jobs on the mirror list, so we should not update the last sched fence in TDR.

Signed-off-by: Pixel Ding
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 088ff2b..1f1dd70 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
 		fence = sched->ops->run_job(s_job);
 		atomic_inc(&sched->hw_rq_count);

-		dma_fence_put(s_job->entity->last_scheduled);
-		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
-
 		if (fence) {
 			s_fence->parent = dma_fence_get(fence);
 			r = dma_fence_add_callback(fence, &s_fence->cb,
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
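Regarding point 3: the cleanup path being referred to hangs the not-yet-popped jobs off whatever fence entity->last_scheduled currently holds. A simplified sketch, paraphrased from the 4.17-era entity cleanup code; the _sketch name is mine, and NULL and error handling are elided.

/* Sketch of the process-kill path from point 3: every job still queued
 * on the entity is cancelled by chaining drm_sched_entity_kill_jobs_cb()
 * onto the fence that entity->last_scheduled currently points at. */
static void drm_sched_entity_cleanup_sketch(struct drm_sched_entity *entity)
{
	struct drm_sched_job *job;
	int r;

	while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) {
		struct drm_sched_fence *s_fence = job->s_fence;

		drm_sched_fence_scheduled(s_fence);
		dma_fence_set_error(&s_fence->finished, -ESRCH);

		r = dma_fence_add_callback(entity->last_scheduled,
					   &job->finish_cb,
					   drm_sched_entity_kill_jobs_cb);
		if (r == -ENOENT)	/* last_scheduled already signaled */
			drm_sched_entity_kill_jobs_cb(NULL, &job->finish_cb);
	}
}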
RE: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
Reviewed-by: Monk Liu

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Pixel Ding
Sent: 25 April 2018 16:40
To: amd-gfx@lists.freedesktop.org
Cc: Ding, Pixel
Subject: [PATCH] drm/scheduler: don't update last scheduled fence in TDR

The current sequence in the scheduler thread is:
1. update last sched fence
2. job begin (add to mirror list)
3. job finish (remove from mirror list)
4. back to 1

Since we update the last sched fence prior to joining the mirror list, the jobs on the mirror list have already passed the last sched fence. TDR just re-runs the jobs on the mirror list, so we should not update the last sched fence in TDR.

Signed-off-by: Pixel Ding
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 088ff2b..1f1dd70 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
 		fence = sched->ops->run_job(s_job);
 		atomic_inc(&sched->hw_rq_count);

-		dma_fence_put(s_job->entity->last_scheduled);
-		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
-
 		if (fence) {
 			s_fence->parent = dma_fence_get(fence);
 			r = dma_fence_add_callback(fence, &s_fence->cb,
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
Hi Monk,

Please review it. Thanks.

—
Sincerely Yours,
Pixel

On 2018/4/25, 4:39 PM, "Pixel Ding" wrote:

The current sequence in the scheduler thread is:
1. update last sched fence
2. job begin (add to mirror list)
3. job finish (remove from mirror list)
4. back to 1

Since we update the last sched fence prior to joining the mirror list, the jobs on the mirror list have already passed the last sched fence. TDR just re-runs the jobs on the mirror list, so we should not update the last sched fence in TDR.

Signed-off-by: Pixel Ding
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 088ff2b..1f1dd70 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
 		fence = sched->ops->run_job(s_job);
 		atomic_inc(&sched->hw_rq_count);

-		dma_fence_put(s_job->entity->last_scheduled);
-		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
-
 		if (fence) {
 			s_fence->parent = dma_fence_get(fence);
 			r = dma_fence_add_callback(fence, &s_fence->cb,
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx