Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR

2018-05-06 Thread Ding, Pixel
3. When the process is killed, drm_sched_entity_cleanup() is called and
registers drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't
submitted at all.

I think that point is where the misunderstanding lies. Killing the process
won’t stop jobB from being submitted.
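
A hedged sketch of the two pieces involved, loosely following the 4.17-era
scheduler code (simplified, not the literal upstream sources):

    /* Cleanup kill path (point 3 above): jobs still queued on the
     * entity are killed by hanging a callback off
     * entity->last_scheduled, here jobB->finished. */
    r = dma_fence_add_callback(entity->last_scheduled, &job->finish_cb,
                               drm_sched_entity_kill_jobs_cb);

    /* TDR recovery, however, walks the whole mirror list and re-runs
     * every job on it, jobB included, whether or not the owning
     * process is still alive, so jobB->finished does get signaled. */
    list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node)
            fence = sched->ops->run_job(s_job);  /* jobA, then jobB */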

— 
Sincerely Yours,
Pixel


On 2018/4/26, 12:23 PM, "amd-gfx on behalf of zhoucm1" wrote:

NAK on it.

First of all, without this patch, does it cause any issue?

Second,

entity->last_scheduled represents the last submitted job.

This change breaks that meaning and doesn't work, e.g.:

1. The mirror list has jobA and jobB; assuming they belong to the same
entity, entity->last_scheduled is jobB->finished.

2. When you do recovery and re-submit jobA first, if last_scheduled is not
updated, entity->last_scheduled is still jobB->finished.

3. When the process is killed, drm_sched_entity_cleanup() is called and
registers drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't
submitted at all.


So the change isn't necessary at all.


Regards,

David Zhou


On 2018-04-26 11:47, Ding, Pixel wrote:
> Hi Monk,
>
> Please review it. Thanks.
>
> —
> Sincerely Yours,
> Pixel
>
>
> On 2018/4/25, 4:39 PM, "Pixel Ding"  wrote:
>
>  The current sequence in the scheduler thread is:
>  1. update last sched fence
>  2. job begin (adding to mirror list)
>  3. job finish (remove from mirror list)
>  4. back to 1
>  
>  Since we update the last sched fence prior to joining the mirror list,
>  the jobs in the mirror list have already passed the last sched fence.
>  TDR just runs the jobs in the mirror list, so we should not update the
>  last sched fences in TDR.
>  
>  Signed-off-by: Pixel Ding 
>  ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
>   1 file changed, 3 deletions(-)
>  
>  diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>  index 088ff2b..1f1dd70 100644
>  --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>  +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>  @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
>   		fence = sched->ops->run_job(s_job);
>   		atomic_inc(&sched->hw_rq_count);
>   
>  -		dma_fence_put(s_job->entity->last_scheduled);
>  -		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
>  -
>   		if (fence) {
>   			s_fence->parent = dma_fence_get(fence);
>   			r = dma_fence_add_callback(fence, &s_fence->cb,
>  --
>  2.7.4
>  
>  
>

Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR

2018-05-03 Thread Christian König
That change is indeed fixing a problem. When drm_sched_job_recovery() is
called, s_job->entity may already be NULL.
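
The lines the patch removes show why; annotated below (the teardown race
described is as discussed in this thread, not a claim about the exact
upstream teardown order):

    /* Inside drm_sched_job_recovery(): if the submitting entity has
     * already been torn down when TDR re-runs the job, s_job->entity
     * is stale and this is a NULL-pointer dereference. */
    dma_fence_put(s_job->entity->last_scheduled);
    s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);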


And even when the process is killed, we should still either re-submit the
jobs or set the error.


BTW: There is another bug in that function: guilty_context needs to be
kept alive over the course of the loop.


Going to submit a patch for that.
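
For reference, a hedged sketch of the loop-lifetime issue and the "set the
error" handling (assuming the 4.17-era drm_sched_job_recovery() with its
found_guilty/karma logic; a sketch of the shape of the fix, not the actual
patch):

    /* Hoisted out of the loop so the guilty context found in one
     * iteration is still known in the next; declared inside the loop,
     * it would be reinitialized on every pass and the value lost. */
    uint64_t guilty_context = 0;
    bool found_guilty = false;

    list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
            struct drm_sched_fence *s_fence = s_job->s_fence;

            if (!found_guilty &&
                atomic_read(&s_job->karma) > sched->hang_limit) {
                    found_guilty = true;
                    guilty_context = s_fence->scheduled.context;
            }

            /* Jobs of the guilty context get their error set instead
             * of being blindly re-run. */
            if (found_guilty &&
                s_fence->scheduled.context == guilty_context)
                    dma_fence_set_error(&s_fence->finished, -ECANCELED);

            /* ... re-submit via sched->ops->run_job(s_job) as before ... */
    }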

Regards,
Christian.

On 2018-04-26 06:23, zhoucm1 wrote:

NAK on it.

First of all, without this patch, does it cause any issue?

Second,

entity->last_scheduled represents the last submitted job.

This change breaks that meaning and doesn't work, e.g.:

1. The mirror list has jobA and jobB; assuming they belong to the same
entity, entity->last_scheduled is jobB->finished.

2. When you do recovery and re-submit jobA first, if last_scheduled is not
updated, entity->last_scheduled is still jobB->finished.

3. When the process is killed, drm_sched_entity_cleanup() is called and
registers drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't
submitted at all.



So the change isn't necessary at all.


Regards,

David Zhou


On 2018-04-26 11:47, Ding, Pixel wrote:

Hi Monk,

Please review it. Thanks.

—
Sincerely Yours,
Pixel


On 2018/4/25, 4:39 PM, "Pixel Ding"  wrote:

    The current sequence in the scheduler thread is:
    1. update last sched fence
    2. job begin (adding to mirror list)
    3. job finish (remove from mirror list)
    4. back to 1

    Since we update the last sched fence prior to joining the mirror list,
    the jobs in the mirror list have already passed the last sched fence.
    TDR just runs the jobs in the mirror list, so we should not update the
    last sched fences in TDR.

    Signed-off-by: Pixel Ding 
    ---
     drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
     1 file changed, 3 deletions(-)

    diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
    index 088ff2b..1f1dd70 100644
    --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
    +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
    @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
     		fence = sched->ops->run_job(s_job);
     		atomic_inc(&sched->hw_rq_count);
     
    -		dma_fence_put(s_job->entity->last_scheduled);
    -		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
    -
     		if (fence) {
     			s_fence->parent = dma_fence_get(fence);
     			r = dma_fence_add_callback(fence, &s_fence->cb,
    --
    2.7.4



Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR

2018-04-25 Thread zhoucm1

NAK on it.

First of all, without this patch, does it cause any issue?

Second,

entity->last_scheduled represents the last submitted job.

This change breaks that meaning and doesn't work, e.g.:

1. The mirror list has jobA and jobB; assuming they belong to the same
entity, entity->last_scheduled is jobB->finished.

2. When you do recovery and re-submit jobA first, if last_scheduled is not
updated, entity->last_scheduled is still jobB->finished.

3. When the process is killed, drm_sched_entity_cleanup() is called and
registers drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't
submitted at all.



So the change isn't necessary at all.


Regards,

David Zhou


On 2018-04-26 11:47, Ding, Pixel wrote:

Hi Monk,

Please review it. Thanks.

—
Sincerely Yours,
Pixel


On 2018/4/25, 4:39 PM, "Pixel Ding"  wrote:

    The current sequence in the scheduler thread is:
    1. update last sched fence
    2. job begin (adding to mirror list)
    3. job finish (remove from mirror list)
    4. back to 1

    Since we update the last sched fence prior to joining the mirror list,
    the jobs in the mirror list have already passed the last sched fence.
    TDR just runs the jobs in the mirror list, so we should not update the
    last sched fences in TDR.

    Signed-off-by: Pixel Ding 
    ---
     drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
     1 file changed, 3 deletions(-)

    diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
    index 088ff2b..1f1dd70 100644
    --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
    +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
    @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
     		fence = sched->ops->run_job(s_job);
     		atomic_inc(&sched->hw_rq_count);
     
    -		dma_fence_put(s_job->entity->last_scheduled);
    -		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
    -
     		if (fence) {
     			s_fence->parent = dma_fence_get(fence);
     			r = dma_fence_add_callback(fence, &s_fence->cb,
    --
    2.7.4
 
 




RE: [PATCH] drm/scheduler: don't update last scheduled fence in TDR

2018-04-25 Thread Liu, Monk
Reviewed-by: Monk Liu 

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Pixel Ding
Sent: 2018-04-25 16:40
To: amd-gfx@lists.freedesktop.org
Cc: Ding, Pixel 
Subject: [PATCH] drm/scheduler: don't update last scheduled fence in TDR

The current sequence in the scheduler thread is:
1. update last sched fence
2. job begin (adding to mirror list)
3. job finish (remove from mirror list)
4. back to 1

Since we update the last sched fence prior to joining the mirror list, the
jobs in the mirror list have already passed the last sched fence. TDR just
runs the jobs in the mirror list, so we should not update the last sched
fences in TDR.

Signed-off-by: Pixel Ding 
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 088ff2b..1f1dd70 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
 		fence = sched->ops->run_job(s_job);
 		atomic_inc(&sched->hw_rq_count);
 
-		dma_fence_put(s_job->entity->last_scheduled);
-		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
-
 		if (fence) {
 			s_fence->parent = dma_fence_get(fence);
 			r = dma_fence_add_callback(fence, &s_fence->cb,
--
2.7.4
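
To make the ordering in the commit message concrete, a hedged sketch of
where the two steps live (function names from the 4.17-era scheduler;
simplified):

    /* Step 1: last_scheduled is updated when the job is popped from the
     * entity queue, e.g. in drm_sched_entity_pop_job(). */
    dma_fence_put(entity->last_scheduled);
    entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished);

    /* Step 2: only afterwards does drm_sched_job_begin() put the job on
     * the mirror list, so every job TDR finds there is already covered
     * by entity->last_scheduled and recovery need not update it again. */
    drm_sched_job_begin(sched_job);
    fence = sched->ops->run_job(sched_job);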



Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR

2018-04-25 Thread Ding, Pixel
Hi Monk,

Please review it. Thanks.

— 
Sincerely Yours,
Pixel


On 2018/4/25, 4:39 PM, "Pixel Ding"  wrote:

The current sequence in the scheduler thread is:
1. update last sched fence
2. job begin (adding to mirror list)
3. job finish (remove from mirror list)
4. back to 1

Since we update the last sched fence prior to joining the mirror list, the
jobs in the mirror list have already passed the last sched fence. TDR just
runs the jobs in the mirror list, so we should not update the last sched
fences in TDR.

Signed-off-by: Pixel Ding 
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 088ff2b..1f1dd70 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
 		fence = sched->ops->run_job(s_job);
 		atomic_inc(&sched->hw_rq_count);
 
-		dma_fence_put(s_job->entity->last_scheduled);
-		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
-
 		if (fence) {
 			s_fence->parent = dma_fence_get(fence);
 			r = dma_fence_add_callback(fence, &s_fence->cb,
--
2.7.4


