Re: [PATCH v8] drm/sched: Add FIFO sched policy to run queue

2022-09-30 Thread Andrey Grodzovsky

Thanks for helping with the review and for the good improvement ideas.

Pushed to drm-misc-next.

Andrey

On 2022-09-30 00:12, Luben Tuikov wrote:

From: Andrey Grodzovsky 

When many entities are competing for the same run queue
on the same scheduler, we observe unusually long wait
times and some jobs get starved. This has been observed with GPUVis.

The issue is due to the Round Robin policy used by schedulers
to pick the next entity's job queue for execution. Under stress
of many entities and long job queues within an entity, some
jobs can be stuck for a very long time in their entity's
queue before being popped from the queue and executed,
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
than the job in the long queue.

Fix:
Add a FIFO selection policy for entities in the run queue: choose the
next entity on the run queue in such an order that if a job on one
entity arrived earlier than a job on another entity, the first job
starts executing earlier, regardless of the length of the entity's
job queue.

v2:
Switch to an rb-tree structure for entities, keyed by the timestamp (TS)
of the oldest job waiting in the job queue of an entity. This improves
next-entity extraction to O(1); an entity TS update is O(log N),
where N is the number of entities in the run queue.

Drop the default option in the module control parameter.

v3:
Various cosmetic fixes and minor refactoring of the FIFO update function. (Luben)

v4:
Switch drm_sched_rq_select_entity_fifo to an in-order search (Luben)

v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben)

v6: Add missing drm_sched_rq_remove_fifo_locked

v7: Fix ts sampling bug and more cosmetic stuff (Luben)

v8: Fix module parameter string (Luben)

Cc: Luben Tuikov 
Cc: Christian König 
Cc: Direct Rendering Infrastructure - Development 

Cc: AMD Graphics 
Signed-off-by: Andrey Grodzovsky 
Tested-by: Yunxiang Li (Teddy) 
Signed-off-by: Luben Tuikov 
Reviewed-by: Luben Tuikov 
---
  drivers/gpu/drm/scheduler/sched_entity.c | 20 +
  drivers/gpu/drm/scheduler/sched_main.c   | 96 +++-
  include/drm/gpu_scheduler.h  | 32 
  3 files changed, 145 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a308..7060e4ed5a3148 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
  
  	if(num_sched_list)

entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -443,6 +444,19 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
smp_wmb();
  
spsc_queue_pop(&entity->job_queue);

+
+   /*
+* Update the entity's location in the min heap according to
+* the timestamp of the next job, if any.
+*/
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
+   struct drm_sched_job *next;
+
+   next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
+   if (next)
+   drm_sched_rq_update_fifo(entity, next->submit_ts);
+   }
+
return sched_job;
  }
  
@@ -507,6 +521,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)

atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ktime_get();
  
  	/* first job wakes up scheduler */

if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
DRM_ERROR("Trying to push to a killed entity\n");
return;
}
+
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
+
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
+
drm_sched_wakeup(entity->rq->sched);
}
  }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 4f2395d1a79182..ce86b03e838699 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -62,6 +62,55 @@
  #define to_drm_sched_job(sched_job)   \
container_of((sched_job), struct drm_sched_job, queue_node)
  
+int drm_sched_policy = DRM_SCHED_POLICY_RR;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entities scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy, "Specify schedule policy for entities on a runqueue, " __stringify(DRM_SC
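
The rest of this hunk is truncated in the archive. For context, the rb-tree
bookkeeping the patch adds looks roughly like the sketch below, paraphrased
from the merged code rather than quoted from this mail, so treat names and
locking details as approximate:

static inline bool drm_sched_entity_compare_before(struct rb_node *a,
						   const struct rb_node *b)
{
	struct drm_sched_entity *ea = rb_entry(a, struct drm_sched_entity, rb_tree_node);
	struct drm_sched_entity *eb = rb_entry(b, struct drm_sched_entity, rb_tree_node);

	/* Order entities by the timestamp of their oldest waiting job */
	return ktime_before(ea->oldest_job_waiting, eb->oldest_job_waiting);
}

static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity)
{
	struct drm_sched_rq *rq = entity->rq;

	if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
		rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
		RB_CLEAR_NODE(&entity->rb_tree_node);
	}
}

void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
{
	/* Re-key the entity: remove it from the rb tree, record the new
	 * oldest-job timestamp and re-insert it in sorted order. */
	spin_lock(&entity->rq_lock);
	spin_lock(&entity->rq->lock);

	drm_sched_rq_remove_fifo_locked(entity);

	entity->oldest_job_waiting = ts;

	rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
		      drm_sched_entity_compare_before);

	spin_unlock(&entity->rq->lock);
	spin_unlock(&entity->rq_lock);
}

With this in place the policy can be selected at load time, e.g. with
gpu_sched.sched_policy=1 on the kernel command line (assuming FIFO maps
to 1, as in the upstream defines).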

[PATCH v5] drm/sched: Add FIFO sched policy to run queue

2022-09-28 Thread Andrey Grodzovsky
When many entities are competing for the same run queue
on the same scheduler, we observe unusually long wait
times and some jobs get starved. This has been observed with GPUVis.

The issue is due to the Round Robin policy used by schedulers
to pick the next entity's job queue for execution. Under stress
of many entities and long job queues within an entity, some
jobs can be stuck for a very long time in their entity's
queue before being popped from the queue and executed,
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
than the job in the long queue.

Fix:
Add a FIFO selection policy for entities in the run queue: choose the
next entity on the run queue in such an order that if a job on one
entity arrived earlier than a job on another entity, the first job
starts executing earlier, regardless of the length of the entity's
job queue.
   
v2:
Switch to an rb-tree structure for entities, keyed by the timestamp (TS)
of the oldest job waiting in the job queue of an entity. This improves
next-entity extraction to O(1); an entity TS update is O(log N),
where N is the number of entities in the run queue.

Drop the default option in the module control parameter.

v3:
Various cosmetic fixes and minor refactoring of the FIFO update function. (Luben)

v4:
Switch drm_sched_rq_select_entity_fifo to an in-order search (Luben)

v5: Fix up drm_sched_rq_select_entity_fifo loop
   
Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 26 ++-
 drivers/gpu/drm/scheduler/sched_main.c   | 99 +++-
 include/drm/gpu_scheduler.h  | 32 
 3 files changed, 151 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..f3ffce3c9304 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
 
if(num_sched_list)
entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (!sched_job)
-   return NULL;
+   goto skip;
 
while ((entity->dependency =
drm_sched_job_dependency(sched_job, entity))) {
trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
 
-   if (drm_sched_entity_add_dependency_cb(entity))
-   return NULL;
+   if (drm_sched_entity_add_dependency_cb(entity)) {
+   sched_job = NULL;
+   goto skip;
+   }
}
 
/* skip jobs from entity that marked guilty */
@@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
smp_wmb();
 
spsc_queue_pop(&entity->job_queue);
+
+   /*
+* It's when head job is extracted we can access the next job (or empty)
+* queue and update the entity location in the min heap accordingly.
+*/
+skip:
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity,
+(sched_job ? sched_job->submit_ts : 
ktime_get()));
+
return sched_job;
 }
 
@@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 {
struct drm_sched_entity *entity = sched_job->entity;
bool first;
+   ktime_t ts =  ktime_get();
 
trace_drm_sched_job(sched_job, entity);
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ts;
 
/* first job wakes up scheduler */
if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
DRM_ERROR("Trying to push to a killed entity\n");
return;
}
+
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
+
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity, ts);
+
drm_sched_wakeup(entity->rq->sched);
}
 }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 4f2395d1a791..5349fc049384 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -62,6 +62,58 @@
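
The sched_main.c hunk is truncated by the archive at this point. The
"in order search" from the v4 changelog amounts to walking the rb tree
from its leftmost node (oldest pending-job timestamp) and taking the
first entity that is actually ready; roughly, paraphrasing the merged
helper rather than quoting this mail:

static struct drm_sched_entity *
drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
{
	struct rb_node *rb;

	spin_lock(&rq->lock);
	for (rb = rb_first_cached(&rq->rb_tree_root); rb; rb = rb_next(rb)) {
		struct drm_sched_entity *entity;

		entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
		if (drm_sched_entity_is_ready(entity)) {
			/* Oldest waiting job wins, regardless of how long
			 * the entity's own queue is. */
			rq->current_entity = entity;
			reinit_completion(&entity->entity_idle);
			break;
		}
	}
	spin_unlock(&rq->lock);

	return rb ? rb_entry(rb, struct drm_sched_entity, rb_tree_node) : NULL;
}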
 

Re: [PATCH v4] drm/sched: Add FIFO sched policy to run queue v3

2022-09-27 Thread Andrey Grodzovsky
Hey, I have problems with my git-send today, so I just attached v5 as a
patch here.


Andrey

On 2022-09-27 19:56, Luben Tuikov wrote:

Inlined:

On 2022-09-22 12:15, Andrey Grodzovsky wrote:

On 2022-09-22 11:03, Luben Tuikov wrote:

The title of this patch has "v3", but "v4" in the title prefix.
If you're using "-v" to git-format-patch, please remove the "v3" from the title.

Inlined:

On 2022-09-21 14:28, Andrey Grodzovsky wrote:

When many entities competing for same run queue on
the same scheduler When many entities have  unacceptably long wait
time for some jobs waiting stuck in the run queue before being picked
up are observed (seen using  GPUVis).

Use this as your opening:

"When many entities are competing for the same run queue on the same scheduler,
we observe an unusually long wait times and some jobs get starved. This has
been observed on GPUVis."


The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under stress
of many entities and long job queues within entity some
jobs could be stack for very long time in it's entity's

"stuck", not "stack".


queue before being popped from the queue and executed
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
then the job in the long queue.
 
Fix:

Add FIFO selection policy to entities in run queue, chose next entity
on run queue in such order that if job on one entity arrived
earlier then job on another entity the first job will start
executing earlier regardless of the length of the entity's job
queue.
 
v2:

Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue
 
Drop default option in module control parameter.


v3:
Various cosmetical fixes and minor refactoring of fifo update function. (Luben)

v4:
Switch drm_sched_rq_select_entity_fifo to in order search (Luben)
 
Signed-off-by: Andrey Grodzovsky 

Tested-by: Li Yunxiang (Teddy) 
---
   drivers/gpu/drm/scheduler/sched_entity.c |  26 +-
   drivers/gpu/drm/scheduler/sched_main.c   | 107 ++-
   include/drm/gpu_scheduler.h  |  32 +++
   3 files changed, 159 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..f3ffce3c9304 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
   
   	if(num_sched_list)

entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
   
sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));

if (!sched_job)
-   return NULL;
+   goto skip;
   
   	while ((entity->dependency =

drm_sched_job_dependency(sched_job, entity))) {
trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
   
-		if (drm_sched_entity_add_dependency_cb(entity))

-   return NULL;
+   if (drm_sched_entity_add_dependency_cb(entity)) {
+   sched_job = NULL;
+   goto skip;
+   }
}
   
   	/* skip jobs from entity that marked guilty */

@@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
smp_wmb();
   
spsc_queue_pop(&entity->job_queue);

+
+   /*
+* It's when head job is extracted we can access the next job (or empty)
+* queue and update the entity location in the min heap accordingly.
+*/
+skip:
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity,
+(sched_job ? sched_job->submit_ts : 
ktime_get()));
+
return sched_job;
   }
   
@@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)

   {
struct drm_sched_entity *entity = sched_job->entity;
bool first;
+   ktime_t ts =  ktime_get();
   
   	trace_drm_sched_job(sched_job, entity);

atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ts;
   
   	/* first job wakes up scheduler */

if (first) {
@@ -518,8 +533,13 @@

Re: [PATCH v4] drm/sched: Add FIFO sched policy to run queue v3

2022-09-23 Thread Andrey Grodzovsky

Ping

Andrey

On 2022-09-22 12:15, Andrey Grodzovsky wrote:


On 2022-09-22 11:03, Luben Tuikov wrote:

The title of this patch has "v3", but "v4" in the title prefix.
If you're using "-v" to git-format-patch, please remove the "v3" from 
the title.


Inlined:

On 2022-09-21 14:28, Andrey Grodzovsky wrote:

When many entities competing for same run queue on
the same scheduler When many entities have  unacceptably long wait
time for some jobs waiting stuck in the run queue before being picked
up are observed (seen using  GPUVis).

Use this as your opening:

"When many entities are competing for the same run queue on the same 
scheduler,
we observe an unusually long wait times and some jobs get starved. 
This has

been observed on GPUVis."


The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under stress
of many entities and long job queues within entity some
jobs could be stack for very long time in it's entity's

"stuck", not "stack".


queue before being popped from the queue and executed
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
then the job in the long queue.
    Fix:
Add FIFO selection policy to entities in run queue, chose next entity
on run queue in such order that if job on one entity arrived
earlier then job on another entity the first job will start
executing earlier regardless of the length of the entity's job
queue.
    v2:
Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue
    Drop default option in module control parameter.

v3:
Various cosmetical fixes and minor refactoring of fifo update 
function. (Luben)


v4:
Switch drm_sched_rq_select_entity_fifo to in order search (Luben)
    Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
  drivers/gpu/drm/scheduler/sched_entity.c |  26 +-
  drivers/gpu/drm/scheduler/sched_main.c   | 107 
++-

  include/drm/gpu_scheduler.h  |  32 +++
  3 files changed, 159 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c

index 6b25b2f4f5a3..f3ffce3c9304 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity 
*entity,

  entity->priority = priority;
  entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
  entity->last_scheduled = NULL;
+    RB_CLEAR_NODE(&entity->rb_tree_node);
    if(num_sched_list)
entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job 
*drm_sched_entity_pop_job(struct drm_sched_entity *entity)
    sched_job = 
to_drm_sched_job(spsc_queue_peek(&entity->job_queue));

  if (!sched_job)
-    return NULL;
+    goto skip;
    while ((entity->dependency =
  drm_sched_job_dependency(sched_job, entity))) {
  trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
  -    if (drm_sched_entity_add_dependency_cb(entity))
-    return NULL;
+    if (drm_sched_entity_add_dependency_cb(entity)) {
+    sched_job = NULL;
+    goto skip;
+    }
  }
    /* skip jobs from entity that marked guilty */
@@ -443,6 +446,16 @@ struct drm_sched_job 
*drm_sched_entity_pop_job(struct drm_sched_entity *entity)

  smp_wmb();
spsc_queue_pop(&entity->job_queue);
+
+    /*
+ * It's when head job is extracted we can access the next job 
(or empty)
+ * queue and update the entity location in the min heap 
accordingly.

+ */
+skip:
+    if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+    drm_sched_rq_update_fifo(entity,
+ (sched_job ? sched_job->submit_ts : 
ktime_get()));

+
  return sched_job;
  }
  @@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  {
  struct drm_sched_entity *entity = sched_job->entity;
  bool first;
+    ktime_t ts =  ktime_get();
    trace_drm_sched_job(sched_job, entity);
  atomic_inc(entity->rq->sched->score);
  WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue,
&sched_job->queue_node);

+    sched_job->submit_ts = ts;
    /* first job wakes up scheduler */
  if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  DRM_ERROR("Trying to push to a killed entity\n");
  return;
  }
+
  drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
+
+    if (drm_sch

Re: [PATCH v4] drm/sched: Add FIFO sched policy to run queue v3

2022-09-22 Thread Andrey Grodzovsky



On 2022-09-22 11:03, Luben Tuikov wrote:

The title of this patch has "v3", but "v4" in the title prefix.
If you're using "-v" to git-format-patch, please remove the "v3" from the title.

Inlined:

On 2022-09-21 14:28, Andrey Grodzovsky wrote:

When many entities competing for same run queue on
the same scheduler When many entities have  unacceptably long wait
time for some jobs waiting stuck in the run queue before being picked
up are observed (seen using  GPUVis).

Use this as your opening:

"When many entities are competing for the same run queue on the same scheduler,
we observe an unusually long wait times and some jobs get starved. This has
been observed on GPUVis."


The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under stress
of many entities and long job queues within entity some
jobs could be stack for very long time in it's entity's

"stuck", not "stack".


queue before being popped from the queue and executed
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
then the job in the long queue.

Fix:

Add FIFO selection policy to entities in run queue, chose next entity
on run queue in such order that if job on one entity arrived
earlier then job on another entity the first job will start
executing earlier regardless of the length of the entity's job
queue.

v2:

Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue

Drop default option in module control parameter.


v3:
Various cosmetical fixes and minor refactoring of fifo update function. (Luben)

v4:
Switch drm_sched_rq_select_entity_fifo to in order search (Luben)

Signed-off-by: Andrey Grodzovsky 

Tested-by: Li Yunxiang (Teddy) 
---
  drivers/gpu/drm/scheduler/sched_entity.c |  26 +-
  drivers/gpu/drm/scheduler/sched_main.c   | 107 ++-
  include/drm/gpu_scheduler.h  |  32 +++
  3 files changed, 159 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..f3ffce3c9304 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
  
  	if(num_sched_list)

entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
  
sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));

if (!sched_job)
-   return NULL;
+   goto skip;
  
  	while ((entity->dependency =

drm_sched_job_dependency(sched_job, entity))) {
trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
  
-		if (drm_sched_entity_add_dependency_cb(entity))

-   return NULL;
+   if (drm_sched_entity_add_dependency_cb(entity)) {
+   sched_job = NULL;
+   goto skip;
+   }
}
  
  	/* skip jobs from entity that marked guilty */

@@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
smp_wmb();
  
spsc_queue_pop(&entity->job_queue);

+
+   /*
+* It's when head job is extracted we can access the next job (or empty)
+* queue and update the entity location in the min heap accordingly.
+*/
+skip:
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity,
+(sched_job ? sched_job->submit_ts : 
ktime_get()));
+
return sched_job;
  }
  
@@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)

  {
struct drm_sched_entity *entity = sched_job->entity;
bool first;
+   ktime_t ts =  ktime_get();
  
  	trace_drm_sched_job(sched_job, entity);

atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ts;
  
  	/* first job wakes up scheduler */

if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
DRM_ERROR("Trying to push to a killed entity\n");
return;
}
+
d

[PATCH v4] drm/sched: Add FIFO sched policy to run queue v3

2022-09-21 Thread Andrey Grodzovsky
When many entities competing for same run queue on
the same scheduler When many entities have  unacceptably long wait
time for some jobs waiting stuck in the run queue before being picked
up are observed (seen using  GPUVis).
The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under stress
of many entities and long job queues within entity some
jobs could be stack for very long time in it's entity's
queue before being popped from the queue and executed
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
then the job in the long queue.
   
Fix:
Add FIFO selection policy to entities in run queue, chose next entity
on run queue in such order that if job on one entity arrived
earlier then job on another entity the first job will start
executing earlier regardless of the length of the entity's job
queue.
   
v2:
Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue
   
Drop default option in module control parameter.

v3:
Various cosmetical fixes and minor refactoring of fifo update function. (Luben)

v4:
Switch drm_sched_rq_select_entity_fifo to in order search (Luben)
   
Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
 drivers/gpu/drm/scheduler/sched_entity.c |  26 +-
 drivers/gpu/drm/scheduler/sched_main.c   | 107 ++-
 include/drm/gpu_scheduler.h  |  32 +++
 3 files changed, 159 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..f3ffce3c9304 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
 
if(num_sched_list)
entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (!sched_job)
-   return NULL;
+   goto skip;
 
while ((entity->dependency =
drm_sched_job_dependency(sched_job, entity))) {
trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
 
-   if (drm_sched_entity_add_dependency_cb(entity))
-   return NULL;
+   if (drm_sched_entity_add_dependency_cb(entity)) {
+   sched_job = NULL;
+   goto skip;
+   }
}
 
/* skip jobs from entity that marked guilty */
@@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
smp_wmb();
 
spsc_queue_pop(&entity->job_queue);
+
+   /*
+* It's when head job is extracted we can access the next job (or empty)
+* queue and update the entity location in the min heap accordingly.
+*/
+skip:
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity,
+(sched_job ? sched_job->submit_ts : 
ktime_get()));
+
return sched_job;
 }
 
@@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 {
struct drm_sched_entity *entity = sched_job->entity;
bool first;
+   ktime_t ts =  ktime_get();
 
trace_drm_sched_job(sched_job, entity);
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ts;
 
/* first job wakes up scheduler */
if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
DRM_ERROR("Trying to push to a killed entity\n");
return;
}
+
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
+
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity, ts);
+
drm_sched_wakeup(entity->rq->sched);
}
 }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 4f2395d1a791..565707a1c5c7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -62,6 +62,64 @@
 

Re: [PATCH v3] drm/sched: Add FIFO sched policy to run queue v3

2022-09-20 Thread Andrey Grodzovsky



On 2022-09-19 23:11, Luben Tuikov wrote:

Please run this patch through checkpatch.pl, as it shows
12 warnings with it. Use these command line options:
"--strict --show-types".

Inlined:

On 2022-09-13 16:40, Andrey Grodzovsky wrote:

Given many entities competing for same run queue on
the same scheduler and unacceptably long wait time for some
jobs waiting stuck in the run queue before being picked up are
observed (seen using  GPUVis).

Since the second part of this sentence is the result of the first,
I'd say something like "When many entities ... we see unacceptably long ...".


The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under stress
of many entities and long job queus within entity some

Spelling: "queues".


jobs could be stack for very long time in it's entity's

"stuck", not "stack".


queue before being popped from the queue and executed
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
then the job in the long queue.

"than".


Fix:

Add FIFO selection policy to entities in run queue, chose next entity
on run queue in such order that if job on one entity arrived
earlier then job on another entity the first job will start
executing earlier regardless of the length of the entity's job
queue.

v2:

Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue

Drop default option in module control parameter.


v3:
Various cosmetical fixes and minor refactoring of fifo update function.
Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
  drivers/gpu/drm/scheduler/sched_entity.c |  26 -
  drivers/gpu/drm/scheduler/sched_main.c   | 132 ++-
  include/drm/gpu_scheduler.h  |  35 ++
  3 files changed, 187 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..f3ffce3c9304 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
  
  	if(num_sched_list)

entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
  
sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));

if (!sched_job)
-   return NULL;
+   goto skip;
  
  	while ((entity->dependency =

drm_sched_job_dependency(sched_job, entity))) {
trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
  
-		if (drm_sched_entity_add_dependency_cb(entity))

-   return NULL;
+   if (drm_sched_entity_add_dependency_cb(entity)) {
+   sched_job = NULL;
+   goto skip;
+   }
}
  
  	/* skip jobs from entity that marked guilty */

@@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
smp_wmb();
  
spsc_queue_pop(&entity->job_queue);

+
+   /*
+* It's when head job is extracted we can access the next job (or empty)
+* queue and update the entity location in the min heap accordingly.
+*/
+skip:
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity,
+(sched_job ? sched_job->submit_ts : 
ktime_get()));
+
return sched_job;
  }
  
@@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)

  {
struct drm_sched_entity *entity = sched_job->entity;
bool first;
+   ktime_t ts =  ktime_get();
  
  	trace_drm_sched_job(sched_job, entity);

atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ts;
  
  	/* first job wakes up scheduler */

if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
DRM_ERROR("Trying to push to a killed entity\n");
return;
}
+
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
+
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)

Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-19 Thread Andrey Grodzovsky
I don't know if the issue still exists, but it's worth checking with
Christian, who wrote this patch.


Andrey


On 2022-09-16 23:31, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Andrey,

Yes, moving the irq disable can fix the issue. The change in amdgpu_fence_process
just wants to make sure the driver can correct itself from an overflow situation.
I didn't know about the previous issue there.

Do you know if the issue still exists? Or is it on VCE only?


Thanks,
Victor



-Original Message-
From: Grodzovsky, Andrey 
Sent: Friday, September 16, 2022 9:50 PM
To: Koenig, Christian ; Zhao, Victor 
; amd-gfx@lists.freedesktop.org
Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow


On 2022-09-16 01:18, Christian König wrote:

Am 15.09.22 um 22:37 schrieb Andrey Grodzovsky:

On 2022-09-15 15:26, Christian König wrote:

Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky:

On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running
a lot of containers submitting gfx jobs. We have advanced tdr mode
and mode2 reset enabled on the driver.
When a compute hang job timeout happens, the 2 jobs on the gfx
pending list may be signaled after drm_sched_stop. So they will not
be removed from the pending list but will have the
DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will
be rerun and removed from the pending list.
At the resubmit step, the second job (with the signaled bit) will be
resubmitted. Since it still has the signaled bit, drm_sched_job_done
will be called directly. This decreases hw_rq_count, which allows
more jobs to be emitted, but does not clean the fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we use
num_fences_mask in amdgpu_fence_process, when the overflow happens
the signaling of some jobs will be skipped, which results in an
infinite wait for the fence_drv rcu ptr.

So closing the irq before sched_stop could avoid signaling jobs after
drm_sched_stop. And signaling jobs one by one in fence_process instead
of using a mask will handle the overflow situation.

Another fix could be to skip submitting jobs which already signaled
during the resubmit stage, which may look cleaner.

Please help give some advice.


How about the code below instead? The real problem is that we
reuse a dma fence twice, which is not according to dma fence design,
so maybe this can help?


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring
*ring, struct dma_fence **f, struct amd
     if (job && job->job_run_counter) {
     /* reinit seq for resubmitted jobs */
     fence->seqno = seq;
+
+   /* For resubmitted job clear the signaled bit */
+   clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
&fence->flags);
+

Upstream will pretty much kill you for that.

Re-setting a fence from a signaled to an unsignaled state is a
massive no-go.

Christian.


Is it worse than doing fence->seqno = seq;? This is already a huge
hack, no?

No, it's equally bad. I don't think we can do either.

Christian.


And all those ugly hacks are there because we reuse a dma_fence (the hw_fence
embedded into the job), and correct me if I am wrong, but I don't think a
dma_fence is ever supposed to be reused.

So maybe, like Victor suggested, we should move closing and flushing the irq
before sched_stop - this in my opinion should solve the issue. But Victor - why
do you then still need the change in amdgpu_fence_process? You will not have
the overflow situation, because by moving irq_disable before the stop any job
that signaled will be removed from the scheduler pending list anyway. Also note
that this change reverts 'drm/amdgpu: sanitize fence numbers' and could
reintroduce that bug.

Andrey



Andrey



     /* TO be inline with external fence creation and
other drivers */
     dma_fence_get(fence);
     } else {


Andrey




Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ;
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey

Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by
overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the
sequence. Please help give some comments.


Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a

Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-16 Thread Andrey Grodzovsky



On 2022-09-16 01:18, Christian König wrote:

Am 15.09.22 um 22:37 schrieb Andrey Grodzovsky:


On 2022-09-15 15:26, Christian König wrote:

Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky:


On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running
a lot of containers submitting gfx jobs. We have advanced tdr mode
and mode2 reset enabled on the driver.
When a compute hang job timeout happens, the 2 jobs on the gfx
pending list may be signaled after drm_sched_stop. So they will not
be removed from the pending list but will have the
DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will
be rerun and removed from the pending list.
At the resubmit step, the second job (with the signaled bit) will be
resubmitted. Since it still has the signaled bit, drm_sched_job_done
will be called directly. This decreases hw_rq_count, which allows
more jobs to be emitted, but does not clean the fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we use
num_fences_mask in amdgpu_fence_process, when the overflow happens
the signaling of some jobs will be skipped, which results in an
infinite wait for the fence_drv rcu ptr.

So closing the irq before sched_stop could avoid signaling jobs after
drm_sched_stop. And signaling jobs one by one in fence_process instead
of using a mask will handle the overflow situation.

Another fix could be to skip submitting jobs which already signaled
during the resubmit stage, which may look cleaner.


Please help give some advice.



How about the code below instead? The real problem is that we
reuse a dma fence twice, which is not according to dma fence design,
so maybe this can help?



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring 
*ring, struct dma_fence **f, struct amd

    if (job && job->job_run_counter) {
    /* reinit seq for resubmitted jobs */
    fence->seqno = seq;
+
+   /* For resubmitted job clear the signaled bit */
+   clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
&fence->flags);

+


Upstream will pretty much kill you for that.

Re-setting a fence from a signaled to an unsignaled state is a 
massive no-go.


Christian.



Is it worse than doing fence->seqno = seq;? This is already a huge
hack, no?


No, it's equally bad. I don't think we can do either.

Christian.



And all those ugly hacks are there because we reuse a dma_fence (the hw_fence
embedded into the job), and correct me if I am wrong, but I don't think a
dma_fence is ever supposed to be reused.

So maybe, like Victor suggested, we should move closing and flushing the irq
before sched_stop - this in my opinion should solve the issue. But Victor - why
do you then still need the change in amdgpu_fence_process? You will not have
the overflow situation, because by moving irq_disable before the stop any job
that signaled will be removed from the scheduler pending list anyway. Also note
that this change reverts 'drm/amdgpu: sanitize fence numbers' and could
reintroduce that bug.
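
For reference, the dma_fence contract being appealed to above is that a
fence transitions unsignaled -> signaled exactly once. A minimal
illustration, not driver code, assuming the usual dma_fence_signal()
behavior of returning an error for an already-signaled fence:

/* Illustrative only: dma_fence state is one-way. Once signaled, waiters
 * may already have observed completion, so clearing the signaled bit or
 * rewinding the seqno behind their backs silently breaks them. */
static void fence_lifecycle_sketch(struct dma_fence *fence)
{
	int first = dma_fence_signal(fence);	/* unsignaled -> signaled, once */
	int again = dma_fence_signal(fence);	/* already signaled: error */

	WARN_ON(first != 0 || again == 0);
}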


Andrey






Andrey






    /* TO be inline with external fence creation and 
other drivers */

    dma_fence_get(fence);
    } else {


Andrey





Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; 
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 


Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the 
sequence. Please help give some comments.



Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute),
jobs from another ring (e.g. gfx) may continue signaling during the
drm_sched_stop stage. The signaled bit will not be cleared.


At the resubmit stage after recovery, the job with the hw fence
signaled bit set will call job done directly instead of going through
the fence process.
This makes hw_rq_count decrease but the rcu fence pointer is not
cleared yet.


Then an overflow happens in the fence driver slots and some jobs may
be skipped, leaving the rcu pointer not cleared, which makes an
infinite wait for the slot on the next fence emitted.


This infinite wait causes a job timeout on the emitting job. And the
driver will be stuck at the

Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-15 Thread Andrey Grodzovsky



On 2022-09-15 15:26, Christian König wrote:

Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky:


On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running
a lot of containers submitting gfx jobs. We have advanced tdr mode
and mode2 reset enabled on the driver.
When a compute hang job timeout happens, the 2 jobs on the gfx
pending list may be signaled after drm_sched_stop. So they will not
be removed from the pending list but will have the
DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will
be rerun and removed from the pending list.
At the resubmit step, the second job (with the signaled bit) will be
resubmitted. Since it still has the signaled bit, drm_sched_job_done
will be called directly. This decreases hw_rq_count, which allows
more jobs to be emitted, but does not clean the fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we use
num_fences_mask in amdgpu_fence_process, when the overflow happens
the signaling of some jobs will be skipped, which results in an
infinite wait for the fence_drv rcu ptr.

So closing the irq before sched_stop could avoid signaling jobs after
drm_sched_stop. And signaling jobs one by one in fence_process instead
of using a mask will handle the overflow situation.

Another fix could be to skip submitting jobs which already signaled
during the resubmit stage, which may look cleaner.


Please help give some advice.



How about the code below instead? The real problem is that we
reuse a dma fence twice, which is not according to dma fence design,
so maybe this can help?



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, 
struct dma_fence **f, struct amd

    if (job && job->job_run_counter) {
    /* reinit seq for resubmitted jobs */
    fence->seqno = seq;
+
+   /* For resubmitted job clear the signaled bit */
+   clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
+


Upstream will pretty much kill you for that.

Re-setting a fence from a signaled to an unsignaled state is a massive 
no-go.


Christian.



Is it worse than doing fence->seqno = seq;? This is already a huge
hack, no?


Andrey






    /* TO be inline with external fence creation and 
other drivers */

    dma_fence_get(fence);
    } else {


Andrey





Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; 
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 


Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the 
sequence. Please help give some comments.



Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute),
jobs from another ring (e.g. gfx) may continue signaling during the
drm_sched_stop stage. The signaled bit will not be cleared.


At the resubmit stage after recovery, the job with the hw fence
signaled bit set will call job done directly instead of going through
the fence process.
This makes hw_rq_count decrease but the rcu fence pointer is not
cleared yet.


Then an overflow happens in the fence driver slots and some jobs may
be skipped, leaving the rcu pointer not cleared, which makes an
infinite wait for the slot on the next fence emitted.


This infinite wait causes a job timeout on the emitting job. And the
driver will be stuck at its sched stop step because kthread_park
cannot be done.


[how]
1. move amdgpu_fence_driver_isr_toggle earlier to close the interrupt
before drm sched stop
2. handle all fences in fence process to avoid skips when overflow happens

Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 
+---  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  6 
+-

   2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 943c9e750575..c0cfae52f12b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

   amdgpu_virt_fini_data_exchange(adev);
   }
   -    amdgpu_fen

Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-15 Thread Andrey Grodzovsky

Had a typo - see below

On 2022-09-15 14:29, Andrey Grodzovsky wrote:


On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running
a lot of containers submitting gfx jobs. We have advanced tdr mode
and mode2 reset enabled on the driver.
When a compute hang job timeout happens, the 2 jobs on the gfx
pending list may be signaled after drm_sched_stop. So they will not
be removed from the pending list but will have the
DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will
be rerun and removed from the pending list.
At the resubmit step, the second job (with the signaled bit) will be
resubmitted. Since it still has the signaled bit, drm_sched_job_done
will be called directly. This decreases hw_rq_count, which allows
more jobs to be emitted, but does not clean the fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we use
num_fences_mask in amdgpu_fence_process, when the overflow happens
the signaling of some jobs will be skipped, which results in an
infinite wait for the fence_drv rcu ptr.

So closing the irq before sched_stop could avoid signaling jobs after
drm_sched_stop. And signaling jobs one by one in fence_process instead
of using a mask will handle the overflow situation.

Another fix could be to skip submitting jobs which already signaled
during the resubmit stage, which may look cleaner.


Please help give some advice.



How about the code below instead? The real problem is that we reuse
a dma fence twice, which is not according to dma fence design, so maybe
this can help?



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, 
struct dma_fence **f, struct amd

    if (job && job->job_run_counter) {
    /* reinit seq for resubmitted jobs */
    fence->seqno = seq;
+
+   /* For resubmitted job clear the signaled bit */
+   clear_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
+
    /* TO be inline with external fence creation and other 
drivers */

    dma_fence_get(fence);
    } else {


Andrey





Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; 
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 


Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the 
sequence. Please help give some comments.



Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute), jobs
from another ring (e.g. gfx) may continue signaling during the
drm_sched_stop stage. The signaled bit will not be cleared.


At the resubmit stage after recovery, the job with the hw fence signaled
bit set will call job done directly instead of going through the fence
process.
This makes hw_rq_count decrease but the rcu fence pointer is not
cleared yet.


Then an overflow happens in the fence driver slots and some jobs may be
skipped, leaving the rcu pointer not cleared, which makes an
infinite wait for the slot on the next fence emitted.


This infinite wait causes a job timeout on the emitting job. And the
driver will be stuck at its sched stop step because kthread_park
cannot be done.


[how]
1. move amdgpu_fence_driver_isr_toggle earlier to close the interrupt
before drm sched stop
2. handle all fences in fence process to avoid skips when overflow happens

Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +---  
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  6 +-

   2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 943c9e750575..c0cfae52f12b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

   amdgpu_virt_fini_data_exchange(adev);
   }
   -    amdgpu_fence_driver_isr_toggle(adev, true);
-
   /* block all schedulers and reset given job's ring */
   for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
   struct amdgpu_ring *ring = adev->rings[i]; @@ -5214,6 
+5212,8 @@ int amdgpu_device_gpu_recover(str

Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-15 Thread Andrey Grodzovsky



On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running a lot of
containers submitting gfx jobs. We have advanced tdr mode and mode2 reset
enabled on the driver.
When a compute hang job timeout happens, the 2 jobs on the gfx pending list
may be signaled after drm_sched_stop. So they will not be removed from the
pending list but will have the DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and
removed from the pending list.
At the resubmit step, the second job (with the signaled bit) will be
resubmitted. Since it still has the signaled bit, drm_sched_job_done will be
called directly. This decreases hw_rq_count, which allows more jobs to be
emitted, but does not clean the fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we use num_fences_mask
in amdgpu_fence_process, when the overflow happens the signaling of some jobs
will be skipped, which results in an infinite wait for the fence_drv rcu ptr.

So closing the irq before sched_stop could avoid signaling jobs after
drm_sched_stop. And signaling jobs one by one in fence_process instead of
using a mask will handle the overflow situation.

Another fix could be to skip submitting jobs which already signaled during
the resubmit stage, which may look cleaner.

Please help give some advice.
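
As a side note on the overflow described above: the aliasing comes from
the power-of-two slot masking. A toy illustration with made-up numbers,
not driver code:

/*
 * Toy numbers, illustration only: the fence driver keeps pending fences
 * in a power-of-two ring indexed by seqno & num_fences_mask. If
 * hw_rq_count drops without the RCU slot being cleared, more fences can
 * be emitted than there are free slots and two live fences alias:
 *
 *	num_fences = 8, so num_fences_mask = 7
 *	stale fence: seqno 5  -> slot 5 & 7 == 5 (rcu ptr never cleared)
 *	next fence:  seqno 13 -> slot 13 & 7 == 5 (same slot)
 *
 * amdgpu_fence_process walks seqnos with the same mask, so the stale
 * entry can be skipped and whoever waits on that slot waits forever.
 */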



How about the code below instead? The real problem is that we reuse a
dma fence twice, which is not according to dma fence design, so maybe
this can help?



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, 
struct dma_fence **f, struct amd

    if (job && job->job_run_counter) {
    /* reinit seq for resubmitted jobs */
    fence->seqno = seq;
+
+   /* For resubmitted job clear the signaled bit */
+   clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
+
    /* TO be inline with external fence creation and other 
drivers */

    dma_fence_get(fence);
    } else {


Andrey





Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Grodzovsky, 
Andrey 
Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the sequence. Please 
help give some comments.


Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute), jobs from
another ring (e.g. gfx) may continue signaling during the drm_sched_stop
stage. The signaled bit will not be cleared.

At the resubmit stage after recovery, the job with the hw fence signaled bit
set will call job done directly instead of going through the fence process.
This makes hw_rq_count decrease but the rcu fence pointer is not cleared yet.

Then an overflow happens in the fence driver slots and some jobs may be
skipped, leaving the rcu pointer not cleared, which makes an infinite wait
for the slot on the next fence emitted.

This infinite wait causes a job timeout on the emitting job. And the driver
will be stuck at its sched stop step because kthread_park cannot be done.

[how]
1. move amdgpu_fence_driver_isr_toggle earlier to close the interrupt
before drm sched stop
2. handle all fences in fence process to avoid skips when overflow happens

Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +---  
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  6 +-
   2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 943c9e750575..c0cfae52f12b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
*adev,
amdgpu_virt_fini_data_exchange(adev);
}
   
-	amdgpu_fence_driver_isr_toggle(adev, true);

-
/* block all schedulers and reset given job's ring */
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i]; @@ -5214,6 +5212,8 
@@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  amdgpu_device_ip_need_full_reset(tmp_adev))

[PATCH v3] drm/sched: Add FIFO sched policy to run queue v3

2022-09-13 Thread Andrey Grodzovsky
Given many entities competing for same run queue on
the same scheduler and unacceptably long wait time for some
jobs waiting stuck in the run queue before being picked up are
observed (seen using  GPUVis).
The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under stress
of many entities and long job queus within entity some
jobs could be stack for very long time in it's entity's
queue before being popped from the queue and executed
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
then the job in the long queue.
   
Fix:
Add a FIFO selection policy for entities in the run queue: choose the
next entity on the run queue in such an order that if a job on one
entity arrived earlier than a job on another entity, the first job
starts executing earlier, regardless of the length of the entity's
job queue.
   
v2:
Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue
   
Drop default option in module control parameter.
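
As an illustrative sketch of the v2 data structure described above
(fifo_entity, oldest_submit_ts and rq_fifo are assumed stand-in names,
not the upstream identifiers): the leftmost node of a cached rbtree
yields the next entity in O(1), while re-keying an entity costs
O(log N).

#include <linux/ktime.h>
#include <linux/rbtree.h>

struct fifo_entity {
	struct rb_node rb_tree_node;	/* RB_CLEAR_NODE()'d at init */
	ktime_t oldest_submit_ts;	/* TS of oldest job in the entity */
};

static bool fifo_entity_less(struct rb_node *a, const struct rb_node *b)
{
	struct fifo_entity *ea = rb_entry(a, struct fifo_entity, rb_tree_node);
	struct fifo_entity *eb = rb_entry(b, struct fifo_entity, rb_tree_node);

	return ktime_before(ea->oldest_submit_ts, eb->oldest_submit_ts);
}

/* O(log N): re-key the entity by removing and re-inserting it */
static void fifo_update(struct rb_root_cached *rq_fifo,
			struct fifo_entity *e, ktime_t ts)
{
	if (!RB_EMPTY_NODE(&e->rb_tree_node))
		rb_erase_cached(&e->rb_tree_node, rq_fifo);

	e->oldest_submit_ts = ts;
	rb_add_cached(&e->rb_tree_node, rq_fifo, fifo_entity_less);
}

/* O(1): the entity with the earliest pending job is the leftmost node */
static struct fifo_entity *fifo_peek(struct rb_root_cached *rq_fifo)
{
	struct rb_node *node = rb_first_cached(rq_fifo);

	return node ? rb_entry(node, struct fifo_entity, rb_tree_node) : NULL;
}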

v3:
Various cosmetic fixes and minor refactoring of the fifo update function.
Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
 drivers/gpu/drm/scheduler/sched_entity.c |  26 -
 drivers/gpu/drm/scheduler/sched_main.c   | 132 ++-
 include/drm/gpu_scheduler.h  |  35 ++
 3 files changed, 187 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..f3ffce3c9304 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
 
if(num_sched_list)
entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (!sched_job)
-   return NULL;
+   goto skip;
 
while ((entity->dependency =
drm_sched_job_dependency(sched_job, entity))) {
trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
 
-   if (drm_sched_entity_add_dependency_cb(entity))
-   return NULL;
+   if (drm_sched_entity_add_dependency_cb(entity)) {
+   sched_job = NULL;
+   goto skip;
+   }
}
 
/* skip jobs from entity that marked guilty */
@@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
smp_wmb();
 
spsc_queue_pop(&entity->job_queue);
+
+   /*
+* Only once the head job is extracted can we access the next job (or
+* the empty queue) and update the entity's location in the min heap
+* accordingly.
+*/
+skip:
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity,
+(sched_job ? sched_job->submit_ts : 
ktime_get()));
+
return sched_job;
 }
 
@@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 {
struct drm_sched_entity *entity = sched_job->entity;
bool first;
+   ktime_t ts = ktime_get();
 
trace_drm_sched_job(sched_job, entity);
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ts;
 
/* first job wakes up scheduler */
if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
DRM_ERROR("Trying to push to a killed entity\n");
return;
}
+
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
+
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity, ts);
+
drm_sched_wakeup(entity->rq->sched);
}
 }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index e5a4ecde0063..72f7105e0b16 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -62,6 +62,65 @@
 #define to_drm_sched_job(sched_job)\
container_of((sched_job), struct drm_sched_job,

Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)

2022-09-13 Thread Andrey Grodzovsky
I guess, but this is the kind of implicit assumption which is not really
documented and is easily overlooked.

Anyway - for this code it's not directly relevant.

Andrey


On 2022-09-13 03:25, Christian König wrote:

Am 13.09.22 um 04:00 schrieb Andrey Grodzovsky:


[SNIP]

You are right for scheduler-mediated submissions (executing through the
drm_sched_backend_ops.run_job hook); I am talking about direct
submissions without the gpu scheduler (using amdgpu_job_submit_direct).


Andrey


Direct submission is only used while initially testing the hardware, 
during a GPU reset/recovery or for handling page faults with the SDMA.


In other words when we know that we have exclusive access to the 
hardware.


Christian.


Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)

2022-09-12 Thread Andrey Grodzovsky



On 2022-09-12 21:44, Zhu, Jiadong wrote:

[AMD Official Use Only - General]

-Original Message-
From: Grodzovsky, Andrey 
Sent: Tuesday, September 13, 2022 12:45 AM
To: Christian König ; Zhu, Jiadong 
; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray 
Subject: Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)


On 2022-09-12 12:22, Christian König wrote:

Am 12.09.22 um 17:34 schrieb Andrey Grodzovsky:

On 2022-09-12 09:27, Christian König wrote:


Am 12.09.22 um 15:22 schrieb Andrey Grodzovsky:

On 2022-09-12 06:20, Christian König wrote:

Am 09.09.22 um 18:45 schrieb Andrey Grodzovsky:

On 2022-09-08 21:50, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

The software ring is created to support priority contexts while
there is only one hardware queue for gfx.

Every software ring has its own fence driver and can be used as
an ordinary ring for the gpu_scheduler.
Multiple software rings are bound to a real ring with the ring
muxer. The packets committed on the software rings are copied to
the real ring.

v2: use array to store software ring entry.
v3: remove unnecessary prints.
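
A hedged sketch of the copy step described above, using simplified
stand-in types (sketch_ring, buf, buf_mask, wptr); the patch's
copy_pkt_from_sw_ring() operates on struct amdgpu_ring, but the
wrap-around handling follows the same idea:

#include <linux/types.h>

struct sketch_ring {
	u32 *buf;	/* ring buffer contents, in dwords */
	u64 buf_mask;	/* size in dwords - 1, power of two */
	u64 wptr;	/* monotonically increasing write pointer */
};

static void sketch_ring_write(struct sketch_ring *ring, u32 v)
{
	ring->buf[ring->wptr++ & ring->buf_mask] = v;
}

/*
 * Copy the dword span [s_begin, s_end) produced on a software ring into
 * the real ring; masking on every access handles buffer wrap-around.
 */
static void sketch_copy_span(struct sketch_ring *real,
			     const struct sketch_ring *sw,
			     u64 s_begin, u64 s_end)
{
	while (s_begin != s_end)
		sketch_ring_write(real, sw->buf[s_begin++ & sw->buf_mask]);
}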

Signed-off-by: Jiadong.Zhu 
---
   drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   3 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 182
+
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  67 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  | 204
+++
   drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h  |  48 +
   7 files changed, 509 insertions(+), 1 deletion(-)
   create mode 100644
drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
   create mode 100644
drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 3e0e2eb7e235..85224bc81ce5 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
   amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o
amdgpu_nbio.o \
   amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o
amdgpu_rap.o \
   amdgpu_fw_attestation.o amdgpu_securedisplay.o \
-amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
+amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
+\
+amdgpu_sw_ring.o amdgpu_ring_mux.o
 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 53526ffb2ce1..0de8e3cd0f1c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -33,6 +33,7 @@
   #include "amdgpu_imu.h"
   #include "soc15.h"
   #include "amdgpu_ras.h"
+#include "amdgpu_ring_mux.h"
 /* GFX current status */
   #define AMDGPU_GFX_NORMAL_MODE 0xL @@ -346,6 +347,8 @@
struct amdgpu_gfx {
   struct amdgpu_gfx_ras*ras;
 boolis_poweron;
+
+struct amdgpu_ring_muxmuxer;
   };
 #define amdgpu_gfx_get_gpu_clock_counter(adev)
(adev)->gfx.funcs->get_gpu_clock_counter((adev))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7d89a52091c0..fe33a683bfba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -278,6 +278,9 @@ struct amdgpu_ring {
   boolis_mes_queue;
   uint32_thw_queue_id;
   struct amdgpu_mes_ctx_data *mes_ctx;
+
+boolis_sw_ring;
+
   };
 #define amdgpu_ring_parse_cs(r, p, job, ib)
((r)->funcs->parse_cs((p), (job), (ib))) diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
new file mode 100644
index ..ea4a3c66119a
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -0,0 +1,182 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person
obtaining a
+ * copy of this software and associated documentation files
(the "Software"),
+ * to deal in the Software without restriction, including
without limitation
+ * the rights to use, copy, modify, merge, publish, distribute,
sublicense,
+ * and/or sell copies of the Software, and to permit persons to
whom the
+ * Software is furnished to do so, subject to the following
conditions:
+ *
+ * The above copyright notice and this permission notice shall
be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY,
+ * FITNESS FOR A PA

Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)

2022-09-12 Thread Andrey Grodzovsky



On 2022-09-12 12:22, Christian König wrote:

Am 12.09.22 um 17:34 schrieb Andrey Grodzovsky:

On 2022-09-12 09:27, Christian König wrote:


Am 12.09.22 um 15:22 schrieb Andrey Grodzovsky:


On 2022-09-12 06:20, Christian König wrote:

Am 09.09.22 um 18:45 schrieb Andrey Grodzovsky:


On 2022-09-08 21:50, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

The software ring is created to support priority
contexts while there is only one hardware queue
for gfx.

Every software ring has its own fence driver and can
be used as an ordinary ring for the gpu_scheduler.
Multiple software rings are bound to a real ring
with the ring muxer. The packets committed on the
software rings are copied to the real ring.

v2: use array to store software ring entry.
v3: remove unnecessary prints.

Signed-off-by: Jiadong.Zhu 
---
  drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 182 
+

  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  67 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  | 204 
+++

  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h  |  48 +
  7 files changed, 509 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 3e0e2eb7e235..85224bc81ce5 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
  amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o 
amdgpu_nbio.o \
  amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o 
amdgpu_rap.o \

  amdgpu_fw_attestation.o amdgpu_securedisplay.o \
-    amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
+    amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
+    amdgpu_sw_ring.o amdgpu_ring_mux.o
    amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h

index 53526ffb2ce1..0de8e3cd0f1c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -33,6 +33,7 @@
  #include "amdgpu_imu.h"
  #include "soc15.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_ring_mux.h"
    /* GFX current status */
  #define AMDGPU_GFX_NORMAL_MODE 0xL
@@ -346,6 +347,8 @@ struct amdgpu_gfx {
  struct amdgpu_gfx_ras    *ras;
    bool    is_poweron;
+
+    struct amdgpu_ring_mux    muxer;
  };
    #define amdgpu_gfx_get_gpu_clock_counter(adev) 
(adev)->gfx.funcs->get_gpu_clock_counter((adev))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h

index 7d89a52091c0..fe33a683bfba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -278,6 +278,9 @@ struct amdgpu_ring {
  bool    is_mes_queue;
  uint32_t    hw_queue_id;
  struct amdgpu_mes_ctx_data *mes_ctx;
+
+    bool    is_sw_ring;
+
  };
    #define amdgpu_ring_parse_cs(r, p, job, ib) 
((r)->funcs->parse_cs((p), (job), (ib)))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c

new file mode 100644
index ..ea4a3c66119a
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -0,0 +1,182 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files 
(the "Software"),
+ * to deal in the Software without restriction, including 
without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,
+ * and/or sell copies of the Software, and to permit persons to 
whom the
+ * Software is furnished to do so, subject to the following 
conditions:

+ *
+ * The above copyright notice and this permission notice shall 
be included in

+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY 
KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO 
EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY 
CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR 
THE USE OR

+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */

Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)

2022-09-12 Thread Andrey Grodzovsky

On 2022-09-12 09:27, Christian König wrote:


Am 12.09.22 um 15:22 schrieb Andrey Grodzovsky:


On 2022-09-12 06:20, Christian König wrote:

Am 09.09.22 um 18:45 schrieb Andrey Grodzovsky:


On 2022-09-08 21:50, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

The software ring is created to support priority
contexts while there is only one hardware queue
for gfx.

Every software ring has its own fence driver and can
be used as an ordinary ring for the gpu_scheduler.
Multiple software rings are bound to a real ring
with the ring muxer. The packets committed on the
software rings are copied to the real ring.

v2: use array to store software ring entry.
v3: remove unnecessary prints.

Signed-off-by: Jiadong.Zhu 
---
  drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 182 
+

  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  67 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  | 204 
+++

  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h  |  48 +
  7 files changed, 509 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 3e0e2eb7e235..85224bc81ce5 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
  amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o 
amdgpu_nbio.o \

  amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
  amdgpu_fw_attestation.o amdgpu_securedisplay.o \
-    amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
+    amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
+    amdgpu_sw_ring.o amdgpu_ring_mux.o
    amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h

index 53526ffb2ce1..0de8e3cd0f1c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -33,6 +33,7 @@
  #include "amdgpu_imu.h"
  #include "soc15.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_ring_mux.h"
    /* GFX current status */
  #define AMDGPU_GFX_NORMAL_MODE    0xL
@@ -346,6 +347,8 @@ struct amdgpu_gfx {
  struct amdgpu_gfx_ras    *ras;
    bool    is_poweron;
+
+    struct amdgpu_ring_mux    muxer;
  };
    #define amdgpu_gfx_get_gpu_clock_counter(adev) 
(adev)->gfx.funcs->get_gpu_clock_counter((adev))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h

index 7d89a52091c0..fe33a683bfba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -278,6 +278,9 @@ struct amdgpu_ring {
  bool    is_mes_queue;
  uint32_t    hw_queue_id;
  struct amdgpu_mes_ctx_data *mes_ctx;
+
+    bool    is_sw_ring;
+
  };
    #define amdgpu_ring_parse_cs(r, p, job, ib) 
((r)->funcs->parse_cs((p), (job), (ib)))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c

new file mode 100644
index ..ea4a3c66119a
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -0,0 +1,182 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,
+ * and/or sell copies of the Software, and to permit persons to 
whom the
+ * Software is furnished to do so, subject to the following 
conditions:

+ *
+ * The above copyright notice and this permission notice shall be 
included in

+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY 
KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO 
EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
USE OR

+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include 
+
+#include "amdgpu_ring_mux.h"
+#include "amdgpu_ring.h&

Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)

2022-09-12 Thread Andrey Grodzovsky



On 2022-09-12 06:20, Christian König wrote:

Am 09.09.22 um 18:45 schrieb Andrey Grodzovsky:


On 2022-09-08 21:50, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

The software ring is created to support priority
contexts while there is only one hardware queue
for gfx.

Every software ring has its own fence driver and can
be used as an ordinary ring for the gpu_scheduler.
Multiple software rings are bound to a real ring
with the ring muxer. The packets committed on the
software rings are copied to the real ring.

v2: use array to store software ring entry.
v3: remove unnecessary prints.

Signed-off-by: Jiadong.Zhu 
---
  drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 182 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  67 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  | 204 
+++

  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h  |  48 +
  7 files changed, 509 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 3e0e2eb7e235..85224bc81ce5 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
  amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o 
amdgpu_nbio.o \

  amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
  amdgpu_fw_attestation.o amdgpu_securedisplay.o \
-    amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
+    amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
+    amdgpu_sw_ring.o amdgpu_ring_mux.o
    amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h

index 53526ffb2ce1..0de8e3cd0f1c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -33,6 +33,7 @@
  #include "amdgpu_imu.h"
  #include "soc15.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_ring_mux.h"
    /* GFX current status */
  #define AMDGPU_GFX_NORMAL_MODE    0xL
@@ -346,6 +347,8 @@ struct amdgpu_gfx {
  struct amdgpu_gfx_ras    *ras;
    bool    is_poweron;
+
+    struct amdgpu_ring_mux    muxer;
  };
    #define amdgpu_gfx_get_gpu_clock_counter(adev) 
(adev)->gfx.funcs->get_gpu_clock_counter((adev))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h

index 7d89a52091c0..fe33a683bfba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -278,6 +278,9 @@ struct amdgpu_ring {
  bool    is_mes_queue;
  uint32_t    hw_queue_id;
  struct amdgpu_mes_ctx_data *mes_ctx;
+
+    bool    is_sw_ring;
+
  };
    #define amdgpu_ring_parse_cs(r, p, job, ib) 
((r)->funcs->parse_cs((p), (job), (ib)))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c

new file mode 100644
index ..ea4a3c66119a
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -0,0 +1,182 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,
+ * and/or sell copies of the Software, and to permit persons to 
whom the
+ * Software is furnished to do so, subject to the following 
conditions:

+ *
+ * The above copyright notice and this permission notice shall be 
included in

+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
USE OR

+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include 
+
+#include "amdgpu_ring_mux.h"
+#include "amdgpu_ring.h"
+
+#define AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT (HZ/2)
+
+static int copy_pkt_from

Re: [PATCH 4/4] drm/amdgpu: Implement OS triggered MCBP(v2)

2022-09-09 Thread Andrey Grodzovsky



On 2022-09-08 21:50, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

Trigger MCBP according to the priority of the
software rings and the hw fence signaling
condition.

The muxer records the latest locations from the
software rings, which are used to resubmit packets
in preemption scenarios.
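
An illustrative sketch of the bookkeeping described above, with assumed
field names (sw_cptr, sw_wptr); the muxer state in the actual patch
differs:

#include <linux/types.h>

struct sketch_mux_entry {
	u64 sw_cptr;	/* start of the last span copied to the real ring */
	u64 sw_wptr;	/* end of the last span copied to the real ring */
};

/* remember the most recent software-ring span mirrored to the real ring */
static void sketch_mux_record(struct sketch_mux_entry *e, u64 begin, u64 end)
{
	e->sw_cptr = begin;
	e->sw_wptr = end;
}

/*
 * After a preemption, replay the span that was copied to the real ring
 * but whose fence has not signaled yet; copy_span() stands in for the
 * muxer's copy routine.
 */
static void sketch_mux_resubmit(struct sketch_mux_entry *e,
				void (*copy_span)(u64 begin, u64 end))
{
	if (e->sw_cptr != e->sw_wptr)
		copy_span(e->sw_cptr, e->sw_wptr);
}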

v2: update comment style

Signed-off-by: Jiadong.Zhu 
---
  drivers/gpu/drm/amd/amdgpu/Makefile  |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c | 101 
  drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h |  29 
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  12 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 163 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  16 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  26 +++
  9 files changed, 351 insertions(+), 3 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 85224bc81ce5..24c5aa19bbf2 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -59,7 +59,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o \
amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
-   amdgpu_sw_ring.o amdgpu_ring_mux.o
+   amdgpu_sw_ring.o amdgpu_ring_mux.o amdgpu_mcbp.o
  
  amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c

index 258cffe3c06a..af86d87e2f3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
}
}
  
+	amdgpu_ring_ib_begin(ring);

if (job && ring->funcs->init_cond_exec)
patch_offset = amdgpu_ring_init_cond_exec(ring);
  
@@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,

ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
ring->funcs->emit_wave_limit(ring, false);
  
+	amdgpu_ring_ib_end(ring);

amdgpu_ring_commit(ring);
return 0;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
new file mode 100644
index ..2a12101a7699
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
@@ -0,0 +1,101 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "amdgpu.h"
+#include "amdgpu_mcbp.h"
+#include "amdgpu_ring.h"
+
+/* trigger mcbp and find if we need resubmit */
+int amdgpu_mcbp_trigger_preempt(struct amdgpu_ring_mux *mux)
+{
+   struct amdgpu_mux_entry *e;
+   struct amdgpu_ring *ring = NULL;
+   int i;
+
+   DRM_INFO("%s in\n", __func__);
+
+   spin_lock(&mux->lock);



Same comment/question about locking as in patch 1



+
+   amdgpu_ring_preempt_ib(mux->real_ring);
+
+   ring = NULL;
+   for (i = 0; i < mux->num_ring_entries; i++) {
+   e = &mux->ring_entries[i];
+   if (e->ring->hw_prio <= AMDGPU_RING_PRIO_DEFAULT) {
+   ring = e->ring;
+   break;
+   }
+   }
+
+   if (!ring) {
+   DRM_ERROR("cannot find low priority ring\n");
+   return -ENOENT;
+   }
+
+   amdgpu_fence_process(ring);



What's the role of fence signaling here (sorry, I am not very 
knowledgeable about how exactly mcbp works) ?




+
+   DRM_INFO("after preempted 

Re: [PATCH 3/4] drm/amdgpu: Modify unmap_queue format for gfx9(v2)

2022-09-09 Thread Andrey Grodzovsky
Really can't say too much here as I am not really familiar with queue
map/unmap...


Andrey

On 2022-09-08 21:50, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

1. Modify the unmap_queue packet on gfx9.
Add a trailing fence to track preemption completion.
2. Modify the emit_ce_meta and emit_de_meta functions
for resumed ibs.

v2: restyle code not to use ternary operator.

Signed-off-by: Jiadong.Zhu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 181 +++
  drivers/gpu/drm/amd/amdgpu/soc15d.h  |   2 +
  3 files changed, 155 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index ba6d8c753f7e..d3155dc86c07 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -60,6 +60,7 @@ enum amdgpu_ring_priority_level {
  #define AMDGPU_FENCE_FLAG_64BIT (1 << 0)
  #define AMDGPU_FENCE_FLAG_INT   (1 << 1)
  #define AMDGPU_FENCE_FLAG_TC_WB_ONLY(1 << 2)
+#define AMDGPU_FENCE_FLAG_EXEC  (1 << 3)
  
  #define to_amdgpu_ring(s) container_of((s), struct amdgpu_ring, sched)
  
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

index 774e44e1074a..89a5c45b1006 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -753,7 +753,7 @@ static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device 
*adev);
  static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
struct amdgpu_cu_info *cu_info);
  static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev);
-static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring);
+static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume);
  static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);
  static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
  void *ras_error_status);
@@ -826,9 +826,10 @@ static void gfx_v9_0_kiq_unmap_queues(struct amdgpu_ring 
*kiq_ring,

PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
  
  	if (action == PREEMPT_QUEUES_NO_UNMAP) {

-   amdgpu_ring_write(kiq_ring, lower_32_bits(gpu_addr));
-   amdgpu_ring_write(kiq_ring, upper_32_bits(gpu_addr));
-   amdgpu_ring_write(kiq_ring, seq);
+   amdgpu_ring_write(kiq_ring, lower_32_bits(ring->wptr & 
ring->buf_mask));
+   amdgpu_ring_write(kiq_ring, 0);
+   amdgpu_ring_write(kiq_ring, 0);
+
} else {
amdgpu_ring_write(kiq_ring, 0);
amdgpu_ring_write(kiq_ring, 0);
@@ -5356,11 +5357,16 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,
  
  	control |= ib->length_dw | (vmid << 24);
  
-	if (amdgpu_sriov_vf(ring->adev) && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {

+   if ((amdgpu_sriov_vf(ring->adev) || amdgpu_mcbp) && (ib->flags & 
AMDGPU_IB_FLAG_PREEMPT)) {
control |= INDIRECT_BUFFER_PRE_ENB(1);
  
+		if (flags & AMDGPU_IB_PREEMPTED)

+   control |= INDIRECT_BUFFER_PRE_RESUME(1);
+
if (!(ib->flags & AMDGPU_IB_FLAG_CE) && vmid)
-   gfx_v9_0_ring_emit_de_meta(ring);
+   gfx_v9_0_ring_emit_de_meta(ring,
+(!amdgpu_sriov_vf(ring->adev) && flags & 
AMDGPU_IB_PREEMPTED) ?
+   true : false);
}
  
  	amdgpu_ring_write(ring, header);

@@ -5415,17 +5421,23 @@ static void gfx_v9_0_ring_emit_fence(struct amdgpu_ring 
*ring, u64 addr,
bool write64bit = flags & AMDGPU_FENCE_FLAG_64BIT;
bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
bool writeback = flags & AMDGPU_FENCE_FLAG_TC_WB_ONLY;
+   bool exec = flags & AMDGPU_FENCE_FLAG_EXEC;
+   uint32_t dw2 = 0;
  
  	/* RELEASE_MEM - flush caches, send int */

amdgpu_ring_write(ring, PACKET3(PACKET3_RELEASE_MEM, 6));
-   amdgpu_ring_write(ring, ((writeback ? (EOP_TC_WB_ACTION_EN |
-  EOP_TC_NC_ACTION_EN) :
- (EOP_TCL1_ACTION_EN |
-  EOP_TC_ACTION_EN |
-  EOP_TC_WB_ACTION_EN |
-  EOP_TC_MD_ACTION_EN)) |
-EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) |
-EVENT_INDEX(5)));
+
+   if (writeback) {
+   dw2 = EOP_TC_WB_ACTION_EN | EOP_TC_NC_ACTION_EN;
+   } else {
+   dw2 = EOP_TCL1_ACTION_EN | EOP_TC_ACTION_EN |
+   EOP_TC_WB_ACTION_EN | EOP_TC_MD_ACTION_EN;
+   }
+   dw2 |= EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5);
+   

Re: [PATCH 2/4] drm/amdgpu: Add software ring callbacks for gfx9(v3)

2022-09-09 Thread Andrey Grodzovsky

Acked-by: Andrey Grodzovsky 

Andrey

On 2022-09-08 21:50, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

Set ring functions with software ring callbacks
on gfx9.

The software ring can be tested via the debugfs_test_ib
case.

v2: set sw_ring 2 to enable software ring by default.
v3: remove the parameter for software ring enablement.

Signed-off-by: Jiadong.Zhu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h  |   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  16 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 116 +--
  5 files changed, 128 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 96d058c4cd4b..525df0b4d55f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -207,6 +207,7 @@ extern bool amdgpu_ignore_bad_page_threshold;
  extern struct amdgpu_watchdog_timer amdgpu_watchdog_timer;
  extern int amdgpu_async_gfx_ring;
  extern int amdgpu_mcbp;
+extern int amdgpu_sw_ring;
  extern int amdgpu_discovery;
  extern int amdgpu_mes;
  extern int amdgpu_mes_kiq;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 0de8e3cd0f1c..5eec82014f0a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -348,6 +348,8 @@ struct amdgpu_gfx {
  
  	boolis_poweron;
  
+	/* software ring */
+	unsigned		num_sw_gfx_rings;
struct amdgpu_ring_mux  muxer;
  };
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c

index 13db99d653bd..5b70a2c36d81 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -33,6 +33,7 @@
  
  #include 

  #include "amdgpu.h"
+#include "amdgpu_sw_ring.h"
  #include "atom.h"
  
  /*

@@ -121,6 +122,11 @@ void amdgpu_ring_commit(struct amdgpu_ring *ring)
  {
uint32_t count;
  
+	if (ring->is_sw_ring) {

+   amdgpu_sw_ring_commit(ring);
+   return;
+   }
+
/* We pad to match fetch size */
count = ring->funcs->align_mask + 1 -
(ring->wptr & ring->funcs->align_mask);
@@ -183,6 +189,11 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
u32 *num_sched;
u32 hw_ip;
  
+	if (adev->gfx.num_sw_gfx_rings > 0 && ring->is_sw_ring) {

+   return amdgpu_sw_ring_init(adev, ring, max_dw, irq_src, 
irq_type,
+   hw_prio, sched_score);
+   }
+
/* Set the hw submission limit higher for KIQ because
 * it's used for a number of gfx/compute tasks by both
 * KFD and KGD which may have outstanding fences and
@@ -343,7 +354,10 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
   */
  void amdgpu_ring_fini(struct amdgpu_ring *ring)
  {
-
+   if (ring->is_sw_ring) {
+   amdgpu_sw_ring_fini(ring);
+   return;
+   }
/* Not to finish a ring which is not initialized */
if (!(ring->adev) ||
(!ring->is_mes_queue && !(ring->adev->rings[ring->idx])))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index fe33a683bfba..ba6d8c753f7e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -38,7 +38,8 @@ struct amdgpu_vm;
  /* max number of rings */
  #define AMDGPU_MAX_RINGS  28
  #define AMDGPU_MAX_HWIP_RINGS 8
-#define AMDGPU_MAX_GFX_RINGS   2
+/* 2 software rings and 1 real ring */
+#define AMDGPU_MAX_GFX_RINGS   3
  #define AMDGPU_MAX_COMPUTE_RINGS  8
  #define AMDGPU_MAX_VCE_RINGS  3
  #define AMDGPU_MAX_UVD_ENC_RINGS  2
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 5349ca4d19e3..774e44e1074a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -47,6 +47,7 @@
  
  #include "amdgpu_ras.h"
  
+#include "amdgpu_sw_ring.h"

  #include "gfx_v9_4.h"
  #include "gfx_v9_0.h"
  #include "gfx_v9_4_2.h"
@@ -55,7 +56,8 @@
  #include "asic_reg/pwr/pwr_10_0_sh_mask.h"
  #include "asic_reg/gc/gc_9_0_default.h"
  
-#define GFX9_NUM_GFX_RINGS 1

+#define GFX9_NUM_GFX_RINGS 3
+#define GFX9_NUM_SW_GFX_RINGS  2
  #define GFX9_MEC_HPD_SIZE 4096
  #define RLCG_UCODE_LOADING_START_ADDRESS 0x2000L
  #define RLC_SAVE_RESTORE_ADDR_STARTING_OFFSET 0xL
@@ -2270,6 +2272,7 @@ static int gfx_v9_0_compute_ring_init(struct 
amdgpu_device *adev, int ring_id,
  static int gfx

Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)

2022-09-09 Thread Andrey Grodzovsky



On 2022-09-08 21:50, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

The software ring is created to support priority
contexts while there is only one hardware queue
for gfx.

Every software ring has its own fence driver and can
be used as an ordinary ring for the gpu_scheduler.
Multiple software rings are bound to a real ring
with the ring muxer. The packets committed on the
software rings are copied to the real ring.

v2: use array to store software ring entry.
v3: remove unnecessary prints.

Signed-off-by: Jiadong.Zhu 
---
  drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 182 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  67 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  | 204 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h  |  48 +
  7 files changed, 509 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 3e0e2eb7e235..85224bc81ce5 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o amdgpu_nbio.o \
amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o \
-   amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
+   amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
+   amdgpu_sw_ring.o amdgpu_ring_mux.o
  
  amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h

index 53526ffb2ce1..0de8e3cd0f1c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -33,6 +33,7 @@
  #include "amdgpu_imu.h"
  #include "soc15.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_ring_mux.h"
  
  /* GFX current status */

  #define AMDGPU_GFX_NORMAL_MODE0xL
@@ -346,6 +347,8 @@ struct amdgpu_gfx {
struct amdgpu_gfx_ras   *ras;
  
  	boolis_poweron;

+
+   struct amdgpu_ring_mux  muxer;
  };
  
  #define amdgpu_gfx_get_gpu_clock_counter(adev) (adev)->gfx.funcs->get_gpu_clock_counter((adev))

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7d89a52091c0..fe33a683bfba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -278,6 +278,9 @@ struct amdgpu_ring {
boolis_mes_queue;
uint32_thw_queue_id;
struct amdgpu_mes_ctx_data *mes_ctx;
+
+   boolis_sw_ring;
+
  };
  
  #define amdgpu_ring_parse_cs(r, p, job, ib) ((r)->funcs->parse_cs((p), (job), (ib)))

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
new file mode 100644
index ..ea4a3c66119a
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -0,0 +1,182 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include 
+
+#include "amdgpu_ring_mux.h"
+#include "amdgpu_ring.h"
+
+#define AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT (HZ/2)
+
+static int copy_pkt_from_sw_ring(struct amdgpu_ring_mux *mux, struct 
amdgpu_ring *ring,
+   u64 s_begin, u64 s_end);
+
+int amdgpu_ring_mux_init(struct amdgpu_ring_mux *mux, struct 

Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best

2022-09-08 Thread Andrey Grodzovsky
Please send everything together because otherwise it's not clear why we 
need this.


Andrey

On 2022-09-08 11:09, James Zhu wrote:

Yes, it is for NPI design. I will send out patches for review soon.

Thanks!

James

On 2022-09-08 11:05 a.m., Andrey Grodzovsky wrote:
So this is the real need of this patch-set, but this explanation 
doesn't appear anywhere in the description.
It's always good to add a short 0 RFC patch which describes the 
intention of the patchset if the code is

not self explanatory.

And I still don't understand the need - I don't see anything in
amdgpu_ctx_fini_entity regarding
ring tracking. Is it new code you plan to add that is not included in
this patchset? Did I miss an

earlier patch maybe ?

Andrey

On 2022-09-08 10:45, James Zhu wrote:

Saving lines is not the purpose.

Also, I want to use entity->sched_list to track the rings used in
this ctx in amdgpu_ctx_fini_entity.


Best Regards!

James

On 2022-09-08 10:38 a.m., Andrey Grodzovsky wrote:
I guess it's an option, but I don't really see what the added value
is. You saved a few lines in this patch
but added a few lines in another. In total it seems to me there's not much
difference?


Andrey

On 2022-09-08 10:17, James Zhu wrote:

Hi Andrey

Basically this entire patch set is derived from patch [3/4]:
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;


I think there is no special reason to treat single and multiple schedule
lists differently here.


Best Regards!

James

On 2022-09-08 10:08 a.m., Andrey Grodzovsky wrote:

What's the reason for this entire patch set ?

Andrey

On 2022-09-07 16:57, James Zhu wrote:

drm_sched_pick_best returns struct drm_gpu_scheduler ** instead of
struct drm_gpu_scheduler *

Signed-off-by: James Zhu 
---
  include/drm/gpu_scheduler.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/gpu_scheduler.h 
b/include/drm/gpu_scheduler.h

index 0fca8f38bee4..011f70a43397 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -529,7 +529,7 @@ void drm_sched_fence_finished(struct 
drm_sched_fence *fence);
  unsigned long drm_sched_suspend_timeout(struct 
drm_gpu_scheduler *sched);

  void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
  unsigned long remaining);
-struct drm_gpu_scheduler *
+struct drm_gpu_scheduler **
  drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
   unsigned int num_sched_list);


Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best

2022-09-08 Thread Andrey Grodzovsky
So this is the real need of this patch-set, but this explanation doesn't 
appear anywhere in the description.
It's always good to add a short 0 RFC patch which describes the 
intention of the patchset if the code is

not self explanatory.

And I still don't understand the need - I don't see anything in
amdgpu_ctx_fini_entity regarding
ring tracking. Is it new code you plan to add that is not included in
this patchset? Did I miss an

earlier patch maybe ?

Andrey

On 2022-09-08 10:45, James Zhu wrote:

Saving lines is not the purpose.

Also, I want to use entity->sched_list to track the rings used in
this ctx in amdgpu_ctx_fini_entity.


Best Regards!

James

On 2022-09-08 10:38 a.m., Andrey Grodzovsky wrote:
I guess it's an option, but I don't really see what the added value
is. You saved a few lines in this patch
but added a few lines in another. In total it seems to me there's not much
difference?


Andrey

On 2022-09-08 10:17, James Zhu wrote:

Hi Andrey

Basically this entire patch set is derived from patch [3/4]:
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;


I think there is no special reason to treat single and multiple schedule
lists differently here.


Best Regards!

James

On 2022-09-08 10:08 a.m., Andrey Grodzovsky wrote:

What's the reason for this entire patch set ?

Andrey

On 2022-09-07 16:57, James Zhu wrote:

drm_sched_pick_best returns struct drm_gpu_scheduler ** instead of
struct drm_gpu_scheduler *

Signed-off-by: James Zhu 
---
  include/drm/gpu_scheduler.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/gpu_scheduler.h 
b/include/drm/gpu_scheduler.h

index 0fca8f38bee4..011f70a43397 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -529,7 +529,7 @@ void drm_sched_fence_finished(struct 
drm_sched_fence *fence);
  unsigned long drm_sched_suspend_timeout(struct drm_gpu_scheduler 
*sched);

  void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
  unsigned long remaining);
-struct drm_gpu_scheduler *
+struct drm_gpu_scheduler **
  drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
   unsigned int num_sched_list);


Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best

2022-09-08 Thread Andrey Grodzovsky
I guess it's an option, but I don't really see what the added value is.
You saved a few lines in this patch
but added a few lines in another. In total it seems to me there's not much
difference?


Andrey

On 2022-09-08 10:17, James Zhu wrote:

Hi Andrey

Basically this entire patch set is derived from patch [3/4]:
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;


I think there is no special reason to treat single and multiple schedule
lists differently here.


Best Regards!

James

On 2022-09-08 10:08 a.m., Andrey Grodzovsky wrote:

What's the reason for this entire patch set ?

Andrey

On 2022-09-07 16:57, James Zhu wrote:

drm_sched_pick_best returns struct drm_gpu_scheduler ** instead of
struct drm_gpu_scheduler *

Signed-off-by: James Zhu 
---
  include/drm/gpu_scheduler.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 0fca8f38bee4..011f70a43397 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -529,7 +529,7 @@ void drm_sched_fence_finished(struct 
drm_sched_fence *fence);
  unsigned long drm_sched_suspend_timeout(struct drm_gpu_scheduler 
*sched);

  void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
  unsigned long remaining);
-struct drm_gpu_scheduler *
+struct drm_gpu_scheduler **
  drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
   unsigned int num_sched_list);


Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best

2022-09-08 Thread Andrey Grodzovsky

What's the reason for this entire patch set ?

Andrey

On 2022-09-07 16:57, James Zhu wrote:

drm_sched_pick_best returns struct drm_gpu_scheduler ** instead of
struct drm_gpu_scheduler *

Signed-off-by: James Zhu 
---
  include/drm/gpu_scheduler.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 0fca8f38bee4..011f70a43397 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -529,7 +529,7 @@ void drm_sched_fence_finished(struct drm_sched_fence 
*fence);
  unsigned long drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched);
  void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
unsigned long remaining);
-struct drm_gpu_scheduler *
+struct drm_gpu_scheduler **
  drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
 unsigned int num_sched_list);
  


Re: [PATCH v2] drm/sced: Add FIFO sched policy to rq

2022-09-07 Thread Andrey Grodzovsky

Luben, just a ping, whenever you have time.

Andrey

On 2022-09-05 01:57, Christian König wrote:



Am 03.09.22 um 04:48 schrieb Andrey Grodzovsky:

Problem: Given many entities competing for the same rq on the
same scheduler, an unacceptably long wait time for some
jobs stuck waiting in the rq before being picked up is
observed (seen using GPUVis).
The issue is due to the Round Robin policy used by the scheduler
to pick up the next entity for execution. Under the stress
of many entities and long job queues within an entity, some
jobs can be stuck for a very long time in their entity's
queue before being popped from the queue and executed,
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
than the job in the long queue.

Fix:
Add a FIFO selection policy for entities in the RQ: choose the next
entity on the rq in such an order that if a job on one entity arrived
earlier than a job on another entity, the first job will start
executing earlier, regardless of the length of the entity's job
queue.

v2:
Switch to an rb tree structure for entities, keyed by the TS of the
oldest job waiting in the entity's job queue. Improves next
entity extraction to O(1). Entity TS update is
O(log(number of entities in the rq)).

Drop default option in module control parameter.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 

[SNIP]

  /**
@@ -313,6 +330,14 @@ struct drm_sched_job {
    /** @last_dependency: tracks @dependencies as they signal */
  unsigned long    last_dependency;
+
+
+    /**
+    * @submit_ts:
+    *
+    * Marks job submit time


Maybe write something like "When the job was pushed into the entity 
queue."


Apart from that I leave it to Luben and you to get this stuff upstream.

Thanks,
Christian.


+    */
+    ktime_t submit_ts;
  };
    static inline bool drm_sched_invalidate_job(struct drm_sched_job 
*s_job,
@@ -501,6 +526,10 @@ void drm_sched_rq_add_entity(struct drm_sched_rq 
*rq,

  void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
  struct drm_sched_entity *entity);
  +void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, 
ktime_t ts,

+  bool remove_only);
+
+
  int drm_sched_entity_init(struct drm_sched_entity *entity,
    enum drm_sched_priority priority,
    struct drm_gpu_scheduler **sched_list,




[PATCH v2] drm/sced: Add FIFO sched policy to rq

2022-09-02 Thread Andrey Grodzovsky
Problem: Given many entities competing for the same rq on the
same scheduler, an unacceptably long wait time for some
jobs stuck waiting in the rq before being picked up is
observed (seen using GPUVis).
The issue is due to the Round Robin policy used by the scheduler
to pick up the next entity for execution. Under the stress
of many entities and long job queues within an entity, some
jobs can be stuck for a very long time in their entity's
queue before being popped from the queue and executed,
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
than the job in the long queue.

Fix:
Add a FIFO selection policy for entities in the RQ: choose the next
entity on the rq in such an order that if a job on one entity arrived
earlier than a job on another entity, the first job will start
executing earlier, regardless of the length of the entity's job
queue.

v2:
Switch to an rb tree structure for entities, keyed by the TS of the
oldest job waiting in the entity's job queue. Improves next
entity extraction to O(1). Entity TS update is
O(log(number of entities in the rq)).

Drop default option in module control parameter.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
 drivers/gpu/drm/scheduler/sched_entity.c |  29 -
 drivers/gpu/drm/scheduler/sched_main.c   | 131 ++-
 include/drm/gpu_scheduler.h  |  29 +
 3 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 191c56064f19..65ae4be2248b 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -33,6 +33,8 @@
 #define to_drm_sched_job(sched_job)\
container_of((sched_job), struct drm_sched_job, queue_node)
 
+extern int drm_sched_policy;
+
 /**
  * drm_sched_entity_init - Init a context entity used by scheduler when
  * submit to HW ring.
@@ -73,6 +75,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
 
if(num_sched_list)
entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +420,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (!sched_job)
-   return NULL;
+   goto skip;
 
while ((entity->dependency =
drm_sched_job_dependency(sched_job, entity))) {
trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
 
-   if (drm_sched_entity_add_dependency_cb(entity))
-   return NULL;
+   if (drm_sched_entity_add_dependency_cb(entity)) {
+   sched_job = NULL;
+   goto skip;
+   }
}
 
/* skip jobs from entity that marked guilty */
@@ -443,6 +448,17 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
smp_wmb();
 
spsc_queue_pop(&entity->job_queue);
+
+   /*
+* Only once the head job is extracted can we access the next job (or
+* the empty queue) and update the entity's location in the min heap
+* accordingly.
+*/
+skip:
+   if (drm_sched_policy == 1)
+   drm_sched_rq_update_fifo(entity,
+(sched_job ? sched_job->submit_ts : 
ktime_get()),
+false);
+
return sched_job;
 }
 
@@ -502,11 +518,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 {
struct drm_sched_entity *entity = sched_job->entity;
bool first;
+   ktime_t ts = ktime_get();
 
trace_drm_sched_job(sched_job, entity);
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ts;
 
/* first job wakes up scheduler */
if (first) {
@@ -518,8 +536,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
DRM_ERROR("Trying to push to a killed entity\n");
return;
}
+
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
+
+   if (drm_sched_policy == 1)
+   drm_sched_rq_update_fifo(entity, ts,  false);
+
drm_sched_wakeup(entity->rq->sched);
}
 }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index c5437ee03e3f..4d2450b3f5bd 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drive

Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-25 Thread Andrey Grodzovsky



On 2022-08-24 22:29, Luben Tuikov wrote:

Inlined:

On 2022-08-24 12:21, Andrey Grodzovsky wrote:

On 2022-08-23 17:37, Luben Tuikov wrote:

On 2022-08-23 14:57, Andrey Grodzovsky wrote:

On 2022-08-23 14:30, Luben Tuikov wrote:


On 2022-08-23 14:13, Andrey Grodzovsky wrote:

On 2022-08-23 12:58, Luben Tuikov wrote:

Inlined:

On 2022-08-22 16:09, Andrey Grodzovsky wrote:

Poblem: Given many entities competing for same rq on

^Problem


same scheduler an uncceptabliy long wait time for some

^unacceptably


jobs waiting stuck in rq before being picked up are
observed (seen using  GPUVis).
The issue is due to Round Robin policy used by scheduler
to pick up the next entity for execution. Under stress
of many entities and long job queus within entity some

^queues


jobs could be stack for very long time in it's entity's
queue before being popped from the queue and executed
while for other entites with samller job queues a job

^entities; smaller


might execute ealier even though that job arrived later

^earlier


then the job in the long queue.

Fix:
Add FIFO selection policy to entites in RQ, chose next enitity
on rq in such order that if job on one entity arrived
ealrier then job on another entity the first job will start
executing ealier regardless of the length of the entity's job
queue.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
 drivers/gpu/drm/scheduler/sched_entity.c |  2 +
 drivers/gpu/drm/scheduler/sched_main.c   | 65 ++--
 include/drm/gpu_scheduler.h  |  8 +++
 3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..3bb7f69306ef 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ktime_get();
+
 
 	/* first job wakes up scheduler */

if (first) {
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 68317d3a7a27..c123aa120d06 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -59,6 +59,19 @@
 #define CREATE_TRACE_POINTS
 #include "gpu_scheduler_trace.h"
 
+

+
+int drm_sched_policy = -1;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entites scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy,
+   "specify schedule policy for entites on a runqueue (-1 = 
auto(default) value, 0 = Round Robin,1  = use FIFO");
+module_param_named(sched_policy, drm_sched_policy, int, 0444);

As per Christian's comments, you can drop the "auto" and perhaps leave one as 
the default,
say the RR.

I do think it is beneficial to have a module parameter control the scheduling 
policy, as shown above.
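
As context, with permissions 0444 the parameter is read-only at runtime, so the
policy is chosen at module load. A hedged sketch of a driver-side accessor that
consumes it (the helper name is hypothetical; the patch compares the raw int):

	extern int drm_sched_policy;

	/* Hypothetical accessor so call sites avoid the magic number 1,
	 * which selects FIFO per the parameter description. */
	static inline bool sketch_sched_fifo_enabled(void)
	{
		return drm_sched_policy == 1;
	}

Assuming the scheduler is built as the gpu_sched module, booting with
gpu_sched.sched_policy=1 on the kernel command line would then select FIFO.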

Christian is not against it, just against adding 'auto' here as the
default.

Exactly what I said.

Also, I still think an O(1) scheduling (picking next to run) should be
what we strive for in such a FIFO patch implementation.
A FIFO mechanism is by its nature an O(1) mechanism for picking the next
element.

Regards,
Luben

The only solution I see for this now is keeping a global per-rq jobs
list parallel to the per-entity SPSC queue - we use this list when we switch
to FIFO scheduling, and we can even start building it ONLY when we switch
to FIFO, building it gradually as more jobs come. Do you have another solution
in mind?

The idea is to "sort" on insertion, not on picking the next one to run.
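
A minimal sketch of sorting on insertion, assuming entities are kept in an
rbtree keyed by the timestamp of their oldest pending job, so that picking the
next entity is just the leftmost node. All names here are illustrative:

	#include <linux/rbtree.h>
	#include <linux/ktime.h>

	struct rq_entity {		/* stand-in for drm_sched_entity */
		struct rb_node node;
		ktime_t oldest_ts;	/* submit_ts of its oldest job */
	};

	/* O(log N) insertion keeps the tree FIFO-ordered at all times. */
	static void rq_insert_sorted(struct rb_root_cached *root,
				     struct rq_entity *e)
	{
		struct rb_node **link = &root->rb_root.rb_node;
		struct rb_node *parent = NULL;
		bool leftmost = true;

		while (*link) {
			struct rq_entity *cur =
				rb_entry(*link, struct rq_entity, node);

			parent = *link;
			if (ktime_before(e->oldest_ts, cur->oldest_ts)) {
				link = &parent->rb_left;
			} else {
				link = &parent->rb_right;
				leftmost = false;
			}
		}
		rb_link_node(&e->node, parent, link);
		rb_insert_color_cached(&e->node, root, leftmost);
	}

	/* Selection is then O(1): rb_first_cached(root) is the next entity. */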

cont'd below:


Andrey


+
+
 #define to_drm_sched_job(sched_job)\
container_of((sched_job), struct drm_sched_job, queue_node)
 
@@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,

 }
 
 /**

- * drm_sched_rq_select_entity - Select an entity which could provide a job to 
run
+ * drm_sched_rq_select_entity_rr - Select an entity which could provide a job 
to run
  *
  * @rq: scheduler run queue to check.
  *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find a ready entity, in round robin manner.
+ *
+ * Returns NULL if none found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 {
struct drm_sched_entity *entity;
 
@@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)

return NULL;
 }
 
+/**

+ * drm_sched_rq_select_entity_fifo - Select an entity which could provide a 
job to run
+ *
+ * @rq: scheduler run queue t

Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-25 Thread Andrey Grodzovsky


On 2022-08-23 17:37, Luben Tuikov wrote:


On 2022-08-23 14:57, Andrey Grodzovsky wrote:

On 2022-08-23 14:30, Luben Tuikov wrote:


On 2022-08-23 14:13, Andrey Grodzovsky wrote:

On 2022-08-23 12:58, Luben Tuikov wrote:

Inlined:

On 2022-08-22 16:09, Andrey Grodzovsky wrote:

Poblem: Given many entities competing for same rq on

^Problem


same scheduler an uncceptabliy long wait time for some

^unacceptably


jobs waiting stuck in rq before being picked up are
observed (seen using  GPUVis).
The issue is due to Round Robin policy used by scheduler
to pick up the next entity for execution. Under stress
of many entities and long job queus within entity some

^queues


jobs could be stack for very long time in it's entity's
queue before being popped from the queue and executed
while for other entites with samller job queues a job

^entities; smaller


might execute ealier even though that job arrived later

^earlier


then the job in the long queue.

Fix:
Add FIFO selection policy to entites in RQ, chose next enitity
on rq in such order that if job on one entity arrived
ealrier then job on another entity the first job will start
executing ealier regardless of the length of the entity's job
queue.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
drivers/gpu/drm/scheduler/sched_entity.c |  2 +
drivers/gpu/drm/scheduler/sched_main.c   | 65 ++--
include/drm/gpu_scheduler.h  |  8 +++
3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..3bb7f69306ef 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ktime_get();
+

	/* first job wakes up scheduler */

if (first) {
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 68317d3a7a27..c123aa120d06 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -59,6 +59,19 @@
#define CREATE_TRACE_POINTS
#include "gpu_scheduler_trace.h"

+

+
+int drm_sched_policy = -1;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entites scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy,
+   "specify schedule policy for entites on a runqueue (-1 = 
auto(default) value, 0 = Round Robin,1  = use FIFO");
+module_param_named(sched_policy, drm_sched_policy, int, 0444);

As per Christian's comments, you can drop the "auto" and perhaps leave one as 
the default,
say the RR.

I do think it is beneficial to have a module parameter control the scheduling 
policy, as shown above.

Christian is not against it, just against adding 'auto' here as the
default.

Exactly what I said.

Also, I still think an O(1) scheduling (picking next to run) should be
what we strive for in such a FIFO patch implementation.
A FIFO mechanism is by its nature an O(1) mechanism for picking the next
element.

Regards,
Luben


The only solution I see for this now is keeping a global per-rq jobs
list parallel to the per-entity SPSC queue - we use this list when we switch
to FIFO scheduling, and we can even start building it ONLY when we switch
to FIFO, building it gradually as more jobs come. Do you have another solution
in mind?

The idea is to "sort" on insertion, not on picking the next one to run.

cont'd below:


Andrey


+
+
#define to_drm_sched_job(sched_job) \
container_of((sched_job), struct drm_sched_job, queue_node)

@@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,

}

/**

- * drm_sched_rq_select_entity - Select an entity which could provide a job to 
run
+ * drm_sched_rq_select_entity_rr - Select an entity which could provide a job 
to run
 *
 * @rq: scheduler run queue to check.
 *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find a ready entity, in round robin manner.
+ *
+ * Returns NULL if none found.
 */
static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
{
struct drm_sched_entity *entity;

@@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)

return NULL;
}

+/**

+ * drm_sched_rq_select_entity_fifo - Select an entity which could provide a 
job to run
+ *
+ * @rq: scheduler run queue to check.
+ *
+ * Try to find a ready entity, based on FIFO order of jobs arrivals.
+ *
+ * Returns NULL if none found.
+ */
+s

Re: [PATCH] drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled

2022-08-24 Thread Andrey Grodzovsky



On 2022-08-17 10:01, Andrey Grodzovsky wrote:


On 2022-08-17 09:44, Alex Deucher wrote:
On Tue, Aug 16, 2022 at 10:54 PM Chai, Thomas  
wrote:

[AMD Official Use Only - General]

Hi Alex:
   When removing an amdgpu device, it may be difficult to change the 
order of psp_hw_fini calls.


1. The drm_dev_unplug call is at the beginning of the
amdgpu_pci_remove function, which makes the gpu device inaccessible
for userspace operations. If the call to psp_hw_fini were moved
before drm_dev_unplug, userspace could access the gpu device while
the psp might still be mid-removal, which has unknown issues.



+Andrey Grodzovsky

We should fix the ordering in amdgpu_pci_remove() then I guess? There
are lots of places where drm_dev_enter() is used to protect access to
the hardware which could be similarly affected.

Alex



We can probably try to move drm_dev_unplug after
amdgpu_driver_unload_kms. I don't remember now why drm_dev_unplug must
be the first thing we do in amdgpu_pci_remove and what impact moving it will
have, but maybe give it a try.
Also see if you can run the libdrm hotplug test suite before and after the
change.


Andrey



Thinking a bit more about this - I guess the main problem with this will
be that in case of a real hot unplug (which is hard to test unless you
have a real GPU cage with an external GPU) this move will cause attempts to
access HW registers
or MMIO ranges from VRAM BOs when the HW is missing, when you try to shut
down the HW in the HW fini IP-block-specific callbacks. This in turn will
return garbage for reads (or all 1s maybe), which is what we were probably
trying to avoid by putting drm_dev_unplug as the first thing. So
it's probably a bit problematic.
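
For context, the guard at the center of this discussion is the
drm_dev_enter()/drm_dev_exit() critical section that the patch removes from
psp_cmd_submit_buf(). In sketch form, mirroring the quoted hunk:

	int idx;

	/* drm_dev_enter() fails once drm_dev_unplug() has run, so the
	 * bracketed hardware access is skipped after removal instead of
	 * reading back garbage or all 1s. */
	if (!drm_dev_enter(adev_to_drm(psp->adev), &idx))
		return 0;	/* device unplugged: skip the access */

	/* ... access registers / MMIO ranges here ... */

	drm_dev_exit(idx);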


Andrey







2. psp_hw_fini is called by the .hw_fini iterator in 
amdgpu_device_ip_fini_early, referring to the code starting from 
amdgpu_pci_remove to .hw_fini is called,
    there are many preparatory operations before calling .hw_fini,  
which makes it very difficult to change the order of psp_hw_fini or 
all block .hw_fini.


    So can we do a workaround in psp_cmd_submit_buf when removing 
amdgpu device?


-Original Message-
From: Alex Deucher 
Sent: Monday, August 15, 2022 10:22 PM
To: Chai, Thomas 
Cc: amd-gfx@lists.freedesktop.org; Zhang, Hawking 
; Chen, Guchun ; Chai, 
Thomas 
Subject: Re: [PATCH] drm/amdgpu: TA unload messages are not actually 
sent to psp when amdgpu is uninstalled


On Mon, Aug 15, 2022 at 3:06 AM YiPeng Chai  
wrote:

The psp_cmd_submit_buf function is called by psp_hw_fini to send TA
unload messages to psp to terminate ras, asd and tmr.
But when amdgpu is uninstalled, drm_dev_unplug is called earlier than
psp_hw_fini in amdgpu_pci_remove, the calling order as follows:
static void amdgpu_pci_remove(struct pci_dev *pdev) {
 drm_dev_unplug
 ..
amdgpu_driver_unload_kms->amdgpu_device_fini_hw->...
 ->.hw_fini->psp_hw_fini->...
 ->psp_ta_unload->psp_cmd_submit_buf
 ..
}
The program will return when calling drm_dev_enter in
psp_cmd_submit_buf.

So the call to drm_dev_enter in psp_cmd_submit_buf should be removed,
so that the TA unload messages can be sent to the psp when amdgpu is
uninstalled.

Signed-off-by: YiPeng Chai 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index b067ce45d226..0578d8d094a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -585,9 +585,6 @@ psp_cmd_submit_buf(struct psp_context *psp,
 if (psp->adev->no_hw_access)
 return 0;

-   if (!drm_dev_enter(adev_to_drm(psp->adev), &idx))
-   return 0;
-
This check is to prevent the hardware from being accessed if the 
card is removed.  I think we need to fix the ordering elsewhere.


Alex


 memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE);

 memcpy(psp->cmd_buf_mem, cmd, sizeof(struct psp_gfx_cmd_resp));

@@ -651,7 +648,6 @@ psp_cmd_submit_buf(struct psp_context *psp,

 }

  exit:
-   drm_dev_exit(idx);
 return ret;
  }

--
2.25.1



Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-24 Thread Andrey Grodzovsky

On 2022-08-24 04:29, Michel Dänzer wrote:


On 2022-08-22 22:09, Andrey Grodzovsky wrote:

Poblem: Given many entities competing for same rq on
same scheduler an uncceptabliy long wait time for some
jobs waiting stuck in rq before being picked up are
observed (seen using  GPUVis).
The issue is due to Round Robin policy used by scheduler
to pick up the next entity for execution. Under stress
of many entities and long job queus within entity some
jobs could be stack for very long time in it's entity's
queue before being popped from the queue and executed
while for other entites with samller job queues a job
might execute ealier even though that job arrived later
then the job in the long queue.

Fix:
Add FIFO selection policy to entites in RQ, chose next enitity
on rq in such order that if job on one entity arrived
ealrier then job on another entity the first job will start
executing ealier regardless of the length of the entity's job
queue.

Instead of ordering based on when jobs are added, might it be possible to order 
them based on when they become ready to run?

Otherwise it seems possible to e.g. submit a large number of inter-dependent 
jobs at once, and they would all run before any jobs from another queue get a 
chance.



While any of them is not ready (i.e. still has an unfulfilled
dependency), that job will not be chosen to run (see
drm_sched_entity_is_ready). In this scenario, if an earlier job
from entity E1 is not ready to run it will be skipped, and a later job
from entity E2 (which is ready) will be chosen to run, so the E1 job
does not block the E2 job. The moment the E1 job
does become ready, it seems logical to me to let it run ASAP, as by
now it has spent the most time of anyone waiting for execution, and I don't
think it matters that part of this time
was spent waiting for a dependency job to complete its run.
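
A hypothetical timeline makes this concrete (illustrative values only):

	/*
	 *  t=0  E1's job J1 is submitted (submit_ts = 0), waits on a fence
	 *  t=1  E2's job J2 is submitted (submit_ts = 1) and is ready
	 *  t=2  FIFO pass: E1 is skipped (not ready), J2 runs
	 *  t=3  J1's fence signals; on the next pass J1 has the oldest
	 *       submit_ts, so it runs before any newer E2 job
	 */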

Andrey







Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-23 Thread Andrey Grodzovsky

On 2022-08-23 14:30, Luben Tuikov wrote:



On 2022-08-23 14:13, Andrey Grodzovsky wrote:

On 2022-08-23 12:58, Luben Tuikov wrote:

Inlined:

On 2022-08-22 16:09, Andrey Grodzovsky wrote:

Poblem: Given many entities competing for same rq on

^Problem


same scheduler an uncceptabliy long wait time for some

^unacceptably


jobs waiting stuck in rq before being picked up are
observed (seen using  GPUVis).
The issue is due to Round Robin policy used by scheduler
to pick up the next entity for execution. Under stress
of many entities and long job queus within entity some

^queues


jobs could be stack for very long time in it's entity's
queue before being popped from the queue and executed
while for other entites with samller job queues a job

^entities; smaller


might execute ealier even though that job arrived later

^earlier


then the job in the long queue.

Fix:
Add FIFO selection policy to entites in RQ, chose next enitity
on rq in such order that if job on one entity arrived
ealrier then job on another entity the first job will start
executing ealier regardless of the length of the entity's job
queue.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
   drivers/gpu/drm/scheduler/sched_entity.c |  2 +
   drivers/gpu/drm/scheduler/sched_main.c   | 65 ++--
   include/drm/gpu_scheduler.h  |  8 +++
   3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..3bb7f69306ef 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ktime_get();
+
   
   	/* first job wakes up scheduler */

if (first) {
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 68317d3a7a27..c123aa120d06 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -59,6 +59,19 @@
   #define CREATE_TRACE_POINTS
   #include "gpu_scheduler_trace.h"
   
+

+
+int drm_sched_policy = -1;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entites scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy,
+   "specify schedule policy for entites on a runqueue (-1 = 
auto(default) value, 0 = Round Robin,1  = use FIFO");
+module_param_named(sched_policy, drm_sched_policy, int, 0444);

As per Christian's comments, you can drop the "auto" and perhaps leave one as 
the default,
say the RR.

I do think it is beneficial to have a module parameter control the scheduling 
policy, as shown above.


Christian is not against it, just against adding 'auto' here as the
default.

Exactly what I said.

Also, I still think an O(1) scheduling (picking next to run) should be
what we strive for in such a FIFO patch implementation.
A FIFO mechanism is by its nature an O(1) mechanism for picking the next
element.

Regards,
Luben



The only solution I see for this now is keeping a global per-rq jobs
list parallel to the per-entity SPSC queue - we use this list when we switch
to FIFO scheduling, and we can even start building it ONLY when we switch
to FIFO, building it gradually as more jobs come. Do you have another solution

in mind?

Andrey






+
+
   #define to_drm_sched_job(sched_job)  \
container_of((sched_job), struct drm_sched_job, queue_node)
   
@@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,

   }
   
   /**

- * drm_sched_rq_select_entity - Select an entity which could provide a job to 
run
+ * drm_sched_rq_select_entity_rr - Select an entity which could provide a job 
to run
*
* @rq: scheduler run queue to check.
*
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find a ready entity, in round robin manner.
+ *
+ * Returns NULL if none found.
*/
   static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
   {
struct drm_sched_entity *entity;
   
@@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)

return NULL;
   }
   
+/**

+ * drm_sched_rq_select_entity_fifo - Select an entity which could provide a 
job to run
+ *
+ * @rq: scheduler run queue to check.
+ *
+ * Try to find a ready entity, based on FIFO order of jobs arrivals.
+ *
+ * Returns NULL if none found.
+ */
+static struct drm_sched_entity *
+drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+{
+   struct drm_sched_entity *tmp, *entity = NULL;
+   ktime_t oldest_ts = KTIME_MAX;
+   stru

Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-23 Thread Andrey Grodzovsky



On 2022-08-23 12:58, Luben Tuikov wrote:

Inlined:

On 2022-08-22 16:09, Andrey Grodzovsky wrote:

Poblem: Given many entities competing for same rq on

^Problem


same scheduler an uncceptabliy long wait time for some

^unacceptably


jobs waiting stuck in rq before being picked up are
observed (seen using  GPUVis).
The issue is due to Round Robin policy used by scheduler
to pick up the next entity for execution. Under stress
of many entities and long job queus within entity some

^queues


jobs could be stack for very long time in it's entity's
queue before being popped from the queue and executed
while for other entites with samller job queues a job

^entities; smaller


might execute ealier even though that job arrived later

^earlier


then the job in the long queue.

Fix:
Add FIFO selection policy to entites in RQ, chose next enitity
on rq in such order that if job on one entity arrived
ealrier then job on another entity the first job will start
executing ealier regardless of the length of the entity's job
queue.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
  drivers/gpu/drm/scheduler/sched_entity.c |  2 +
  drivers/gpu/drm/scheduler/sched_main.c   | 65 ++--
  include/drm/gpu_scheduler.h  |  8 +++
  3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..3bb7f69306ef 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ktime_get();
+
  
  	/* first job wakes up scheduler */

if (first) {
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 68317d3a7a27..c123aa120d06 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -59,6 +59,19 @@
  #define CREATE_TRACE_POINTS
  #include "gpu_scheduler_trace.h"
  
+

+
+int drm_sched_policy = -1;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entites scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy,
+   "specify schedule policy for entites on a runqueue (-1 = 
auto(default) value, 0 = Round Robin,1  = use FIFO");
+module_param_named(sched_policy, drm_sched_policy, int, 0444);

As per Christian's comments, you can drop the "auto" and perhaps leave one as 
the default,
say the RR.

I do think it is beneficial to have a module parameter control the scheduling 
policy, as shown above.



Christian is not against it, just against adding 'auto' here as the
default.






+
+
  #define to_drm_sched_job(sched_job)   \
container_of((sched_job), struct drm_sched_job, queue_node)
  
@@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,

  }
  
  /**

- * drm_sched_rq_select_entity - Select an entity which could provide a job to 
run
+ * drm_sched_rq_select_entity_rr - Select an entity which could provide a job 
to run
   *
   * @rq: scheduler run queue to check.
   *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find a ready entity, in round robin manner.
+ *
+ * Returns NULL if none found.
   */
  static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
  {
struct drm_sched_entity *entity;
  
@@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)

return NULL;
  }
  
+/**

+ * drm_sched_rq_select_entity_fifo - Select an entity which could provide a 
job to run
+ *
+ * @rq: scheduler run queue to check.
+ *
+ * Try to find a ready entity, based on FIFO order of jobs arrivals.
+ *
+ * Returns NULL if none found.
+ */
+static struct drm_sched_entity *
+drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+{
+   struct drm_sched_entity *tmp, *entity = NULL;
+   ktime_t oldest_ts = KTIME_MAX;
+   struct drm_sched_job *sched_job;
+
+   spin_lock(&rq->lock);
+
+   list_for_each_entry(tmp, &rq->entities, list) {
+
+   if (drm_sched_entity_is_ready(tmp)) {
+   sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));
+
+   if (ktime_before(sched_job->submit_ts, oldest_ts)) {
+   oldest_ts = sched_job->submit_ts;
+   entity = tmp;
+   }
+   }
+   }

Here I think we need an O(1) lookup of the next job to pick out to run.
I see a number of optimizations, for instance keeping the current/oldest
timestamp in the rq
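
The message is truncated in the archive here; one possible reading of the
suggestion, as a hedged sketch with hypothetical names: cache the oldest
pending timestamp per rq on job push, making the common lookup O(1), with the
caveat that the cache must still be refreshed when the oldest job is consumed:

	struct rq_fifo_cache {			/* hypothetical */
		ktime_t oldest_ts;		/* KTIME_MAX when empty */
		struct drm_sched_entity *oldest_entity;
	};

	/* O(1) update when a job stamped @ts is pushed to entity @e. */
	static void rq_fifo_cache_push(struct rq_fifo_cache *c,
				       struct drm_sched_entity *e, ktime_t ts)
	{
		if (ktime_before(ts, c->oldest_ts)) {
			c->oldest_ts = ts;
			c->oldest_entity = e;
		}
	}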

Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-23 Thread Andrey Grodzovsky

On 2022-08-23 08:15, Christian König wrote:




Am 22.08.22 um 22:09 schrieb Andrey Grodzovsky:

Poblem: Given many entities competing for same rq on
same scheduler an uncceptabliy long wait time for some
jobs waiting stuck in rq before being picked up are
observed (seen using  GPUVis).
The issue is due to Round Robin policy used by scheduler
to pick up the next entity for execution. Under stress
of many entities and long job queus within entity some
jobs could be stack for very long time in it's entity's
queue before being popped from the queue and executed
while for other entites with samller job queues a job
might execute ealier even though that job arrived later
then the job in the long queue.

Fix:
Add FIFO selection policy to entites in RQ, chose next enitity
on rq in such order that if job on one entity arrived
ealrier then job on another entity the first job will start
executing ealier regardless of the length of the entity's job
queue.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
  drivers/gpu/drm/scheduler/sched_entity.c |  2 +
  drivers/gpu/drm/scheduler/sched_main.c   | 65 ++--
  include/drm/gpu_scheduler.h  |  8 +++
  3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c

index 6b25b2f4f5a3..3bb7f69306ef 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  atomic_inc(entity->rq->sched->score);
  WRITE_ONCE(entity->last_user, current->group_leader);
  first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);

+    sched_job->submit_ts = ktime_get();
+
    /* first job wakes up scheduler */
  if (first) {
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 68317d3a7a27..c123aa120d06 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -59,6 +59,19 @@
  #define CREATE_TRACE_POINTS
  #include "gpu_scheduler_trace.h"
  +
+
+int drm_sched_policy = -1;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entites scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy,
+    "specify schedule policy for entites on a runqueue (-1 = 
auto(default) value, 0 = Round Robin,1  = use FIFO");


Well we don't really have an autodetect at the moment, so I would drop 
that.



+module_param_named(sched_policy, drm_sched_policy, int, 0444);
+
+
  #define to_drm_sched_job(sched_job)    \
  container_of((sched_job), struct drm_sched_job, queue_node)
  @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct 
drm_sched_rq *rq,

  }
    /**
- * drm_sched_rq_select_entity - Select an entity which could provide 
a job to run
+ * drm_sched_rq_select_entity_rr - Select an entity which could 
provide a job to run

   *
   * @rq: scheduler run queue to check.
   *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find a ready entity, in round robin manner.
+ *
+ * Returns NULL if none found.
   */
  static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
  {
  struct drm_sched_entity *entity;
  @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq 
*rq)

  return NULL;
  }
  +/**
+ * drm_sched_rq_select_entity_fifo - Select an entity which could 
provide a job to run

+ *
+ * @rq: scheduler run queue to check.
+ *
+ * Try to find a ready entity, based on FIFO order of jobs arrivals.
+ *
+ * Returns NULL if none found.
+ */
+static struct drm_sched_entity *
+drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+{
+    struct drm_sched_entity *tmp, *entity = NULL;
+    ktime_t oldest_ts = KTIME_MAX;
+    struct drm_sched_job *sched_job;
+
+    spin_lock(&rq->lock);
+
+    list_for_each_entry(tmp, &rq->entities, list) {
+
+    if (drm_sched_entity_is_ready(tmp)) {
+    sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));

+
+    if (ktime_before(sched_job->submit_ts, oldest_ts)) {
+    oldest_ts = sched_job->submit_ts;
+    entity = tmp;
+    }
+    }
+    }
+
+    if (entity) {
+    rq->current_entity = entity;
+    reinit_completion(&rq->entity_idle);
+    }


That should probably be a separate function, or at least be done outside
of this function.


Apart from that, a totally straightforward implementation. Any idea how
much extra overhead this adds?


Regards,
Christian.



Well, memory-wise you have the extra long in each job struct for the
timestamp, and then for each next-job extraction you have to iterate
the entire rq to find the entity with the oldest job, so it is always linear
in the number of entities. Today the worst case is al
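
Summarizing the costs being weighed in this exchange (N is the number of
entities on the run queue; the rbtree line reflects the sort-on-insertion
idea discussed elsewhere in the thread):

	/*
	 * Picking the next job:
	 *   round robin (existing code):	O(1) pick, no extra memory
	 *   FIFO, linear scan (this patch):	O(N) pick, one ktime_t per job
	 *   FIFO, sort-on-insert rbtree:	O(1) pick, O(log N) per update
	 */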

[PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-22 Thread Andrey Grodzovsky
Poblem: Given many entities competing for same rq on
same scheduler an uncceptabliy long wait time for some
jobs waiting stuck in rq before being picked up are
observed (seen using  GPUVis).
The issue is due to Round Robin policy used by scheduler
to pick up the next entity for execution. Under stress
of many entities and long job queus within entity some
jobs could be stack for very long time in it's entity's
queue before being popped from the queue and executed
while for other entites with samller job queues a job
might execute ealier even though that job arrived later
then the job in the long queue.

Fix:
Add FIFO selection policy to entites in RQ, chose next enitity
on rq in such order that if job on one entity arrived
ealrier then job on another entity the first job will start
executing ealier regardless of the length of the entity's job
queue.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Li Yunxiang (Teddy) 
---
 drivers/gpu/drm/scheduler/sched_entity.c |  2 +
 drivers/gpu/drm/scheduler/sched_main.c   | 65 ++--
 include/drm/gpu_scheduler.h  |  8 +++
 3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..3bb7f69306ef 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ktime_get();
+
 
/* first job wakes up scheduler */
if (first) {
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 68317d3a7a27..c123aa120d06 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -59,6 +59,19 @@
 #define CREATE_TRACE_POINTS
 #include "gpu_scheduler_trace.h"
 
+
+
+int drm_sched_policy = -1;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entites scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy,
+   "specify schedule policy for entites on a runqueue (-1 = 
auto(default) value, 0 = Round Robin,1  = use FIFO");
+module_param_named(sched_policy, drm_sched_policy, int, 0444);
+
+
 #define to_drm_sched_job(sched_job)\
container_of((sched_job), struct drm_sched_job, queue_node)
 
@@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 }
 
 /**
- * drm_sched_rq_select_entity - Select an entity which could provide a job to 
run
+ * drm_sched_rq_select_entity_rr - Select an entity which could provide a job 
to run
  *
  * @rq: scheduler run queue to check.
  *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find a ready entity, in round robin manner.
+ *
+ * Returns NULL if none found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 {
struct drm_sched_entity *entity;
 
@@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
return NULL;
 }
 
+/**
+ * drm_sched_rq_select_entity_fifo - Select an entity which could provide a 
job to run
+ *
+ * @rq: scheduler run queue to check.
+ *
+ * Try to find a ready entity, based on FIFO order of jobs arrivals.
+ *
+ * Returns NULL if none found.
+ */
+static struct drm_sched_entity *
+drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+{
+   struct drm_sched_entity *tmp, *entity = NULL;
+   ktime_t oldest_ts = KTIME_MAX;
+   struct drm_sched_job *sched_job;
+
+   spin_lock(&rq->lock);
+
+   list_for_each_entry(tmp, &rq->entities, list) {
+
+   if (drm_sched_entity_is_ready(tmp)) {
+   sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));
+
+   if (ktime_before(sched_job->submit_ts, oldest_ts)) {
+   oldest_ts = sched_job->submit_ts;
+   entity = tmp;
+   }
+   }
+   }
+
+   if (entity) {
+   rq->current_entity = entity;
+   reinit_completion(&rq->entity_idle);
+   }
+
+   spin_unlock(&rq->lock);
+   return entity;
+}
+
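
A short worked example of what the scan above returns, with hypothetical
timestamps:

	/*
	 * Entities A, B, C on one rq, oldest pending submit_ts 30, 10, 20.
	 * Each ready entity's head-of-queue timestamp is compared against
	 * oldest_ts, so B (ts 10) is selected; if B is not ready it is
	 * skipped and C (ts 20) is picked instead.
	 */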
 /**
  * drm_sched_job_done - complete a job
  * @s_job: pointer to the job which is done
@@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
 
/* Kernel run queue has higher priority than normal run queue*/
for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; 
i--) {
-   entity = drm_sched_rq_select_entity(&sched->sched_rq[i]);
+   entity = drm_sched_policy != 1 ?

Re: [PATCH] drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled

2022-08-17 Thread Andrey Grodzovsky



On 2022-08-17 09:44, Alex Deucher wrote:

On Tue, Aug 16, 2022 at 10:54 PM Chai, Thomas  wrote:

[AMD Official Use Only - General]

Hi Alex:
   When removing an amdgpu device, it may be difficult to change the order of 
psp_hw_fini calls.

1. The drm_dev_unplug call is at the beginning of the amdgpu_pci_remove
function, which makes the gpu device inaccessible for userspace operations.
If the call to psp_hw_fini were moved before drm_dev_unplug, userspace could
access the gpu device while the psp might still be mid-removal, which has unknown issues.


+Andrey Grodzovsky

We should fix the ordering in amdgpu_pci_remove() then I guess?  There
are lots of places where drm_dev_enter() is used to protect access to
the hardware which could be similarly affected.

Alex



We can probably try to move drm_dev_unplug after
amdgpu_driver_unload_kms. I don't remember now why drm_dev_unplug must
be the first thing we do in amdgpu_pci_remove and what impact moving it will
have, but maybe give it a try.
Also see if you can run the libdrm hotplug test suite before and after the
change.


Andrey





2. psp_hw_fini is called by the .hw_fini iterator in 
amdgpu_device_ip_fini_early, referring to the code starting from 
amdgpu_pci_remove to .hw_fini is called,
there are many preparatory operations before calling .hw_fini,  which makes 
it very difficult to change the order of psp_hw_fini or all block .hw_fini.

So can we do a workaround in psp_cmd_submit_buf when removing amdgpu device?

-Original Message-
From: Alex Deucher 
Sent: Monday, August 15, 2022 10:22 PM
To: Chai, Thomas 
Cc: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Chen, Guchun 
; Chai, Thomas 
Subject: Re: [PATCH] drm/amdgpu: TA unload messages are not actually sent to 
psp when amdgpu is uninstalled

On Mon, Aug 15, 2022 at 3:06 AM YiPeng Chai  wrote:

The psp_cmd_submit_buf function is called by psp_hw_fini to send TA
unload messages to psp to terminate ras, asd and tmr.
But when amdgpu is uninstalled, drm_dev_unplug is called earlier than
psp_hw_fini in amdgpu_pci_remove; the calling order is as follows:
static void amdgpu_pci_remove(struct pci_dev *pdev) {
 drm_dev_unplug
 ..
 amdgpu_driver_unload_kms->amdgpu_device_fini_hw->...
 ->.hw_fini->psp_hw_fini->...
 ->psp_ta_unload->psp_cmd_submit_buf
 ..
}
The program will return when calling drm_dev_enter in
psp_cmd_submit_buf.

So the call to drm_dev_enter in psp_cmd_submit_buf should be removed,
so that the TA unload messages can be sent to the psp when amdgpu is
uninstalled.

Signed-off-by: YiPeng Chai 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index b067ce45d226..0578d8d094a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -585,9 +585,6 @@ psp_cmd_submit_buf(struct psp_context *psp,
 if (psp->adev->no_hw_access)
 return 0;

-   if (!drm_dev_enter(adev_to_drm(psp->adev), &idx))
-   return 0;
-

This check is to prevent the hardware from being accessed if the card is 
removed.  I think we need to fix the ordering elsewhere.

Alex


 memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE);

 memcpy(psp->cmd_buf_mem, cmd, sizeof(struct psp_gfx_cmd_resp));

@@ -651,7 +648,6 @@ psp_cmd_submit_buf(struct psp_context *psp,
 }

  exit:
-   drm_dev_exit(idx);
 return ret;
  }

--
2.25.1



Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak

2022-08-12 Thread Andrey Grodzovsky



On 2022-08-12 14:38, Kim, Jonathan wrote:

[Public]

Hi Andrey,

Here's the load/unload stack trace.  This is a 2 GPU xGMI system.  I put 
dbg_xgmi_hive_get/put refcount print post kobj get/put.
It's stuck at 2 on unload.  If it's an 8 GPU system, it's stuck at 8.

e.g. of sysfs leak after driver unload:
atitest@atitest:/sys/devices/pci:80/:80:02.0/:81:00.0/:82:00.0/:83:00.0$
 ls xgmi_hive_info/
xgmi_hive_id

Thanks,

Jon



I see the leak, but how is it related to amdgpu_reset_domain? How do you
think it is causing this?


Andrey





Driver load (get ref happens on both device add to hive and init per device):
[   61.975900] amdkcl: loading out-of-tree module taints kernel.
[   61.975973] amdkcl: module verification failed: signature and/or required 
key missing - tainting kernel
[   62.065546] amdkcl: Warning: fail to get symbol cancel_work, replace it with 
kcl stub
[   62.081920] AMD-Vi: AMD IOMMUv2 functionality not available on this system - 
This is not a bug.
[   62.491119] [drm] amdgpu kernel modesetting enabled.
[   62.491122] [drm] amdgpu version: 5.18.2
[   62.491124] [drm] OS DRM version: 5.15.0
[   62.491337] amdgpu: CRAT table not found
[   62.491341] amdgpu: Virtual CRAT table created for CPU
[   62.491360] amdgpu: Topology: Add CPU node
[   62.603556] amdgpu: PeerDirect support was initialized successfully
[   62.603847] amdgpu :83:00.0: enabling device (0100 -> 0102)
[   62.603987] [drm] initializing kernel modesetting (VEGA20 0x1002:0x66A1 
0x1002:0x0834 0x00).
[   62.604023] [drm] register mmio base: 0xFBD0
[   62.604026] [drm] register mmio size: 524288
[   62.604171] [drm] add ip block number 0 
[   62.604175] [drm] add ip block number 1 
[   62.604177] [drm] add ip block number 2 
[   62.604180] [drm] add ip block number 3 
[   62.604182] [drm] add ip block number 4 
[   62.604185] [drm] add ip block number 5 
[   62.604187] [drm] add ip block number 6 
[   62.604190] [drm] add ip block number 7 
[   62.604192] [drm] add ip block number 8 
[   62.604194] [drm] add ip block number 9 
[   62.641771] amdgpu :83:00.0: amdgpu: Fetched VBIOS from ROM BAR
[   62.641777] amdgpu: ATOM BIOS: 113-D1630200-112
[   62.713418] [drm] UVD(0) is enabled in VM mode
[   62.713423] [drm] UVD(1) is enabled in VM mode
[   62.713426] [drm] UVD(0) ENC is enabled in VM mode
[   62.713428] [drm] UVD(1) ENC is enabled in VM mode
[   62.713430] [drm] VCE enabled in VM mode
[   62.713433] amdgpu :83:00.0: amdgpu: Trusted Memory Zone (TMZ) feature 
not supported
[   62.713472] [drm] GPU posting now...
[   62.713993] amdgpu :83:00.0: amdgpu: MEM ECC is active.
[   62.713995] amdgpu :83:00.0: amdgpu: SRAM ECC is active.
[   62.714006] amdgpu :83:00.0: amdgpu: RAS INFO: ras initialized 
successfully, hardware ability[7fff] ras_mask[7fff]
[   62.714018] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, 
fragment size is 9-bit
[   62.714026] amdgpu :83:00.0: amdgpu: VRAM: 32752M 0x0080 - 
0x0087FEFF (32752M used)
[   62.714029] amdgpu :83:00.0: amdgpu: GART: 512M 0x - 
0x1FFF
[   62.714032] amdgpu :83:00.0: amdgpu: AGP: 267845632M 0x0090 
- 0x
[   62.714043] [drm] Detected VRAM RAM=32752M, BAR=32768M
[   62.714044] [drm] RAM width 4096bits HBM
[   62.714050] debugfs: Directory 'ttm' with parent '/' already present!
[   62.714146] [drm] amdgpu: 32752M of VRAM memory ready
[   62.714149] [drm] amdgpu: 40203M of GTT memory ready.
[   62.714170] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   62.714266] [drm] PCIE GART of 512M enabled.
[   62.714267] [drm] PTB located at 0x0080
[   62.731067] amdgpu :83:00.0: amdgpu: PSP runtime database doesn't exist
[   62.731075] amdgpu :83:00.0: amdgpu: PSP runtime database doesn't exist
[   62.731449] amdgpu: [powerplay] hwmgr_sw_init smu backed is vega20_smu
[   62.743177] [drm] Found UVD firmware ENC: 1.2 DEC: .43 Family ID: 19
[   62.743244] [drm] PSP loading UVD firmware
[   62.744525] [drm] Found VCE firmware Version: 57.6 Binary ID: 4
[   62.744689] [drm] PSP loading VCE firmware
[   62.896804] [drm] reserve 0x40 from 0x87fec0 for PSP TMR
[   62.979421] amdgpu :83:00.0: amdgpu: HDCP: optional hdcp ta ucode is not 
available
[   62.979427] amdgpu :83:00.0: amdgpu: DTM: optional dtm ta ucode is not 
available
[   62.979430] amdgpu :83:00.0: amdgpu: RAP: optional rap ta ucode is not 
available
[   62.979432] amdgpu :83:00.0: amdgpu: SECUREDISPLAY: securedisplay ta 
ucode is not available
[   62.982386] [drm] Display Core initialized with v3.2.196!
[   62.984514] [drm] kiq ring mec 2 pipe 1 q 0
[   63.026846] [drm] UVD and UVD ENC initialized successfully.
[   63.225760] [drm] VCE initialized successfully.
[   63.22] amdgpu: [dbg_xgmi_hive_get] ref_count 2
[   63.28] CPU: 10 PID: 397 Comm: kworker/10:2 Tainted: G   OE 
5.15.0-46-generic #49~20.04.1-Ubuntu
[   

Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak

2022-08-11 Thread Andrey Grodzovsky



On 2022-08-11 11:34, Kim, Jonathan wrote:

[Public]


-Original Message-
From: Kuehling, Felix 
Sent: August 11, 2022 11:19 AM
To: amd-gfx@lists.freedesktop.org; Kim, Jonathan 
Subject: Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference
leak

Am 2022-08-11 um 09:42 schrieb Jonathan Kim:

When an xgmi node is added to the hive, it takes another hive
reference for its reset domain.

This extra reference was not dropped on device removal from the
hive so drop it.

Signed-off-by: Jonathan Kim 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c

index 1b108d03e785..560bf1c98f08 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -731,6 +731,9 @@ int amdgpu_xgmi_remove_device(struct

amdgpu_device *adev)

 mutex_unlock(&hive->hive_lock);

 amdgpu_put_xgmi_hive(hive);
+   /* device is removed from the hive so remove its reset domain
+    * reference */
+   if (adev->reset_domain && adev->reset_domain == hive->reset_domain)
+   amdgpu_put_xgmi_hive(hive);

This is some messed up reference counting. If you need an extra
reference from the reset_domain to the hive, that should be owned by the
reset_domain and dropped when the reset_domain is destroyed. And it's
only one reference for the reset_domain, not one reference per adev in
the reset_domain.

Cc'ing Andrey.

What you're saying seems to make more sense to me, but what I got from an 
offline conversation with Andrey
was that the reset domain reference per device was intentional.
Maybe Andrey can comment here.


What you're doing here looks like every adev that's in a reset_domain of
its hive has two references to the hive. And if you're dropping the
extra reference here, it still leaves the reset_domain with a dangling
pointer to a hive that may no longer exist. So this extra reference is
kind of pointless.



reset_domain doesn't hold any reference to the hive; the hive holds a
reference to the reset_domain.




Yes.  Currently one reference is fetched from the device's lifetime on the hive 
and the other is from the
per-device reset domain.

Snippet from amdgpu_device_ip_init:
 /**
  * In case of XGMI grab extra reference for reset domain for this 
device
  */
 if (adev->gmc.xgmi.num_physical_nodes > 1) {
 if (amdgpu_xgmi_add_device(adev) == 0) { <- [JK] reference is 
fetched here



amdgpu_xgmi_add_device calls amdgpu_get_xgmi_hive, and only the first
time amdgpu_get_xgmi_hive is called, when the hive is actually allocated and
initialized, do we proceed
to creating the reset domain, either from scratch (first creation of the
hive) or by taking the reference from adev (see [1]).




[1] - 
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c#L394



 struct amdgpu_hive_info *hive = 
amdgpu_get_xgmi_hive(adev); <- [JK] then here again



So here I don't see how an extra reference to the reset_domain is taken if
amdgpu_get_xgmi_hive returns early, since the hive is already created and
exists in the global hive container.


Jonathan - can you please show the exact flow in which the refcount leak on
the reset_domain happens?


Andrey




 if (!hive->reset_domain ||
 
!amdgpu_reset_get_reset_domain(hive->reset_domain)) {
 r = -ENOENT;
 goto init_failed;
 }

 /* Drop the early temporary reset domain we created 
for device */
 amdgpu_reset_put_reset_domain(adev->reset_domain);
 adev->reset_domain = hive->reset_domain;
 }
 }

One of these never gets dropped, so a leak happens.
So either the extra reference has to be dropped on device removal from the hive,
or, per what you've mentioned,
the reset_domain reference fetch should be fixed to grab at the
hive/reset_domain level.
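
As a reading aid, a condensed view of the two get sites described here
(simplified from the quoted code, and stating Jon's claim, which Andrey
questions elsewhere in this message):

	/*
	 * add path:
	 *   amdgpu_xgmi_add_device(adev)
	 *       -> amdgpu_get_xgmi_hive()		(hive ref #1)
	 *   hive = amdgpu_get_xgmi_hive(adev);		(hive ref #2)
	 *   amdgpu_reset_get_reset_domain(hive->reset_domain);
	 *   adev->reset_domain = hive->reset_domain;
	 *
	 * remove path: amdgpu_xgmi_remove_device() puts one hive reference,
	 * hence the extra amdgpu_put_xgmi_hive() proposed in the patch.
	 */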

Thanks,

Jon


Regards,
Felix



 adev->hive = NULL;

 if (atomic_dec_return(>number_devices) == 0) {


Re: [PATCH v3 1/6] drm/amdgpu: add mode2 reset for sienna_cichlid

2022-08-03 Thread Andrey Grodzovsky

Series is Acked-by: Andrey Grodzovsky 

Andrey

On 2022-08-01 00:07, Victor Zhao wrote:

To meet the requirement of the multi-container use case, which needs
a quicker reset that does not cause VRAM loss, add the Mode2
reset handler for sienna_cichlid.

v2: move skip mode2 flag part separately

v3: remove the use of asic_reset_res

Signed-off-by: Victor Zhao 
---
  drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c |   7 +
  drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c   | 296 ++
  drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h   |  32 ++
  .../pm/swsmu/inc/pmfw_if/smu_v11_0_7_ppsmc.h  |   4 +-
  drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |   3 +-
  .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   |  54 
  7 files changed, 394 insertions(+), 4 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index c7d0cd15b5ef..7030ac2d7d2c 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -75,7 +75,7 @@ amdgpu-y += \
vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o nbio_v7_0.o 
vega10_reg_init.o \
vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o arct_reg_init.o 
mxgpu_nv.o \
nbio_v7_2.o hdp_v4_0.o hdp_v5_0.o aldebaran_reg_init.o aldebaran.o 
soc21.o \
-   nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o lsdma_v6_0.o
+   sienna_cichlid.o nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o 
lsdma_v6_0.o
  
  # add DF block

  amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 32c86a0b145c..f778466bb9db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -23,6 +23,7 @@
  
  #include "amdgpu_reset.h"

  #include "aldebaran.h"
+#include "sienna_cichlid.h"
  
  int amdgpu_reset_add_handler(struct amdgpu_reset_control *reset_ctl,

 struct amdgpu_reset_handler *handler)
@@ -40,6 +41,9 @@ int amdgpu_reset_init(struct amdgpu_device *adev)
case IP_VERSION(13, 0, 2):
ret = aldebaran_reset_init(adev);
break;
+   case IP_VERSION(11, 0, 7):
+   ret = sienna_cichlid_reset_init(adev);
+   break;
default:
break;
}
@@ -55,6 +59,9 @@ int amdgpu_reset_fini(struct amdgpu_device *adev)
case IP_VERSION(13, 0, 2):
ret = aldebaran_reset_fini(adev);
break;
+   case IP_VERSION(11, 0, 7):
+   ret = sienna_cichlid_reset_fini(adev);
+   break;
default:
break;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c 
b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
new file mode 100644
index ..b61a8ddec7ef
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
@@ -0,0 +1,296 @@
+/*
+ * Copyright 2021 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "sienna_cichlid.h"
+#include "amdgpu_reset.h"
+#include "amdgpu_amdkfd.h"
+#include "amdgpu_dpm.h"
+#include "amdgpu_job.h"
+#include "amdgpu_ring.h"
+#include "amdgpu_ras.h"
+#include "amdgpu_psp.h"
+#include "amdgpu_xgmi.h"
+
+static struct amdgpu_reset_handler *
+sienna_cichlid_get_reset_handler(struct amdgpu_reset_control *reset_ctl,
+   struct amdgpu_reset_context *reset_context)
+{
+   struct amdgpu_reset_handler *handler;
+   struct amdgpu_device *adev = (struct amdgpu_device *)reset_ctl->handle;
+
+   if (reset_context->method != AMD_RESET_METHOD_NONE) {
+

Re: [PATCH v2 1/6] drm/amdgpu: add mode2 reset for sienna_cichlid

2022-07-28 Thread Andrey Grodzovsky



On 2022-07-28 06:30, Victor Zhao wrote:

To meet the requirement of the multi-container use case, which needs
a quicker reset that does not cause VRAM loss, add the Mode2
reset handler for sienna_cichlid.

v2: move skip mode2 flag part separately

Signed-off-by: Victor Zhao 
---
  drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c |   7 +
  drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c   | 297 ++
  drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h   |  32 ++
  .../pm/swsmu/inc/pmfw_if/smu_v11_0_7_ppsmc.h  |   4 +-
  drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |   3 +-
  .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   |  54 
  7 files changed, 395 insertions(+), 4 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index c7d0cd15b5ef..7030ac2d7d2c 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -75,7 +75,7 @@ amdgpu-y += \
vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o nbio_v7_0.o 
vega10_reg_init.o \
vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o arct_reg_init.o 
mxgpu_nv.o \
nbio_v7_2.o hdp_v4_0.o hdp_v5_0.o aldebaran_reg_init.o aldebaran.o 
soc21.o \
-   nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o lsdma_v6_0.o
+   sienna_cichlid.o nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o 
lsdma_v6_0.o
  
  # add DF block

  amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 32c86a0b145c..f778466bb9db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -23,6 +23,7 @@
  
  #include "amdgpu_reset.h"

  #include "aldebaran.h"
+#include "sienna_cichlid.h"
  
  int amdgpu_reset_add_handler(struct amdgpu_reset_control *reset_ctl,

 struct amdgpu_reset_handler *handler)
@@ -40,6 +41,9 @@ int amdgpu_reset_init(struct amdgpu_device *adev)
case IP_VERSION(13, 0, 2):
ret = aldebaran_reset_init(adev);
break;
+   case IP_VERSION(11, 0, 7):
+   ret = sienna_cichlid_reset_init(adev);
+   break;
default:
break;
}
@@ -55,6 +59,9 @@ int amdgpu_reset_fini(struct amdgpu_device *adev)
case IP_VERSION(13, 0, 2):
ret = aldebaran_reset_fini(adev);
break;
+   case IP_VERSION(11, 0, 7):
+   ret = sienna_cichlid_reset_fini(adev);
+   break;
default:
break;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c 
b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
new file mode 100644
index ..0512960bed23
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
@@ -0,0 +1,297 @@
+/*
+ * Copyright 2021 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "sienna_cichlid.h"
+#include "amdgpu_reset.h"
+#include "amdgpu_amdkfd.h"
+#include "amdgpu_dpm.h"
+#include "amdgpu_job.h"
+#include "amdgpu_ring.h"
+#include "amdgpu_ras.h"
+#include "amdgpu_psp.h"
+#include "amdgpu_xgmi.h"
+
+static struct amdgpu_reset_handler *
+sienna_cichlid_get_reset_handler(struct amdgpu_reset_control *reset_ctl,
+   struct amdgpu_reset_context *reset_context)
+{
+   struct amdgpu_reset_handler *handler;
+   struct amdgpu_device *adev = (struct amdgpu_device *)reset_ctl->handle;
+
+   if (reset_context->method != AMD_RESET_METHOD_NONE) {
+   list_for_each_entry(handler, &reset_ctl->reset_handlers,
+handler_list) {
+   if (handler->reset_method == reset_context->method)
+   return handler;

Re: [PATCH v2 6/6] drm/amdgpu: reduce reset time

2022-07-28 Thread Andrey Grodzovsky



On 2022-07-28 06:30, Victor Zhao wrote:

In the multi-container use case, reset time is important, so skip the
ring tests and the CP halt wait during IP suspend for reset, as they
are going to fail and cost more time on reset.

v2: add a hang flag to indicate that the reset comes from a job
timeout, and skip the ring test and CP halt wait in that case

Signed-off-by: Victor Zhao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   |  3 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h |  1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 11 +--
  5 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 222d3d7ea076..c735a17c6afb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -27,6 +27,7 @@
  #include "amdgpu_gfx.h"
  #include "amdgpu_rlc.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_reset.h"
  
  /* delay 0.1 second to enable gfx off feature */

  #define GFX_OFF_DELAY_ENABLE msecs_to_jiffies(100)
@@ -477,7 +478,7 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev)
kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.compute_ring[i],
   RESET_QUEUES, 0, 0);
  
-	if (adev->gfx.kiq.ring.sched.ready)

+   if (adev->gfx.kiq.ring.sched.ready && !(amdgpu_in_reset(adev) && 
adev->reset_domain->hang))



I think it's enough to look at adev->reset_domain->hang and you can drop 
the amdgpu_in_reset check.
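
A minimal sketch of the suggested simplification (illustrative only,
not the final patch): the hang flag alone is sufficient, because it is
only ever set while a timeout-driven reset is in flight.

	/* hang is only true inside the job-timeout reset path */
	if (adev->gfx.kiq.ring.sched.ready && !adev->reset_domain->hang)
		r = amdgpu_ring_test_helper(kiq_ring);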



r = amdgpu_ring_test_helper(kiq_ring);
spin_unlock(&adev->gfx.kiq.ring_lock);
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index 6c3e7290153f..bb40880a557f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -49,6 +49,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
}
  
  	memset(&ti, 0, sizeof(struct amdgpu_task_info));

+   adev->reset_domain->hang = true;
  
  	if (amdgpu_gpu_recovery &&

amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
{
@@ -83,6 +84,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
}
  
  exit:

+   adev->reset_domain->hang = false;
drm_dev_exit(idx);
return DRM_GPU_SCHED_STAT_NOMINAL;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 9da5ead50c90..b828fe773f50 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -155,6 +155,7 @@ struct amdgpu_reset_domain 
*amdgpu_reset_create_reset_domain(enum amdgpu_reset_d
atomic_set(&reset_domain->in_gpu_reset, 0);
atomic_set(&reset_domain->reset_res, 0);
init_rwsem(&reset_domain->sem);
+   reset_domain->hang = false;
  
  	return reset_domain;

  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index cc4b2eeb24cf..29e324add552 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -84,6 +84,7 @@ struct amdgpu_reset_domain {
struct rw_semaphore sem;
atomic_t in_gpu_reset;
atomic_t reset_res;
+   bool hang;
  };
  
  
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

index fafbad3cf08d..a384e04d916c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -29,6 +29,7 @@
  #include "amdgpu.h"
  #include "amdgpu_gfx.h"
  #include "amdgpu_psp.h"
+#include "amdgpu_reset.h"
  #include "nv.h"
  #include "nvd.h"
  
@@ -5971,6 +5972,9 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable)

WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp);
}
  
+	if ((amdgpu_in_reset(adev) && adev->reset_domain->hang) && !enable)

+   return 0;
+



Same as above



for (i = 0; i < adev->usec_timeout; i++) {
if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0)
break;
@@ -7569,8 +7573,10 @@ static int gfx_v10_0_kiq_disable_kgq(struct 
amdgpu_device *adev)
for (i = 0; i < adev->gfx.num_gfx_rings; i++)
kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.gfx_ring[i],
   PREEMPT_QUEUES, 0, 0);
-
-   return amdgpu_ring_test_helper(kiq_ring);
+   if (!(amdgpu_in_reset(adev) && adev->reset_domain->hang))



Same as above

Andrey



+   return amdgpu_ring_test_helper(kiq_ring);
+   else
+   return 0;
  }
  #endif
  
@@ -7610,6 +7616,7 @@ static int gfx_v10_0_hw_fini(void *handle)
  
  		return 0;

}
+
gfx_v10_0_cp_enable(adev, false);
gfx_v10_0_enable_gui_idle_interrupt(adev, false);
  


Re: [PATCH 5/5] drm/amdgpu: reduce reset time

2022-07-27 Thread Andrey Grodzovsky

On 2022-07-27 06:35, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Andrey,

The problem with status.hang is that it is set in
amdgpu_device_ip_check_soft_reset, which is not implemented for nv or
gfx10; those implementations would have to be written first.
Another option I considered is to mark status.hang, or to add a flag to
amdgpu_gfx, when a job timeout is reported on a gfx/compute ring. But
that would require some logic mapping rings to IP blocks, which does not
look good either.



I don't think we need this at the ring level. It's enough to know that
the reset you are going through happened because one of the rings is
hung in order to apply this skip logic. It's pretty easy if we add a
'bool hang' flag to adev->reset_domain, which you can set at the
beginning of amdgpu_job_timedout and clear at the end. No protection is
required, as resets from all origins are serialized with the timeout
handler in a single-threaded queue.
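
A minimal sketch of the suggested pattern, assuming the new 'hang' field
in struct amdgpu_reset_domain (this is roughly what the v2 patch quoted
earlier ends up doing):

	static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
	{
		struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
		struct amdgpu_device *adev = ring->adev;

		/* No locking needed: all resets are serialized on the
		 * single-threaded reset-domain work queue. */
		adev->reset_domain->hang = true;

		/* ... soft recovery and/or full GPU reset ... */

		adev->reset_domain->hang = false;
		return DRM_GPU_SCHED_STAT_NOMINAL;
	}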


Andrey





Thanks,
Victor



-Original Message-
From: Grodzovsky, Andrey 
Sent: Wednesday, July 27, 2022 12:57 AM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Deng, Emily ; 
Koenig, Christian 
Subject: Re: [PATCH 5/5] drm/amdgpu: reduce reset time


On 2022-07-26 05:40, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Andrey,

Reply inline.


Thanks,
Victor



-Original Message-
From: Grodzovsky, Andrey 
Sent: Tuesday, July 26, 2022 5:18 AM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Deng, Emily
; Koenig, Christian 
Subject: Re: [PATCH 5/5] drm/amdgpu: reduce reset time


On 2022-07-22 03:34, Victor Zhao wrote:

In the multi-container use case, reset time is important, so skip the
ring tests and the CP halt wait during IP suspend for reset, as they
are going to fail and cost more time on reset.

Why are they failing in this case? Skipping ring tests is not the best
idea, as you lose an important indicator of the system's sanity. Is
there any way to make them work?

[Victor]: I've seen the gfx ring test fail every time after a gfx engine
hang. I thought that should be expected, as gfx is in a bad state. Do
you know the reason we have ring tests before reset, given that we are
going to reset the ASIC anyway?
Another approach could be to make the skip mode2-only, or to reduce the
wait time here.


I dug into the history, and according to commit 'drm/amdgpu:unmap KCQ in
gfx hw_fini(v2)' you need to write to a scratch register to confirm
completion of the queue unmap operation, so you definitely don't want to
just skip it. I agree that this has no point when the ring is hung, but
remember that a GPU reset can happen not only because of a hung ring but
for other reasons as well (RAS, manual reset, etc.), in which case you
probably want to shut down gracefully here.
I see we have the adev->ip_blocks[i].status.hang flag, which you could
maybe use here instead?
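
For reference, a hypothetical sketch of that alternative; the helper
below is illustrative and does not exist in the driver. It gates the
skip on the GFX block's hang status, so graceful shutdowns (RAS, manual
reset, etc.) still run the ring test:

	static bool amdgpu_gfx_block_hung(struct amdgpu_device *adev)
	{
		int i;

		for (i = 0; i < adev->num_ip_blocks; i++) {
			if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GFX)
				return adev->ip_blocks[i].status.hang;
		}
		return false;
	}

	/* in amdgpu_gfx_disable_kcq() */
	if (adev->gfx.kiq.ring.sched.ready && !amdgpu_gfx_block_hung(adev))
		r = amdgpu_ring_test_helper(kiq_ring);

As Victor notes above, this only works once ip_check_soft_reset is
implemented for the affected IP blocks.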





Signed-off-by: Victor Zhao 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c |  2 +-
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 26 +++--
2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 222d3d7ea076..f872495ccc3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -477,7 +477,7 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev)
kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.compute_ring[i],
   RESET_QUEUES, 0, 0);

-	if (adev->gfx.kiq.ring.sched.ready)

+   if (adev->gfx.kiq.ring.sched.ready && !amdgpu_in_reset(adev))
r = amdgpu_ring_test_helper(kiq_ring);
spin_unlock(&adev->gfx.kiq.ring_lock);

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index fafbad3cf08d..9ae29023e38f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -5971,16 +5971,19 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device 
*adev, bool enable)
WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp);
}

-	for (i = 0; i < adev->usec_timeout; i++) {

-   if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0)
-   break;
-   udelay(1);
-   }
-
-   if (i >= adev->usec_timeout)
-   DRM_ERROR("failed to %s cp gfx\n", enable ? "unhalt" : "halt");
+   if (!amdgpu_in_reset(adev)) {
+   for (i = 0; i < adev->usec_timeout; i++) {
+   if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0)
+   break;
+   udelay(1);
+   }

+		if (i >= adev->usec_timeout)

+   DRM_ERROR("failed to %s cp gfx\n",
+ enable ? "unhalt" : "halt");
+   }
return 0;
+
}

This change has an impact beyond the container case, no? We had no issue
with this code during regular reset cases, so why would we give up on

Re: Crash on resume from S3

2022-07-26 Thread Andrey Grodzovsky
The stack trace is an expected part of the reset procedure, so that is
OK. The issue you are having is a hang in one of the GPU jobs during
resume, which triggers a GPU reset attempt.


You can open a ticket for this issue at
https://gitlab.freedesktop.org/drm/amd/-/issues; please attach the full
dmesg log.


Andrey

On 2022-07-26 05:06, Tom Cook wrote:

I have a Ryzen 7 3700U in an HP laptop.  lspci describes the GPU in this way:

04:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile
Series] (rev c1)

This laptop has never successfully resumed from suspend (I have tried
every 5.x kernel).  Currently on 5.18.0, the system appears to be okay
after resume apart from the GPU, which usually gives a blank screen
and occasionally scrambled output.  After rebooting, I see this in
syslog:

Jul 25 11:02:18 frog kernel: [240782.968674] amdgpu :04:00.0:
amdgpu: GPU reset begin!
Jul 25 11:02:19 frog kernel: [240783.974891] amdgpu :04:00.0:
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test
failed (-110)
Jul 25 11:02:19 frog kernel: [240783.988650] [drm] free PSP TMR buffer
Jul 25 11:02:19 frog kernel: [240784.019057] CPU: 4 PID: 305612 Comm:
kworker/u32:17 Not tainted 5.18.0 #1
Jul 25 11:02:19 frog kernel: [240784.019063] Hardware name: HP HP ENVY
x360 Convertible 15-ds0xxx/85DD, BIOS F.20 05/28/2020
Jul 25 11:02:19 frog kernel: [240784.019067] Workqueue:
amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Jul 25 11:02:19 frog kernel: [240784.019079] Call Trace:
Jul 25 11:02:19 frog kernel: [240784.019082]  
Jul 25 11:02:19 frog kernel: [240784.019085]  dump_stack_lvl+0x49/0x5f
Jul 25 11:02:19 frog kernel: [240784.019095]  dump_stack+0x10/0x12
Jul 25 11:02:19 frog kernel: [240784.019099]
amdgpu_do_asic_reset+0x2f/0x4e0 [amdgpu]
Jul 25 11:02:19 frog kernel: [240784.019278]
amdgpu_device_gpu_recover_imp+0x41e/0xb50 [amdgpu]
Jul 25 11:02:19 frog kernel: [240784.019452]
amdgpu_job_timedout+0x155/0x1b0 [amdgpu]
Jul 25 11:02:19 frog kernel: [240784.019674]
drm_sched_job_timedout+0x74/0xf0 [gpu_sched]
Jul 25 11:02:19 frog kernel: [240784.019681]  ?
amdgpu_cgs_destroy_device+0x10/0x10 [amdgpu]
Jul 25 11:02:19 frog kernel: [240784.019896]  ?
drm_sched_job_timedout+0x74/0xf0 [gpu_sched]
Jul 25 11:02:19 frog kernel: [240784.019903]  process_one_work+0x227/0x440
Jul 25 11:02:19 frog kernel: [240784.019908]  worker_thread+0x31/0x3d0
Jul 25 11:02:19 frog kernel: [240784.019912]  ? process_one_work+0x440/0x440
Jul 25 11:02:19 frog kernel: [240784.019914]  kthread+0xfe/0x130
Jul 25 11:02:19 frog kernel: [240784.019918]  ?
kthread_complete_and_exit+0x20/0x20
Jul 25 11:02:19 frog kernel: [240784.019923]  ret_from_fork+0x22/0x30
Jul 25 11:02:19 frog kernel: [240784.019930]  
Jul 25 11:02:19 frog kernel: [240784.019934] amdgpu :04:00.0:
amdgpu: MODE2 reset
Jul 25 11:02:19 frog kernel: [240784.020178] amdgpu :04:00.0:
amdgpu: GPU reset succeeded, trying to resume
Jul 25 11:02:19 frog kernel: [240784.020552] [drm] PCIE GART of 1024M enabled.
Jul 25 11:02:19 frog kernel: [240784.020555] [drm] PTB located at
0x00F40090
Jul 25 11:02:19 frog kernel: [240784.020577] [drm] VRAM is lost due to
GPU reset!
Jul 25 11:02:19 frog kernel: [240784.020579] [drm] PSP is resuming...
Jul 25 11:02:19 frog kernel: [240784.040465] [drm] reserve 0x40
from 0xf47fc0 for PSP TMR

I'm running the latest BIOS from HP.  Is there anything I can do to
work around this?  Or anything I can do to help debug it?

Regards,
Tom Cook


Re: [PATCH 4/5] drm/amdgpu: revert context to stop engine before mode2 reset

2022-07-26 Thread Andrey Grodzovsky

Got it

Acked-by: Andrey Grodzovsky 

Andrey

On 2022-07-26 06:01, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Andrey,

By slow tests I mean the slow hang tests from the quark tool.
An example:
hang_vm_gfx_dispatch_slow.lua - This script runs on a graphics engine
using the compute engine and has a hacked CS program which is massive,
duplicating the standard CS program move code hundreds of thousands of
times. The effect is a very slowly executing CS program.

It's not a bad job; it just needs a very long time to finish. I suppose
we don't have a way to stop the shader here, and the running apps will
be affected when the reset is done.


Thanks,
Victor



-Original Message-
From: Grodzovsky, Andrey 
Sent: Tuesday, July 26, 2022 5:20 AM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Deng, Emily ; 
Koenig, Christian 
Subject: Re: [PATCH 4/5] drm/amdgpu: revert context to stop engine before mode2 
reset

On 2022-07-22 03:34, Victor Zhao wrote:


For some hangs caused by slow tests, the engine cannot be stopped, which
may cause a resume failure after reset. In this case, force-halt the
engine by reverting the context addresses


Can you maybe explain a bit more what exactly you mean by a slow test,
and why the engine cannot be stopped in this case?

Andrey



Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  1 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h  |  1 +
   drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c| 36 +
   drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c |  2 ++
   4 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5498fda8617f..833dc5e224d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5037,6 +5037,7 @@ static void amdgpu_device_recheck_guilty_jobs(
   
   			/* set guilty */

drm_sched_increase_karma(s_job);
+   amdgpu_reset_prepare_hwcontext(adev, reset_context);
   retry:
/* do hw reset */
if (amdgpu_sriov_vf(adev)) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
index f8036f2b100e..c7b44aeb671b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
@@ -37,6 +37,7 @@ struct amdgpu_gfxhub_funcs {
void (*utcl2_harvest)(struct amdgpu_device *adev);
void (*mode2_save_regs)(struct amdgpu_device *adev);
void (*mode2_restore_regs)(struct amdgpu_device *adev);
+   void (*halt)(struct amdgpu_device *adev);
   };
   
   struct amdgpu_gfxhub {

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
index 51cf8acd2d79..8cf53e039c11 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
@@ -646,6 +646,41 @@ static void gfxhub_v2_1_restore_regs(struct amdgpu_device 
*adev)
WREG32_SOC15(GC, 0, mmGCMC_VM_MX_L1_TLB_CNTL, 
adev->gmc.MC_VM_MX_L1_TLB_CNTL);
   }
   
+static void gfxhub_v2_1_halt(struct amdgpu_device *adev)
+{
+   struct amdgpu_vmhub *hub = &adev->vmhub[AMDGPU_GFXHUB_0];
+   int i;
+   uint32_t tmp;
+   int time = 1000;
+
+   gfxhub_v2_1_set_fault_enable_default(adev, false);
+
+   for (i = 0; i <= 14; i++) {
+   WREG32_SOC15_OFFSET(GC, 0, 
mmGCVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32,
+   i * hub->ctx_addr_distance, ~0);
+   WREG32_SOC15_OFFSET(GC, 0, 
mmGCVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32,
+   i * hub->ctx_addr_distance, ~0);
+   WREG32_SOC15_OFFSET(GC, 0, 
mmGCVM_CONTEXT1_PAGE_TABLE_END_ADDR_LO32,
+   i * hub->ctx_addr_distance,
+   0);
+   WREG32_SOC15_OFFSET(GC, 0, 
mmGCVM_CONTEXT1_PAGE_TABLE_END_ADDR_HI32,
+   i * hub->ctx_addr_distance,
+   0);
+   }
+   tmp = RREG32_SOC15(GC, 0, mmGRBM_STATUS2);
+   while ((tmp & (GRBM_STATUS2__EA_BUSY_MASK |
+ GRBM_STATUS2__EA_LINK_BUSY_MASK)) != 0 &&
+  time) {
+   udelay(100);
+   time--;
+   tmp = RREG32_SOC15(GC, 0, mmGRBM_STATUS2);
+   }
+
+   if (!time) {
+   DRM_WARN("failed to wait for GRBM(EA) idle\n");
+   }
+}
+
   const struct amdgpu_gfxhub_funcs gfxhub_v2_1_funcs = {
.get_fb_location = gfxhub_v2_1_get_fb_location,
.get_mc_fb_offset = gfxhub_v2_1_get_mc_fb_offset, @@ -658,4 +693,5
@@ const struct amdgpu_gfxhub_funcs gfxhub_v2_1_funcs = {
.utcl2_harvest = gfxhub_v2_1_utcl2_harvest,
.mode2_save_regs = gfxhub_v2_1_save_regs,
.mode2_restore_regs = gfxhub_v2_1_restore_regs,
+

Re: [PATCH 5/5] drm/amdgpu: reduce reset time

2022-07-26 Thread Andrey Grodzovsky



On 2022-07-26 05:40, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Andrey,

Reply inline.


Thanks,
Victor



-Original Message-
From: Grodzovsky, Andrey 
Sent: Tuesday, July 26, 2022 5:18 AM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Deng, Emily ; 
Koenig, Christian 
Subject: Re: [PATCH 5/5] drm/amdgpu: reduce reset time


On 2022-07-22 03:34, Victor Zhao wrote:

In the multi-container use case, reset time is important, so skip the
ring tests and the CP halt wait during IP suspend for reset, as they
are going to fail and cost more time on reset.


Why are they failing in this case? Skipping ring tests is not the best
idea, as you lose an important indicator of the system's sanity. Is
there any way to make them work?

[Victor]: I've seen the gfx ring test fail every time after a gfx engine
hang. I thought that should be expected, as gfx is in a bad state. Do
you know the reason we have ring tests before reset, given that we are
going to reset the ASIC anyway?
Another approach could be to make the skip mode2-only, or to reduce the
wait time here.



I dug into the history, and according to commit 'drm/amdgpu:unmap KCQ in
gfx hw_fini(v2)' you need to write to a scratch register to confirm
completion of the queue unmap operation, so you definitely don't want to
just skip it. I agree that this has no point when the ring is hung, but
remember that a GPU reset can happen not only because of a hung ring but
for other reasons as well (RAS, manual reset, etc.), in which case you
probably want to shut down gracefully here.
I see we have the adev->ip_blocks[i].status.hang flag, which you could
maybe use here instead?







Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c |  2 +-
   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 26 +++--
   2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 222d3d7ea076..f872495ccc3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -477,7 +477,7 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev)
kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.compute_ring[i],
   RESET_QUEUES, 0, 0);
   
-	if (adev->gfx.kiq.ring.sched.ready)

+   if (adev->gfx.kiq.ring.sched.ready && !amdgpu_in_reset(adev))
r = amdgpu_ring_test_helper(kiq_ring);
spin_unlock(&adev->gfx.kiq.ring_lock);
   
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index fafbad3cf08d..9ae29023e38f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -5971,16 +5971,19 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device 
*adev, bool enable)
WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp);
}
   
-	for (i = 0; i < adev->usec_timeout; i++) {

-   if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0)
-   break;
-   udelay(1);
-   }
-
-   if (i >= adev->usec_timeout)
-   DRM_ERROR("failed to %s cp gfx\n", enable ? "unhalt" : "halt");
+   if (!amdgpu_in_reset(adev)) {
+   for (i = 0; i < adev->usec_timeout; i++) {
+   if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0)
+   break;
+   udelay(1);
+   }
   
+		if (i >= adev->usec_timeout)

+   DRM_ERROR("failed to %s cp gfx\n",
+ enable ? "unhalt" : "halt");
+   }
return 0;
+
   }


This change has an impact beyond the container case, no? We had no issue
with this code during regular reset cases, so why would we give up on
this code, which confirms that the CP is idle? What is the side effect
of skipping this during all GPU resets?

Andrey

[Victor]: I see "failed to halt cp gfx" with regular reset cases as well when 
doing a gfx hang test using quark. I haven't seen a side effect with Mode1 reset yet but 
maybe shorten the wait time could be better?



Same as above, I guess. It would indeed time out for a hung ring, but
GPU reset happens not only because of hung rings but for other reasons
as well.


Andrey




   
   static int gfx_v10_0_cp_gfx_load_pfp_microcode(struct amdgpu_device

*adev) @@ -7569,8 +7572,10 @@ static int gfx_v10_0_kiq_disable_kgq(struct 
amdgpu_device *adev)
for (i = 0; i < adev->gfx.num_gfx_rings; i++)
kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.gfx_ring[i],
   PREEMPT_QUEUES, 0, 0);
-
-   return amdgpu_ring_test_helper(kiq_ring);
+   if (!amdgpu_in_reset(adev))
+   return amdgpu_ring_test_helper(kiq_ring);
+   else
+   return 0;
   }
   #endif
   
@@ -7610,6 +7615,7 @@ static int gfx_v10_0_hw_fini(void *handle)
   
   		return 0;

}
+
gfx_v10_0_cp_enable(adev, false);

Re: [PATCH 4/5] drm/amdgpu: revert context to stop engine before mode2 reset

2022-07-25 Thread Andrey Grodzovsky

On 2022-07-22 03:34, Victor Zhao wrote:


For some hangs caused by slow tests, the engine cannot be stopped,
which may cause a resume failure after reset. In this case, force-halt
the engine by reverting the context addresses



Can you maybe explain a bit more what exactly you mean by a slow test,
and why the engine cannot be stopped in this case?

Andrey




Signed-off-by: Victor Zhao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h  |  1 +
  drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c| 36 +
  drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c |  2 ++
  4 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5498fda8617f..833dc5e224d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5037,6 +5037,7 @@ static void amdgpu_device_recheck_guilty_jobs(
  
  			/* set guilty */

drm_sched_increase_karma(s_job);
+   amdgpu_reset_prepare_hwcontext(adev, reset_context);
  retry:
/* do hw reset */
if (amdgpu_sriov_vf(adev)) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
index f8036f2b100e..c7b44aeb671b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
@@ -37,6 +37,7 @@ struct amdgpu_gfxhub_funcs {
void (*utcl2_harvest)(struct amdgpu_device *adev);
void (*mode2_save_regs)(struct amdgpu_device *adev);
void (*mode2_restore_regs)(struct amdgpu_device *adev);
+   void (*halt)(struct amdgpu_device *adev);
  };
  
  struct amdgpu_gfxhub {

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
index 51cf8acd2d79..8cf53e039c11 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
@@ -646,6 +646,41 @@ static void gfxhub_v2_1_restore_regs(struct amdgpu_device 
*adev)
WREG32_SOC15(GC, 0, mmGCMC_VM_MX_L1_TLB_CNTL, 
adev->gmc.MC_VM_MX_L1_TLB_CNTL);
  }
  
+static void gfxhub_v2_1_halt(struct amdgpu_device *adev)

+{
+   struct amdgpu_vmhub *hub = &adev->vmhub[AMDGPU_GFXHUB_0];
+   int i;
+   uint32_t tmp;
+   int time = 1000;
+
+   gfxhub_v2_1_set_fault_enable_default(adev, false);
+
+   for (i = 0; i <= 14; i++) {
+   WREG32_SOC15_OFFSET(GC, 0, 
mmGCVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32,
+   i * hub->ctx_addr_distance, ~0);
+   WREG32_SOC15_OFFSET(GC, 0, 
mmGCVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32,
+   i * hub->ctx_addr_distance, ~0);
+   WREG32_SOC15_OFFSET(GC, 0, 
mmGCVM_CONTEXT1_PAGE_TABLE_END_ADDR_LO32,
+   i * hub->ctx_addr_distance,
+   0);
+   WREG32_SOC15_OFFSET(GC, 0, 
mmGCVM_CONTEXT1_PAGE_TABLE_END_ADDR_HI32,
+   i * hub->ctx_addr_distance,
+   0);
+   }
+   tmp = RREG32_SOC15(GC, 0, mmGRBM_STATUS2);
+   while ((tmp & (GRBM_STATUS2__EA_BUSY_MASK |
+ GRBM_STATUS2__EA_LINK_BUSY_MASK)) != 0 &&
+  time) {
+   udelay(100);
+   time--;
+   tmp = RREG32_SOC15(GC, 0, mmGRBM_STATUS2);
+   }
+
+   if (!time) {
+   DRM_WARN("failed to wait for GRBM(EA) idle\n");
+   }
+}
+
  const struct amdgpu_gfxhub_funcs gfxhub_v2_1_funcs = {
.get_fb_location = gfxhub_v2_1_get_fb_location,
.get_mc_fb_offset = gfxhub_v2_1_get_mc_fb_offset,
@@ -658,4 +693,5 @@ const struct amdgpu_gfxhub_funcs gfxhub_v2_1_funcs = {
.utcl2_harvest = gfxhub_v2_1_utcl2_harvest,
.mode2_save_regs = gfxhub_v2_1_save_regs,
.mode2_restore_regs = gfxhub_v2_1_restore_regs,
+   .halt = gfxhub_v2_1_halt,
  };
diff --git a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c 
b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
index 51a5b68f77d3..fead7251292f 100644
--- a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
+++ b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
@@ -97,6 +97,8 @@ sienna_cichlid_mode2_prepare_hwcontext(struct 
amdgpu_reset_control *reset_ctl,
if (!amdgpu_sriov_vf(adev)) {
if (adev->gfxhub.funcs->mode2_save_regs)
adev->gfxhub.funcs->mode2_save_regs(adev);
+   if (adev->gfxhub.funcs->halt)
+   adev->gfxhub.funcs->halt(adev);
r = sienna_cichlid_mode2_suspend_ip(adev);
}
  


Re: [PATCH 3/5] drm/amdgpu: save and restore gc hub regs

2022-07-25 Thread Andrey Grodzovsky

Acked-by: Andrey Grodzovsky 

Andrey

On 2022-07-22 03:34, Victor Zhao wrote:

Save and restore gfxhub regs as they will be reset during mode 2

Signed-off-by: Victor Zhao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h|  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h   | 26 +++
  drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c  | 72 +++
  drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c   |  7 +-
  .../include/asic_reg/gc/gc_10_3_0_offset.h|  4 ++
  5 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
index beabab515836..f8036f2b100e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
@@ -35,6 +35,8 @@ struct amdgpu_gfxhub_funcs {
void (*init)(struct amdgpu_device *adev);
int (*get_xgmi_info)(struct amdgpu_device *adev);
void (*utcl2_harvest)(struct amdgpu_device *adev);
+   void (*mode2_save_regs)(struct amdgpu_device *adev);
+   void (*mode2_restore_regs)(struct amdgpu_device *adev);
  };
  
  struct amdgpu_gfxhub {

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 008eaca27151..0305b660cd17 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -264,6 +264,32 @@ struct amdgpu_gmc {
u64 mall_size;
/* number of UMC instances */
int num_umc;
+   /* mode2 save restore */
+   u64 VM_L2_CNTL;
+   u64 VM_L2_CNTL2;
+   u64 VM_DUMMY_PAGE_FAULT_CNTL;
+   u64 VM_DUMMY_PAGE_FAULT_ADDR_LO32;
+   u64 VM_DUMMY_PAGE_FAULT_ADDR_HI32;
+   u64 VM_L2_PROTECTION_FAULT_CNTL;
+   u64 VM_L2_PROTECTION_FAULT_CNTL2;
+   u64 VM_L2_PROTECTION_FAULT_MM_CNTL3;
+   u64 VM_L2_PROTECTION_FAULT_MM_CNTL4;
+   u64 VM_L2_PROTECTION_FAULT_ADDR_LO32;
+   u64 VM_L2_PROTECTION_FAULT_ADDR_HI32;
+   u64 VM_DEBUG;
+   u64 VM_L2_MM_GROUP_RT_CLASSES;
+   u64 VM_L2_BANK_SELECT_RESERVED_CID;
+   u64 VM_L2_BANK_SELECT_RESERVED_CID2;
+   u64 VM_L2_CACHE_PARITY_CNTL;
+   u64 VM_L2_IH_LOG_CNTL;
+   u64 VM_CONTEXT_CNTL[16];
+   u64 VM_CONTEXT_PAGE_TABLE_BASE_ADDR_LO32[16];
+   u64 VM_CONTEXT_PAGE_TABLE_BASE_ADDR_HI32[16];
+   u64 VM_CONTEXT_PAGE_TABLE_START_ADDR_LO32[16];
+   u64 VM_CONTEXT_PAGE_TABLE_START_ADDR_HI32[16];
+   u64 VM_CONTEXT_PAGE_TABLE_END_ADDR_LO32[16];
+   u64 VM_CONTEXT_PAGE_TABLE_END_ADDR_HI32[16];
+   u64 MC_VM_MX_L1_TLB_CNTL;
  };
  
  #define amdgpu_gmc_flush_gpu_tlb(adev, vmid, vmhub, type) ((adev)->gmc.gmc_funcs->flush_gpu_tlb((adev), (vmid), (vmhub), (type)))

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
index d8c531581116..51cf8acd2d79 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
@@ -576,6 +576,76 @@ static void gfxhub_v2_1_utcl2_harvest(struct amdgpu_device 
*adev)
}
  }
  
+static void gfxhub_v2_1_save_regs(struct amdgpu_device *adev)

+{
+   int i;
+   adev->gmc.VM_L2_CNTL = RREG32_SOC15(GC, 0, mmGCVM_L2_CNTL);
+   adev->gmc.VM_L2_CNTL2 = RREG32_SOC15(GC, 0, mmGCVM_L2_CNTL2);
+   adev->gmc.VM_DUMMY_PAGE_FAULT_CNTL = RREG32_SOC15(GC, 0, 
mmGCVM_DUMMY_PAGE_FAULT_CNTL);
+   adev->gmc.VM_DUMMY_PAGE_FAULT_ADDR_LO32 = RREG32_SOC15(GC, 0, 
mmGCVM_DUMMY_PAGE_FAULT_ADDR_LO32);
+   adev->gmc.VM_DUMMY_PAGE_FAULT_ADDR_HI32 = RREG32_SOC15(GC, 0, 
mmGCVM_DUMMY_PAGE_FAULT_ADDR_HI32);
+   adev->gmc.VM_L2_PROTECTION_FAULT_CNTL = RREG32_SOC15(GC, 0, 
mmGCVM_L2_PROTECTION_FAULT_CNTL);
+   adev->gmc.VM_L2_PROTECTION_FAULT_CNTL2 = RREG32_SOC15(GC, 0, 
mmGCVM_L2_PROTECTION_FAULT_CNTL2);
+   adev->gmc.VM_L2_PROTECTION_FAULT_MM_CNTL3 = RREG32_SOC15(GC, 0, 
mmGCVM_L2_PROTECTION_FAULT_MM_CNTL3);
+   adev->gmc.VM_L2_PROTECTION_FAULT_MM_CNTL4 = RREG32_SOC15(GC, 0, 
mmGCVM_L2_PROTECTION_FAULT_MM_CNTL4);
+   adev->gmc.VM_L2_PROTECTION_FAULT_ADDR_LO32 = RREG32_SOC15(GC, 0, 
mmGCVM_L2_PROTECTION_FAULT_ADDR_LO32);
+   adev->gmc.VM_L2_PROTECTION_FAULT_ADDR_HI32 = RREG32_SOC15(GC, 0, 
mmGCVM_L2_PROTECTION_FAULT_ADDR_HI32);
+   adev->gmc.VM_DEBUG = RREG32_SOC15(GC, 0, mmGCVM_DEBUG);
+   adev->gmc.VM_L2_MM_GROUP_RT_CLASSES = RREG32_SOC15(GC, 0, 
mmGCVM_L2_MM_GROUP_RT_CLASSES);
+   adev->gmc.VM_L2_BANK_SELECT_RESERVED_CID = RREG32_SOC15(GC, 0, 
mmGCVM_L2_BANK_SELECT_RESERVED_CID);
+   adev->gmc.VM_L2_BANK_SELECT_RESERVED_CID2 = RREG32_SOC15(GC, 0, 
mmGCVM_L2_BANK_SELECT_RESERVED_CID2);
+   adev->gmc.VM_L2_CACHE_PARITY_CNTL = RREG32_SOC15(GC, 0, 
mmGCVM_L2_CACHE_PARITY_CNTL);
+   adev->gmc.VM_L2_IH_LOG_CNTL = RREG32_SOC15(GC, 0, 
mmGCVM_L2_IH_LOG_CNTL);
+
+   for (i = 0; i <= 15; i++) {
+   adev->gmc.VM_CONTEXT_CNTL[i] = RREG32_SOC1

Re: [PATCH 5/5] drm/amdgpu: reduce reset time

2022-07-25 Thread Andrey Grodzovsky



On 2022-07-22 03:34, Victor Zhao wrote:

In the multi-container use case, reset time is important, so skip the
ring tests and the CP halt wait during IP suspend for reset, as they
are going to fail and cost more time on reset.



Why are they failing in this case? Skipping ring tests is not the best
idea, as you lose an important indicator of the system's sanity. Is
there any way to make them work?





Signed-off-by: Victor Zhao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 26 +++--
  2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 222d3d7ea076..f872495ccc3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -477,7 +477,7 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev)
kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.compute_ring[i],
   RESET_QUEUES, 0, 0);
  
-	if (adev->gfx.kiq.ring.sched.ready)

+   if (adev->gfx.kiq.ring.sched.ready && !amdgpu_in_reset(adev))
r = amdgpu_ring_test_helper(kiq_ring);
spin_unlock(&adev->gfx.kiq.ring_lock);
  
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

index fafbad3cf08d..9ae29023e38f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -5971,16 +5971,19 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device 
*adev, bool enable)
WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp);
}
  
-	for (i = 0; i < adev->usec_timeout; i++) {

-   if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0)
-   break;
-   udelay(1);
-   }
-
-   if (i >= adev->usec_timeout)
-   DRM_ERROR("failed to %s cp gfx\n", enable ? "unhalt" : "halt");
+   if (!amdgpu_in_reset(adev)) {
+   for (i = 0; i < adev->usec_timeout; i++) {
+   if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0)
+   break;
+   udelay(1);
+   }
  
+		if (i >= adev->usec_timeout)

+   DRM_ERROR("failed to %s cp gfx\n",
+ enable ? "unhalt" : "halt");
+   }
return 0;
+
  }



This change has an impact beyond the container case, no? We had no issue
with this code during regular reset cases, so why would we give up on
this code, which confirms that the CP is idle? What is the side effect
of skipping this during all GPU resets?


Andrey


  
  static int gfx_v10_0_cp_gfx_load_pfp_microcode(struct amdgpu_device *adev)

@@ -7569,8 +7572,10 @@ static int gfx_v10_0_kiq_disable_kgq(struct 
amdgpu_device *adev)
for (i = 0; i < adev->gfx.num_gfx_rings; i++)
kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.gfx_ring[i],
   PREEMPT_QUEUES, 0, 0);
-
-   return amdgpu_ring_test_helper(kiq_ring);
+   if (!amdgpu_in_reset(adev))
+   return amdgpu_ring_test_helper(kiq_ring);
+   else
+   return 0;
  }
  #endif
  
@@ -7610,6 +7615,7 @@ static int gfx_v10_0_hw_fini(void *handle)
  
  		return 0;

}
+
gfx_v10_0_cp_enable(adev, false);
gfx_v10_0_enable_gui_idle_interrupt(adev, false);
  


Re: [PATCH 2/5] drm/amdgpu: add debugfs amdgpu_reset_level

2022-07-25 Thread Andrey Grodzovsky



On 2022-07-25 13:37, Christian König wrote:

Hi Victor,

On 25.07.22 at 12:45, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi @Grodzovsky, Andrey,

Please help review the series, thanks a lot.

Hi @Koenig, Christian,

I thought a module parameter would be exposed to a common user; this
control was added to help with debugging and testing, so I put it in
debugfs. I can make it a module parameter if that is more appropriate.


That's a really good argument. I'll leave the decision of whether we
should use a module parameter or a debugfs file to Andrey.


If you go with debugfs, then using the debugfs_create_u32() or
debugfs_create_u64() function would be more appropriate, I think. And
don't make it global, but rather a per-device flag.
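
A sketch of that suggestion, assuming a hypothetical per-device
'reset_level_mask' field in struct amdgpu_device in place of the global
module variable; debugfs_create_u32() removes the need for the
hand-written get/set callbacks entirely:

	/* in amdgpu_debugfs_init(); 'root' is the device's debugfs dir */
	debugfs_create_u32("amdgpu_reset_level", 0600, root,
			   &adev->reset_level_mask);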


Regards,
Christian.



Makes sense to me too.

Andrey







Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Friday, July 22, 2022 4:20 PM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Deucher, Alexander 


Subject: Re: [PATCH 2/5] drm/amdgpu: add debugfs amdgpu_reset_level

Well NAK to the debugfs approach, stuff like that is usually a module 
parameter.


Apart from that this series needs to be reviewed by Andrey.

Regards,
Christian.

On 22.07.22 at 09:34, Victor Zhao wrote:

Introduce an amdgpu_reset_level debugfs entry to help debug and test
specific types of reset. It also helps block unwanted reset types.

By default, mode2 reset will not be enabled

Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h |  4 
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 20 


   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  1 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c   |  6 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c    |  3 +++
   5 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6cd1e0a6dffc..c661231a6a07 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -238,6 +238,7 @@ extern int amdgpu_si_support;
   extern int amdgpu_cik_support;
   #endif
   extern int amdgpu_num_kcq;
+extern uint amdgpu_reset_level_mask;
      #define AMDGPU_VCNFW_LOG_SIZE (32 * 1024)
   extern int amdgpu_vcnfw_log;
@@ -274,6 +275,9 @@ extern int amdgpu_vcnfw_log;
   #define AMDGPU_RESET_VCE    (1 << 13)
   #define AMDGPU_RESET_VCE1    (1 << 14)
    +#define AMDGPU_RESET_LEVEL_SOFT_RECOVERY (1 << 0)
    +#define AMDGPU_RESET_LEVEL_MODE2 (1 << 1)
+
   /* max cursor sizes (in pixels) */
   #define CIK_CURSOR_WIDTH 128
   #define CIK_CURSOR_HEIGHT 128
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index e2eec985adb3..235c48e4ba4d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1661,12 +1661,29 @@ static int amdgpu_debugfs_sclk_set(void 
*data, u64 val)

   return ret;
   }
    +static int amdgpu_debugfs_reset_level_get(void *data, u64 *val)
    +{
+    struct amdgpu_device *adev = (struct amdgpu_device *)data;
+    *val = amdgpu_reset_level_mask;
+    return 0;
+}
+
    +static int amdgpu_debugfs_reset_level_set(void *data, u64 val)
    +{
+    struct amdgpu_device *adev = (struct amdgpu_device *)data;
+    amdgpu_reset_level_mask = val;
+    return 0;
+}
+
   DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
   amdgpu_debugfs_ib_preempt, "%llu\n");
      DEFINE_DEBUGFS_ATTRIBUTE(fops_sclk_set, NULL,
   amdgpu_debugfs_sclk_set, "%llu\n");
   +DEFINE_DEBUGFS_ATTRIBUTE(fops_reset_level, 
amdgpu_debugfs_reset_level_get,

+    amdgpu_debugfs_reset_level_set, "%llu\n");
+
   static ssize_t amdgpu_reset_dump_register_list_read(struct file *f,
   char __user *buf, size_t size, loff_t *pos)
   {
@@ -1785,6 +1802,9 @@ int amdgpu_debugfs_init(struct amdgpu_device 
*adev)

   return PTR_ERR(ent);
   }
   +    debugfs_create_file("amdgpu_reset_level", 0200, root, adev,
+  &fops_reset_level);
+
   /* Register debugfs entries for amdgpu_ttm */
   amdgpu_ttm_debugfs_init(adev);
   amdgpu_debugfs_pm_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e8c6c3fe9374..fb8f3cb853a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -198,6 +198,7 @@ struct amdgpu_watchdog_timer 
amdgpu_watchdog_timer = {

   .timeout_fatal_disable = false,
   .period = 0x0, /* default to 0x0 (timeout disable) */
   };
+uint amdgpu_reset_level_mask = 0x1;
      /**
    * DOC: vramlimit (int)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 831fb222139c..f16ab1a54b70 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -74,6 +74,9 @@ int amdgpu_reset_prepare_hwcontext(struct 
amdgpu_device *adev,

   {
   

Re: [PATCH 1/5] drm/amdgpu: add mode2 reset for sienna_cichlid

2022-07-25 Thread Andrey Grodzovsky



On 2022-07-22 03:33, Victor Zhao wrote:

To meet the requirements of the multi-container use case, which needs
a quicker reset that does not cause VRAM loss, add a Mode2 reset
handler for sienna_cichlid. Add an AMDGPU_SKIP_MODE2_RESET flag so the
driver can fall back to the default reset method, and retry, when
mode2 reset fails.

- add mode2 reset handler for sienna_cichlid



Seems to me the ASIC-specific changes should be in a separate patch



- introduce AMDGPU_SKIP_MODE2_RESET flag
- let mode2 reset fall back to the default reset method if it fails

Signed-off-by: Victor Zhao 
---
  drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|   7 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c |  13 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h |   1 +
  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c |   1 +
  drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c |   1 +
  drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c |   1 +
  drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c   | 297 ++
  drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h   |  32 ++
  .../pm/swsmu/inc/pmfw_if/smu_v11_0_7_ppsmc.h  |   4 +-
  drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |   3 +-
  .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   |  54 
  15 files changed, 414 insertions(+), 5 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index c7d0cd15b5ef..7030ac2d7d2c 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -75,7 +75,7 @@ amdgpu-y += \
vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o nbio_v7_0.o 
vega10_reg_init.o \
vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o arct_reg_init.o 
mxgpu_nv.o \
nbio_v7_2.o hdp_v4_0.o hdp_v5_0.o aldebaran_reg_init.o aldebaran.o 
soc21.o \
-   nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o lsdma_v6_0.o
+   sienna_cichlid.o nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o 
lsdma_v6_0.o
  
  # add DF block

  amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 5e53a5293935..091415a4abf0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -135,6 +135,7 @@ static void amdgpu_amdkfd_reset_work(struct work_struct 
*work)
reset_context.method = AMD_RESET_METHOD_NONE;
reset_context.reset_req_dev = adev;
clear_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
+   clear_bit(AMDGPU_SKIP_MODE2_RESET, &reset_context.flags);
  
  	amdgpu_device_gpu_recover(adev, NULL, &reset_context);

  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b79ee4ffb879..5498fda8617f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5146,6 +5146,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  
  	reset_context->job = job;

reset_context->hive = hive;
+
/*
 * Build list of devices to reset.
 * In case we are in XGMI hive mode, resort the device list
@@ -5265,8 +5266,11 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
amdgpu_ras_resume(adev);
} else {
r = amdgpu_do_asic_reset(device_list_handle, reset_context);
-   if (r && r == -EAGAIN)
+   if (r && r == -EAGAIN) {
+   set_bit(AMDGPU_SKIP_MODE2_RESET, &reset_context->flags);
+   adev->asic_reset_res = 0;



See my question bellow related to this set



goto retry;
+   }
}
  
  skip_hw_reset:

@@ -5694,6 +5698,7 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev 
*pdev)
reset_context.reset_req_dev = adev;
set_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
set_bit(AMDGPU_SKIP_HW_RESET, &reset_context.flags);
+   set_bit(AMDGPU_SKIP_MODE2_RESET, &reset_context.flags);
  
  	adev->no_hw_access = true;

r = amdgpu_device_pre_asic_reset(adev, &reset_context);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 10fdd12cf853..9844d99075e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -71,6 +71,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
reset_context.method = AMD_RESET_METHOD_NONE;
reset_context.reset_req_dev = adev;
clear_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
+   clear_bit(AMDGPU_SKIP_MODE2_RESET, &reset_context.flags);
  
  		r = 

Re: [PATCH] drm/amdgpu: remove useless condition in amdgpu_job_stop_all_jobs_on_sched()

2022-07-25 Thread Andrey Grodzovsky

Reviewed-by: Andrey Grodzovsky 

Andrey

On 2022-07-19 06:39, Andrey Strachuk wrote:

The local variable 'rq' is initialized with the address
of a field of drm_gpu_scheduler, so it does not make
sense to compare 'rq' with NULL.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Andrey Strachuk 
Fixes: 7c6e68c777f1 ("drm/amdgpu: Avoid HW GPU reset for RAS.")
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 67f66f2f1809..600401f2a98f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -285,10 +285,6 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
drm_gpu_scheduler *sched)
/* Signal all jobs not yet scheduled */
for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; 
i--) {
struct drm_sched_rq *rq = &sched->sched_rq[i];
-
-   if (!rq)
-   continue;
-
spin_lock(&rq->lock);
list_for_each_entry(s_entity, &rq->entities, list) {
while ((s_job = 
to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {


Re: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails

2022-07-16 Thread Andrey Grodzovsky



On 2022-07-15 05:28, Zhu, Jiadong wrote:

[AMD Official Use Only - General]

Updated some comments

-Original Message-
From: Zhu, Jiadong
Sent: Friday, July 15, 2022 5:13 PM
To: Christian König ; 
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 
Cc: Huang, Ray ; Liu, Aaron 
Subject: RE: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails

Hi Christian,

The resubmitted job in function amdgpu_ib_preempt_job_recovery returns the same 
hw fence because of this commit:

static void amdgpu_ib_preempt_job_recovery(struct drm_gpu_scheduler *sched)
{
 struct drm_sched_job *s_job;
 struct dma_fence *fence;

 spin_lock(&sched->job_list_lock);
 list_for_each_entry(s_job, &sched->pending_list, list) {
 fence = sched->ops->run_job(s_job);   //the returned fence has
the same address as the swapped fence
 dma_fence_put(fence);
 }
 spin_unlock(&sched->job_list_lock);
}



commit c530b02f39850a639b72d01ebbf7e5d745c60831
Author: Jack Zhang 
Date:   Wed May 12 15:06:35 2021 +0800

 drm/amd/amdgpu embed hw_fence into amdgpu_job

 Why: Previously hw fence is alloced separately with job.
 It caused historical lifetime issues and corner cases.
 The ideal situation is to take fence to manage both job
 and fence's lifetime, and simplify the design of gpu-scheduler.

 How:
 We propose to embed hw_fence into amdgpu_job.
 1. We cover the normal job submission by this method.
 2. For ib_test, and submit without a parent job keep the
 legacy way to create a hw fence separately.
 v2:
 use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
 embedded in a job.
 v3:
 remove redundant variable ring in amdgpu_job
 v4:
 add tdr sequence support for this feature. Add a job_run_counter to
 indicate whether this job is a resubmit job.
 v5
 add missing handling in amdgpu_fence_enable_signaling

 Signed-off-by: Jingwen Chen 
 Signed-off-by: Jack Zhang 
     Reviewed-by: Andrey Grodzovsky 
 Reviewed by: Monk Liu 
 Signed-off-by: Alex Deucher 


Thus the fence we swapped out is signaled and put twice in the following
two functions, and we get "refcount_t: underflow; use-after-free" errors
later.

 /* wait for jobs finished */
 amdgpu_fence_wait_empty(ring); //waits on the resubmitted fence,
which is signaled and put somewhere else; the refcount is decreased by 1
after amdgpu_fence_wait_empty.

 /* signal the old fences */
 amdgpu_ib_preempt_signal_fences(fences, length);   //signals
and puts the previously swapped fence; signal would return -22.

Thanks,
Jiadong



Did you have the commit 'drm/amdgpu: Follow up change to previous drm
scheduler change.' in your branch when you encountered this problem?
I don't see an underflow issue for the preempted job when inspecting
the code with this commit in mind:

amdgpu_fence_emit
    dma_fence_init 1
    dma_fence_get(fence) 2
    rcu_assign_pointer(*ptr, dma_fence_get(fence)) 3

drm_sched_main
    s_fence->parent = dma_fence_get(fence); 4
    dma_fence_put(fence); 3

amdgpu_ib_preempt_job_recovery
    amdgpu_fence_emit
        if (job && job->job_run_counter) -> dma_fence_get(fence); 4
        rcu_assign_pointer(*ptr, dma_fence_get(fence)); 5

    dma_fence_put(fence); 4

amdgpu_fence_wait_empty
    dma_fence_get_rcu(fence) 5
    dma_fence_put(fence) 4

amdgpu_fence_process (EOP interrupt for re-submission of preempted job)
    dma_fence_put 3

amdgpu_ib_preempt_signal_fences
    dma_fence_put 2

amdgpu_job_free_cb
    dma_fence_put(&job->hw_fence) 1

drm_sched_fence_release_scheduled
    dma_fence_put(fence->parent); 0
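
The step that makes this balance out is the job_run_counter check in
amdgpu_fence_emit(): on re-submission the job's embedded hw fence is
reused and only re-referenced instead of being initialized again. A
simplified sketch of that branch (not the exact upstream code):

	if (job && job->job_run_counter) {
		/* re-submission: reuse the embedded fence, take a new ref */
		fence = dma_fence_get(&job->hw_fence);
	} else {
		/* first run: initialize the embedded fence */
		dma_fence_init(fence, &amdgpu_fence_ops,
			       &ring->fence_drv.lock,
			       adev->fence_context + ring->idx, seq);
	}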

Also take a look here for reference - 
https://drive.google.com/file/d/1yEoeW6OQC9WnwmzFW6NBLhFP_jD0xcHm/view


Andrey





Andrey





-Original Message-
From: Christian König 
Sent: Friday, July 15, 2022 4:48 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org; Grodzovsky, 
Andrey 
Cc: Huang, Ray ; Liu, Aaron 
Subject: Re: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails

[CAUTION: External Email]

On 15.07.22 at 10:43, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

dma_fence_signal() returning non-zero indicates that the fence was
signaled, and put, somewhere else.
Skip dma_fence_put() to keep the fence refcount correct.

Well, quite a big NAK on this.

Reference counting should be completely independent of where a fence signals.

Andrey can you take a look at this as well?

Thanks,
Christian.


Signed-off-by: Jiadong.Zhu 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index f4ed0785d523..93c1a5e83835 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd

Re: [PATCH 10/10] drm/amdgpu: add gang submit frontend v2

2022-07-14 Thread Andrey Grodzovsky

Acked-by: Andrey Grodzovsky 

Andrey

On 2022-07-14 06:39, Christian König wrote:

Allow submitting jobs as a gang which needs to run on multiple engines at
the same time.

All members of the gang get the same implicit, explicit and VM
dependencies, so no gang member will start running until everything else
is ready.

The last job is considered the gang leader (usually a submission to the
GFX ring) and is used for signaling output dependencies.

Each job is remembered individually as a user of a buffer object, so
there is no joining of work at the end.

v2: rebase and fix review comments from Andrey and Yogesh

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 256 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h|  10 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h |  12 +-
  3 files changed, 183 insertions(+), 95 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 88f491dc7ca2..e1c41db20efb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -69,6 +69,7 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
   unsigned int *num_ibs)
  {
struct drm_sched_entity *entity;
+   unsigned int i;
int r;
  
  	r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type,

@@ -77,17 +78,28 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
if (r)
return r;
  
-	/* Abort if there is no run queue associated with this entity.

-* Possibly because of disabled HW IP*/
+   /*
+* Abort if there is no run queue associated with this entity.
+* Possibly because of disabled HW IP.
+*/
if (entity->rq == NULL)
return -EINVAL;
  
-	/* Currently we don't support submitting to multiple entities */

-   if (p->entity && p->entity != entity)
+   /* Check if we can add this IB to some existing job */
+   for (i = 0; i < p->gang_size; ++i) {
+   if (p->entities[i] == entity)
+   goto found;
+   }
+
+   /* If not increase the gang size if possible */
+   if (i == AMDGPU_CS_GANG_SIZE)
return -EINVAL;
  
-	p->entity = entity;

-   ++(*num_ibs);
+   p->entities[i] = entity;
+   p->gang_size = i + 1;
+
+found:
+   ++(num_ibs[i]);
return 0;
  }
  
@@ -161,11 +173,12 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,

   union drm_amdgpu_cs *cs)
  {
struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+   unsigned int num_ibs[AMDGPU_CS_GANG_SIZE] = { };
struct amdgpu_vm *vm = &fpriv->vm;
uint64_t *chunk_array_user;
uint64_t *chunk_array;
-   unsigned size, num_ibs = 0;
uint32_t uf_offset = 0;
+   unsigned int size;
int ret;
int i;
  
@@ -231,7 +244,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,

if (size < sizeof(struct drm_amdgpu_cs_chunk_ib))
goto free_partial_kdata;
  
-			ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, &num_ibs);

+   ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, num_ibs);
if (ret)
goto free_partial_kdata;
break;
@@ -268,21 +281,28 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
}
}
  
-	ret = amdgpu_job_alloc(p->adev, num_ibs, &p->job, vm);

-   if (ret)
-   goto free_all_kdata;
+   if (!p->gang_size)
+   return -EINVAL;
  
-	ret = drm_sched_job_init(&p->job->base, p->entity, &fpriv->vm);

-   if (ret)
-   goto free_all_kdata;
+   for (i = 0; i < p->gang_size; ++i) {
+   ret = amdgpu_job_alloc(p->adev, num_ibs[i], &p->jobs[i], vm);
+   if (ret)
+   goto free_all_kdata;
+
+   ret = drm_sched_job_init(&p->jobs[i]->base, p->entities[i],
+&fpriv->vm);
+   if (ret)
+   goto free_all_kdata;
+   }
+   p->gang_leader = p->jobs[p->gang_size - 1];
  
-	if (p->ctx->vram_lost_counter != p->job->vram_lost_counter) {

+   if (p->ctx->vram_lost_counter != p->gang_leader->vram_lost_counter) {
ret = -ECANCELED;
goto free_all_kdata;
}
  
 	if (p->uf_entry.tv.bo)
-		p->job->uf_addr = uf_offset;
+		p->gang_leader->uf_addr = uf_offset;
kvfree(chunk_array);
  
  	/* Use this opportunity to fill in task info for the vm */

@@ -304,22 +324,18 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
return ret;
  }
  
-static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,

-  

Re: [PATCH 07/10] drm/amdgpu: move setting the job resources

2022-07-14 Thread Andrey Grodzovsky

Reviewed-by: Andrey Grodzovsky 

Andrey

On 2022-07-14 06:38, Christian König wrote:

Move setting the job resources into amdgpu_job.c

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 21 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 17 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  2 ++
  3 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index dfb7b4f46bc3..88f491dc7ca2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -828,9 +828,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
struct amdgpu_vm *vm = >vm;
struct amdgpu_bo_list_entry *e;
struct list_head duplicates;
-   struct amdgpu_bo *gds;
-   struct amdgpu_bo *gws;
-   struct amdgpu_bo *oa;
int r;
  
 	INIT_LIST_HEAD(&p->validated);

@@ -947,22 +944,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved,
 p->bytes_moved_vis);
  
-	gds = p->bo_list->gds_obj;
-	gws = p->bo_list->gws_obj;
-	oa = p->bo_list->oa_obj;
-
-   if (gds) {
-   p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
-   p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
-   }
-   if (gws) {
-   p->job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT;
-   p->job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT;
-   }
-   if (oa) {
-   p->job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT;
-   p->job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT;
-   }
+   amdgpu_job_set_resources(p->job, p->bo_list->gds_obj,
+p->bo_list->gws_obj, p->bo_list->oa_obj);
  
 	if (p->uf_entry.tv.bo) {
 		struct amdgpu_bo *uf = ttm_to_amdgpu_bo(p->uf_entry.tv.bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 36c1be77bf8f..3255b2fca611 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -129,6 +129,23 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
return r;
  }
  
+void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds,
+			      struct amdgpu_bo *gws, struct amdgpu_bo *oa)
+{
+   if (gds) {
+   job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
+   job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
+   }
+   if (gws) {
+   job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT;
+   job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT;
+   }
+   if (oa) {
+   job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT;
+   job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT;
+   }
+}
+
  void amdgpu_job_free_resources(struct amdgpu_job *job)
  {
struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index d599c0540b46..0bab8fe0d419 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -77,6 +77,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
 struct amdgpu_job **job, struct amdgpu_vm *vm);
  int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
enum amdgpu_ib_pool_type pool, struct amdgpu_job **job);
+void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds,
+ struct amdgpu_bo *gws, struct amdgpu_bo *oa);
  void amdgpu_job_free_resources(struct amdgpu_job *job);
  void amdgpu_job_free(struct amdgpu_job *job);
  int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,


Re: [PATCH 01/10] drm/sched: move calling drm_sched_entity_select_rq

2022-07-14 Thread Andrey Grodzovsky

Found the new use case in patch 5/10, the reordering of the CS ioctl.

Reviewed-by: Andrey Grodzovsky 

Andrey

On 2022-07-14 12:26, Christian König wrote:

We need this for limiting codecs like AV1 to the first instance for VCN3.

Essentially the idea is that we first initialize the job with entity,
id etc., and before we submit it we select a new rq for the entity.
In the meantime the VCN3 inline parse will have modified the available
rqs for the entity.


See the patch "revert "fix limiting AV1 to the first instance on 
VCN3"" as well.


Christian.

Am 14.07.22 um 17:43 schrieb Andrey Grodzovsky:
Can you please remind me of the use case that requires this? I browsed
through related mails in the past but haven't found when it is needed. For
amdgpu, drm_sched_job_init and drm_sched_job_arm are called together, and
amdgpu is the only driver that supports modifying entity priority on the
fly as far as I see.


Andrey

On 2022-07-14 06:38, Christian König wrote:
We already discussed that the call to drm_sched_entity_select_rq() needs
to move to drm_sched_job_arm() to be able to set a new scheduler list
between _init() and _arm(). This was just not applied for some reason.

Signed-off-by: Christian König 
CC: Andrey Grodzovsky 
CC: dri-de...@lists.freedesktop.org
---
  drivers/gpu/drm/scheduler/sched_main.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 68317d3a7a27..e0ab14e0fb6b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -592,7 +592,6 @@ int drm_sched_job_init(struct drm_sched_job *job,
 struct drm_sched_entity *entity,
 void *owner)
  {
-    drm_sched_entity_select_rq(entity);
  if (!entity->rq)
  return -ENOENT;
  @@ -628,7 +627,7 @@ void drm_sched_job_arm(struct drm_sched_job *job)
  struct drm_sched_entity *entity = job->entity;
    BUG_ON(!entity);
-
+    drm_sched_entity_select_rq(entity);
  sched = entity->rq->sched;
    job->sched = sched;




Re: [PATCH 01/10] drm/sched: move calling drm_sched_entity_select_rq

2022-07-14 Thread Andrey Grodzovsky
Can you please remind me of the use case that requires this? I browsed
through related mails in the past but haven't found when it is needed. For
amdgpu, drm_sched_job_init and drm_sched_job_arm are called together, and
amdgpu is the only driver that supports modifying entity priority on the
fly as far as I see.


Andrey

On 2022-07-14 06:38, Christian König wrote:

We already discussed that the call to drm_sched_entity_select_rq() needs
to move to drm_sched_job_arm() to be able to set a new scheduler list
between _init() and _arm(). This was just not applied for some reason.

Signed-off-by: Christian König 
CC: Andrey Grodzovsky 
CC: dri-de...@lists.freedesktop.org
---
  drivers/gpu/drm/scheduler/sched_main.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 68317d3a7a27..e0ab14e0fb6b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -592,7 +592,6 @@ int drm_sched_job_init(struct drm_sched_job *job,
   struct drm_sched_entity *entity,
   void *owner)
  {
-   drm_sched_entity_select_rq(entity);
if (!entity->rq)
return -ENOENT;
  
@@ -628,7 +627,7 @@ void drm_sched_job_arm(struct drm_sched_job *job)

struct drm_sched_entity *entity = job->entity;
  
  	BUG_ON(!entity);

-
+   drm_sched_entity_select_rq(entity);
sched = entity->rq->sched;
  
  	job->sched = sched;


Re: [PATCH 09/10] drm/amdgpu: add gang submit backend

2022-07-14 Thread Andrey Grodzovsky



On 2022-07-14 06:39, Christian König wrote:

Allow submitting jobs as a gang which needs to run on multiple
engines at the same time.

Basic idea is that we have a global gang submit fence representing when the
gang leader is finally pushed to run on the hardware last.

Jobs submitted as gang are never re-submitted in case of a GPU reset since this
won't work and will just deadlock the hardware immediately again.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  3 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 34 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 28 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h|  3 ++
  4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2871a3e3801f..19308db52984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -881,6 +881,7 @@ struct amdgpu_device {
u64 fence_context;
unsignednum_rings;
struct amdgpu_ring  *rings[AMDGPU_MAX_RINGS];
+   struct dma_fence __rcu  *gang_submit;
boolib_pool_ready;
struct amdgpu_sa_managerib_pools[AMDGPU_IB_POOL_MAX];
 	struct amdgpu_sched		gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];
@@ -1288,6 +1289,8 @@ u32 amdgpu_device_pcie_port_rreg(struct amdgpu_device *adev,
 				u32 reg);
  void amdgpu_device_pcie_port_wreg(struct amdgpu_device *adev,
u32 reg, u32 v);
+struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev,
+   struct dma_fence *gang);
  
  /* atpx handler */

  #if defined(CONFIG_VGA_SWITCHEROO)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e1c9587f659b..f80beb7208c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3499,6 +3499,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
adev->gmc.gart_size = 512 * 1024 * 1024;
adev->accel_working = false;
adev->num_rings = 0;
+   RCU_INIT_POINTER(adev->gang_submit, dma_fence_get_stub());
adev->mman.buffer_funcs = NULL;
adev->mman.buffer_funcs_ring = NULL;
adev->vm_manager.vm_pte_funcs = NULL;
@@ -3979,6 +3980,7 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
release_firmware(adev->firmware.gpu_info_fw);
adev->firmware.gpu_info_fw = NULL;
adev->accel_working = false;
+   dma_fence_put(rcu_dereference_protected(adev->gang_submit, true));
  
  	amdgpu_reset_fini(adev);
  
@@ -5905,3 +5907,35 @@ void amdgpu_device_pcie_port_wreg(struct amdgpu_device *adev,
 	(void)RREG32(data);
 	spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
  }
+
+/**
+ * amdgpu_device_switch_gang - switch to a new gang
+ * @adev: amdgpu_device pointer
+ * @gang: the gang to switch to
+ *
+ * Try to switch to a new gang or return a reference to the current gang if that
+ * isn't possible.
+ * Returns: Either NULL if we switched correctly or a reference to the existing
+ * gang.
+ */
+struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev,
+   struct dma_fence *gang)
+{
+   struct dma_fence *old = NULL;
+
+   do {
+   dma_fence_put(old);
+		old = dma_fence_get_rcu_safe(&adev->gang_submit);
+
+   if (old == gang)
+   break;
+
+   if (!dma_fence_is_signaled(old))
+   return old;
+
+	} while (cmpxchg((struct dma_fence __force **)&adev->gang_submit,
+old, gang) != old);
+
+   dma_fence_put(old);
+   return NULL;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 3255b2fca611..f3a1fdbd41a3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -180,11 +180,29 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
kfree(job);
  }
  
+void amdgpu_job_set_gang_leader(struct amdgpu_job *job,
+				struct amdgpu_job *leader)
+{
+	struct dma_fence *fence = &leader->base.s_fence->scheduled;
+
+   WARN_ON(job->gang_submit);
+
+   /*
+* Don't add a reference when we are the gang leader to avoid circle
+* dependency.
+*/
+   if (job != leader)
+   dma_fence_get(fence);
+   job->gang_submit = fence;
+}
+
  void amdgpu_job_free(struct amdgpu_job *job)
  {
amdgpu_job_free_resources(job);
amdgpu_sync_free(>sync);
amdgpu_sync_free(>sched_sync);
+	if (job->gang_submit != &job->base.s_fence->scheduled)
+   
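
A hedged sketch of how the scheduler side consumes this helper, loosely
following the (truncated) remainder of this patch: the old gang fence is
returned as one more dependency, so the job only reaches the hardware once
the previous gang has signaled (context simplified):

	/* in amdgpu's job dependency callback, after all other deps */
	if (!fence && job->gang_submit)
		fence = amdgpu_device_switch_gang(ring->adev, job->gang_submit);

	return fence;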

Re: [PATCH] drm/amdgpu: Get rid of amdgpu_job->external_hw_fence

2022-07-13 Thread Andrey Grodzovsky



On 2022-07-13 13:33, Christian König wrote:

Am 13.07.22 um 19:13 schrieb Andrey Grodzovsky:

This is a follow-up cleanup to [1]. See below the refcount balancing
for calling amdgpu_job_submit_direct after this cleanup, as far
as I calculated.

amdgpu_fence_emit
dma_fence_init 1
dma_fence_get(fence) 2
rcu_assign_pointer(*ptr, dma_fence_get(fence)) 3

---> amdgpu_job_submit_direct completes before fence signaled
    amdgpu_sa_bo_free
    (*sa_bo)->fence = dma_fence_get(fence) 4

    amdgpu_job_free
    dma_fence_put 3

    amdgpu_vcn_enc_get_destroy_msg
    *fence = dma_fence_get(f) 4
    dma_fence_put(f); 3

    amdgpu_vcn_enc_ring_test_ib
    dma_fence_put(fence) 2

    amdgpu_fence_process
    dma_fence_put 1

    amdgpu_sa_bo_remove_locked
    dma_fence_put 0

---> amdgpu_job_submit_direct completes after fence signaled
    amdgpu_fence_process
    dma_fence_put 2

    amdgpu_job_free
    dma_fence_put 1

    amdgpu_vcn_enc_get_destroy_msg
    *fence = dma_fence_get(f) 2
    dma_fence_put(f); 1

    amdgpu_vcn_enc_ring_test_ib
    dma_fence_put(fence) 0

[1] - 
https://patchwork.kernel.org/project/dri-devel/cover/20220624180955.485440-1-andrey.grodzov...@amd.com/


Signed-off-by: Andrey Grodzovsky 
Suggested-by: Christian König 


Offhand that looks correct to me, but it could be that I'm missing
something as well.


Anyway, I think I can give a Reviewed-by: Christian König
 for this.


Thanks,
Christian.



Pushed, thanks.

Andrey





---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 27 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h    |  1 -
  3 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 16faea7ed1cd..b79ee4ffb879 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5229,8 +5229,7 @@ int amdgpu_device_gpu_recover(struct 
amdgpu_device *adev,

   *
   * job->base holds a reference to parent fence
   */
-    if (job && (job->hw_fence.ops != NULL) &&
-    dma_fence_is_signaled(&job->hw_fence)) {
+    if (job && dma_fence_is_signaled(&job->hw_fence)) {
  job_signaled = true;
  dev_info(adev->dev, "Guilty job already signaled, skipping 
HW reset");

  goto skip_hw_reset;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index 6fa381ee5fa0..10fdd12cf853 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -134,16 +134,10 @@ void amdgpu_job_free_resources(struct 
amdgpu_job *job)

  {
  struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched);
  struct dma_fence *f;
-    struct dma_fence *hw_fence;
  unsigned i;
  -    if (job->hw_fence.ops == NULL)
-    hw_fence = job->external_hw_fence;
-    else
-    hw_fence = &job->hw_fence;
-
  /* use sched fence if available */
-    f = job->base.s_fence ? &job->base.s_fence->finished : hw_fence;
+    f = job->base.s_fence ? &job->base.s_fence->finished : &job->hw_fence;

  for (i = 0; i < job->num_ibs; ++i)
  amdgpu_ib_free(ring->adev, &job->ibs[i], f);
  }
@@ -157,11 +151,7 @@ static void amdgpu_job_free_cb(struct 
drm_sched_job *s_job)

  amdgpu_sync_free(&job->sync);
  amdgpu_sync_free(&job->sched_sync);
  -    /* only put the hw fence if has embedded fence */
-    if (job->hw_fence.ops != NULL)
-    dma_fence_put(&job->hw_fence);
-    else
-    kfree(job);
+    dma_fence_put(&job->hw_fence);
  }
    void amdgpu_job_free(struct amdgpu_job *job)
@@ -170,11 +160,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
  amdgpu_sync_free(&job->sync);
  amdgpu_sync_free(&job->sched_sync);
  -    /* only put the hw fence if has embedded fence */
-    if (job->hw_fence.ops != NULL)
-    dma_fence_put(&job->hw_fence);
-    else
-    kfree(job);
+    dma_fence_put(&job->hw_fence);
  }
    int amdgpu_job_submit(struct amdgpu_job *job, struct 
drm_sched_entity *entity,
@@ -204,15 +190,12 @@ int amdgpu_job_submit_direct(struct amdgpu_job 
*job, struct amdgpu_ring *ring,

  int r;
  job->base.sched = &ring->sched;
-    r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, NULL, fence);
-    /* record external_hw_fence for direct submit */
-    job->external_hw_fence = dma_fence_get(*fence);
+    r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, job, fence);
+
  if (r)
  return r;
    amdgpu_job_free(job);
-    dma_fence_put(*fence);
-
  return 0;
  }
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h 

[PATCH] drm/amdgpu: Get rid of amdgpu_job->external_hw_fence

2022-07-13 Thread Andrey Grodzovsky
This is a follow-up cleanup to [1]. See below the refcount balancing
for calling amdgpu_job_submit_direct after this cleanup, as far
as I calculated.

amdgpu_fence_emit
dma_fence_init 1
dma_fence_get(fence) 2
rcu_assign_pointer(*ptr, dma_fence_get(fence)) 3

---> amdgpu_job_submit_direct completes before fence signaled
amdgpu_sa_bo_free
(*sa_bo)->fence = dma_fence_get(fence) 4

amdgpu_job_free
dma_fence_put 3

amdgpu_vcn_enc_get_destroy_msg
*fence = dma_fence_get(f) 4
dma_fence_put(f); 3

amdgpu_vcn_enc_ring_test_ib
dma_fence_put(fence) 2

amdgpu_fence_process
dma_fence_put 1

amdgpu_sa_bo_remove_locked
dma_fence_put 0

---> amdgpu_job_submit_direct completes after fence signaled
amdgpu_fence_process
dma_fence_put 2

amdgpu_job_free
dma_fence_put 1

amdgpu_vcn_enc_get_destroy_msg
*fence = dma_fence_get(f) 2
dma_fence_put(f); 1

amdgpu_vcn_enc_ring_test_ib
dma_fence_put(fence) 0

[1] - 
https://patchwork.kernel.org/project/dri-devel/cover/20220624180955.485440-1-andrey.grodzov...@amd.com/

Signed-off-by: Andrey Grodzovsky 
Suggested-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 27 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h|  1 -
 3 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 16faea7ed1cd..b79ee4ffb879 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5229,8 +5229,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 *
 * job->base holds a reference to parent fence
 */
-	if (job && (job->hw_fence.ops != NULL) &&
-	    dma_fence_is_signaled(&job->hw_fence)) {
+	if (job && dma_fence_is_signaled(&job->hw_fence)) {
job_signaled = true;
dev_info(adev->dev, "Guilty job already signaled, skipping HW 
reset");
goto skip_hw_reset;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 6fa381ee5fa0..10fdd12cf853 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -134,16 +134,10 @@ void amdgpu_job_free_resources(struct amdgpu_job *job)
 {
struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched);
struct dma_fence *f;
-   struct dma_fence *hw_fence;
unsigned i;
 
-	if (job->hw_fence.ops == NULL)
-		hw_fence = job->external_hw_fence;
-	else
-		hw_fence = &job->hw_fence;
-
/* use sched fence if available */
-	f = job->base.s_fence ? &job->base.s_fence->finished : hw_fence;
+	f = job->base.s_fence ? &job->base.s_fence->finished : &job->hw_fence;
 	for (i = 0; i < job->num_ibs; ++i)
 		amdgpu_ib_free(ring->adev, &job->ibs[i], f);
 }
@@ -157,11 +151,7 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
 	amdgpu_sync_free(&job->sync);
 	amdgpu_sync_free(&job->sched_sync);
 
-	/* only put the hw fence if has embedded fence */
-	if (job->hw_fence.ops != NULL)
-		dma_fence_put(&job->hw_fence);
-	else
-		kfree(job);
+	dma_fence_put(&job->hw_fence);
 }
 
 void amdgpu_job_free(struct amdgpu_job *job)
@@ -170,11 +160,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
 	amdgpu_sync_free(&job->sync);
 	amdgpu_sync_free(&job->sched_sync);
 
-	/* only put the hw fence if has embedded fence */
-	if (job->hw_fence.ops != NULL)
-		dma_fence_put(&job->hw_fence);
-	else
-		kfree(job);
+	dma_fence_put(&job->hw_fence);
 }
 
 int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
@@ -204,15 +190,12 @@ int amdgpu_job_submit_direct(struct amdgpu_job *job, 
struct amdgpu_ring *ring,
int r;
 
 	job->base.sched = &ring->sched;
-   r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, NULL, fence);
-   /* record external_hw_fence for direct submit */
-   job->external_hw_fence = dma_fence_get(*fence);
+   r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, job, fence);
+
if (r)

[PATCH v2 2/4] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-24 Thread Andrey Grodzovsky
Problem:
After we start handling timed out jobs we assume their fences won't be
signaled, but we cannot be sure and sometimes they fire late. We need
to prevent concurrent accesses to the fence array from
amdgpu_fence_driver_clear_job_fences during GPU reset and amdgpu_fence_process
from a late EOP interrupt.

Fix:
Before accessing the fence array during GPU reset, disable the EOP interrupt
and flush all pending interrupt handlers for the amdgpu device's interrupt line.

v2: Switch from irq_get/put to full enable/disable_irq for amdgpu

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 18 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
 3 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index eacecc672a4d..03519d58e630 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4605,6 +4605,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
*adev,
amdgpu_virt_fini_data_exchange(adev);
}
 
+   amdgpu_fence_driver_isr_toggle(adev, true);
+
/* block all schedulers and reset given job's ring */
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];
@@ -4620,6 +4622,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
*adev,
amdgpu_fence_driver_force_completion(ring);
}
 
+   amdgpu_fence_driver_isr_toggle(adev, false);
+
 	if (job && job->vm)
 		drm_sched_increase_karma(&job->base);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index a9ae3beaa1d3..c1d04ea3c67f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -532,6 +532,24 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device 
*adev)
}
 }
 
+/* Will either stop and flush handlers for the amdgpu interrupt or re-enable it */
+void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool stop)
+{
+   int i;
+
+   for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+   struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->fence_drv.initialized ||
+		    !ring->fence_drv.irq_src)
+			continue;
+
+   if (stop)
+   disable_irq(adev->irq.irq);
+   else
+   enable_irq(adev->irq.irq);
+   }
+}
+
 void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
 {
unsigned int i, j;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7d89a52091c0..82c178a9033a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -143,6 +143,7 @@ signed long amdgpu_fence_wait_polling(struct amdgpu_ring 
*ring,
  uint32_t wait_seq,
  signed long timeout);
 unsigned amdgpu_fence_count_emitted(struct amdgpu_ring *ring);
+void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool stop);
 
 /*
  * Rings.
-- 
2.25.1
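
A note on why plain disable_irq()/enable_irq() is sufficient here, as a
hedged sketch (helper context simplified): disable_irq() both masks the
line and waits for in-flight handlers, so after it returns no
amdgpu_fence_process() run from a late EOP interrupt can race with the
fence array cleanup:

	disable_irq(adev->irq.irq);		/* mask + wait for handlers */
	amdgpu_fence_driver_clear_job_fences(ring);
	enable_irq(adev->irq.irq);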



[PATCH v2 4/4] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-24 Thread Andrey Grodzovsky
Align refcount behaviour for the amdgpu_job embedded HW fence with
classic pointer style HW fences by increasing the refcount each
time emit is called, so amdgpu code doesn't need to make workarounds
using amdgpu_job.job_run_counter to keep the HW fence refcount balanced.

Also, since in the previous patch we resumed setting s_fence->parent to NULL
in drm_sched_stop, switch to directly checking if job->hw_fence is
signaled to short circuit the reset if it is already signaled.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Yiqing Yao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  4 
 4 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 44da025502ac..567597469a8a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -684,6 +684,8 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
goto err_ib_sched;
}
 
+   /* Drop the initial kref_init count (see drm_sched_main as example) */
+   dma_fence_put(f);
ret = dma_fence_wait(f, false);
 
 err_ib_sched:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 03519d58e630..a2c268d48edd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5009,16 +5009,32 @@ static void amdgpu_device_recheck_guilty_jobs(
 
 		/* clear job's guilty and depend on the following step to decide the real one */
drm_sched_reset_karma(s_job);
-		/* for the real bad job, it will be resubmitted twice, adding a dma_fence_get
-		 * to make sure fence is balanced */
-		dma_fence_get(s_job->s_fence->parent);
 		drm_sched_resubmit_jobs_ext(&ring->sched, 1);
 
+   if (!s_job->s_fence->parent) {
+   DRM_WARN("Failed to get a HW fence for job!");
+   continue;
+   }
+
 		ret = dma_fence_wait_timeout(s_job->s_fence->parent, false,
 					     ring->sched.timeout);
 		if (ret == 0) { /* timeout */
 			DRM_ERROR("Found the real bad job! ring:%s, job_id:%llx\n",
 				  ring->sched.name, s_job->id);
 
+
+   amdgpu_fence_driver_isr_toggle(adev, true);
+
+   /* Clear this failed job from fence array */
+   amdgpu_fence_driver_clear_job_fences(ring);
+
+   amdgpu_fence_driver_isr_toggle(adev, false);
+
+   /* Since the job won't signal and we go for
+* another resubmit drop this parent pointer
+*/
+   dma_fence_put(s_job->s_fence->parent);
+   s_job->s_fence->parent = NULL;
+
/* set guilty */
drm_sched_increase_karma(s_job);
 retry:
@@ -5047,7 +5063,6 @@ static void amdgpu_device_recheck_guilty_jobs(
 
/* got the hw fence, signal finished fence */
atomic_dec(ring->sched.score);
-   dma_fence_put(s_job->s_fence->parent);
 		dma_fence_get(&s_job->s_fence->finished);
 		dma_fence_signal(&s_job->s_fence->finished);
 		dma_fence_put(&s_job->s_fence->finished);
@@ -5220,8 +5235,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 *
 * job->base holds a reference to parent fence
 */
-   if (job && job->base.s_fence->parent &&
-   dma_fence_is_signaled(job->base.s_fence->parent)) {
+	if (job && (job->hw_fence.ops != NULL) &&
+	    dma_fence_is_signaled(&job->hw_fence)) {
job_signaled = true;
dev_info(adev->dev, "Guilty job already signaled, skipping HW 
reset");
goto skip_hw_reset;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index c1d04ea3c67f..39597ab807d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,11 +164,16 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd
if (job && job->job_run_counter) {
/* reinit seq for resubmitted jobs */
fence->seqno = seq;
+		/* To be in line with external fence creation and other drivers */
+		dma_fence_get(fence);
} else {
-   if (job)
+   if (job) {
  

[PATCH v2 1/4] drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences

2022-06-24 Thread Andrey Grodzovsky
This function should drop the fence refcount when it extracts the
fence from the fence array, just as it's done in amdgpu_fence_process.

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 957437a5558c..a9ae3beaa1d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -595,8 +595,10 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
 		ptr = &ring->fence_drv.fences[i];
 		old = rcu_dereference_protected(*ptr, 1);
-		if (old && old->ops == &amdgpu_job_fence_ops)
+		if (old && old->ops == &amdgpu_job_fence_ops) {
RCU_INIT_POINTER(*ptr, NULL);
+   dma_fence_put(old);
+   }
}
 }
 
-- 
2.25.1



[PATCH v2 3/4] drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'

2022-06-24 Thread Andrey Grodzovsky
Problem:
The reverted patch caused a negative refcount as described in [1] because,
for that case, the parent fence did not signal by the time of drm_sched_stop
and hence was kept in the pending list. The assumption was that such fences
will not signal, and so the fence was put to account for the s_fence->parent
refcount, but for amdgpu, which has an embedded HW fence (always the same
parent fence), drm_sched_fence_release_scheduled was still always called and
would drop the count for the parent fence once more. For jobs that never
signaled, this imbalance was masked by a refcount bug in
amdgpu_fence_driver_clear_job_fences that would not drop the refcount on the
fences that were removed from the fence driver's fences array (against the
previous insertion into the array in amdgpu_fence_emit).

Fix:
Revert this patch and, by setting s_job->s_fence->parent to NULL
as before, prevent the extra refcount drop in amdgpu when
drm_sched_fence_release_scheduled is called on job release.

Also, align the behaviour in drm_sched_resubmit_jobs_ext with that of
drm_sched_main when submitting jobs: take a refcount for the
new parent fence pointer and drop the refcount taken by the original
kref_init on new HW fence creation (or the fake new HW fence in amdgpu -
see next patch).

[1] - 
https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4...@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3

Signed-off-by: Andrey Grodzovsky 
Tested-by: Yiqing Yao 
---
 drivers/gpu/drm/scheduler/sched_main.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index b81fceb0b8a2..c5437ee03e3f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -419,6 +419,8 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct 
drm_sched_job *bad)
if (s_job->s_fence->parent &&
 		    dma_fence_remove_callback(s_job->s_fence->parent,
 					      &s_job->cb)) {
+			dma_fence_put(s_job->s_fence->parent);
+			s_job->s_fence->parent = NULL;
 			atomic_dec(&sched->hw_rq_count);
} else {
/*
@@ -548,7 +550,6 @@ void drm_sched_resubmit_jobs_ext(struct drm_gpu_scheduler 
*sched, int max)
 		if (found_guilty && s_job->s_fence->scheduled.context == guilty_context)
 			dma_fence_set_error(&s_job->s_fence->finished, -ECANCELED);
 
-   dma_fence_put(s_job->s_fence->parent);
fence = sched->ops->run_job(s_job);
i++;
 
@@ -558,7 +559,11 @@ void drm_sched_resubmit_jobs_ext(struct drm_gpu_scheduler 
*sched, int max)
 
s_job->s_fence->parent = NULL;
} else {
-   s_job->s_fence->parent = fence;
+
+   s_job->s_fence->parent = dma_fence_get(fence);
+
+			/* Drop for original kref_init */
+   dma_fence_put(fence);
}
}
 }
@@ -952,6 +957,9 @@ static int drm_sched_main(void *param)
 
if (!IS_ERR_OR_NULL(fence)) {
s_fence->parent = dma_fence_get(fence);
+   /* Drop for original kref_init of the fence */
+   dma_fence_put(fence);
+
 			r = dma_fence_add_callback(fence, &s_job->cb,
   drm_sched_job_done_cb);
if (r == -ENOENT)
@@ -959,7 +967,6 @@ static int drm_sched_main(void *param)
else if (r)
DRM_DEV_ERROR(sched->dev, "fence add callback 
failed (%d)\n",
  r);
-   dma_fence_put(fence);
} else {
 			if (IS_ERR(fence))
 				dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
-- 
2.25.1
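
A hedged sketch of the parent fence refcount convention this patch aligns
both submission paths on (simplified; fence is the value returned by
run_job(), which arrives with the reference from its kref_init):

	fence = sched->ops->run_job(s_job);
	s_fence->parent = dma_fence_get(fence);	/* +1 for the parent pointer */
	dma_fence_put(fence);			/* drop the kref_init reference */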



[PATCH v2 0/4] Rework amdgpu HW fence refcount and update scheduler parent fence refcount.

2022-06-24 Thread Andrey Grodzovsky
Yiqing raised a problem of negative fence refcount for resubmitted jobs
in amdgpu and suggested a workaround in [1]. I took a look myself and
discovered some deeper problems both in amdgpu and the scheduler code.

Yiqing helped with testing the new code and also drew a detailed refcount
and flow tracing diagram for the parent (HW) fence life cycle and refcount
under various cases for the proposed patchset at [2].

v2:
Update race prevention by switching from amdgpu_irq_get/put to
enable/disable_irq (Christian)
Drop the refcount fix for amdgpu_job->external_hw_fence as it was causing an
underflow in direct submissions

TODO - Follow up cleanup to totally get rid of amdgpu_job->external_hw_fence 

[1] - 
https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4...@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3
[2] - 
https://drive.google.com/file/d/1yEoeW6OQC9WnwmzFW6NBLhFP_jD0xcHm/view?usp=sharing


Andrey Grodzovsky (4):
  drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences
  drm/amdgpu: Prevent race between late signaled fences and GPU reset.
  drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'
  drm/amdgpu: Follow up change to previous drm scheduler change.

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 29 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  4 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
 drivers/gpu/drm/scheduler/sched_main.c | 13 ++---
 6 files changed, 65 insertions(+), 15 deletions(-)

-- 
2.25.1



Re: [PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence

2022-06-23 Thread Andrey Grodzovsky



On 2022-06-22 11:04, Christian König wrote:

Am 22.06.22 um 17:01 schrieb Andrey Grodzovsky:


On 2022-06-22 05:00, Christian König wrote:

Am 21.06.22 um 21:34 schrieb Andrey Grodzovsky:

On 2022-06-21 03:19, Christian König wrote:


Am 21.06.22 um 00:02 schrieb Andrey Grodzovsky:

Problem:
In amdgpu_job_submit_direct - the refcount should drop by 2
but it drops only by 1.

amdgpu_ib_sched->emit -> refcount 1 from first fence init
dma_fence_get -> refcount 2
dma_fence_put -> refcount 1

Fix:
Add put for external_hw_fence in amdgpu_job_free/free_cb


Well what is the external_hw_fence good for in this construct?



As far as I understand, for direct submissions you don't want to
pass a job pointer to ib_schedule and so you can't use the embedded fence
for this case.


Can you please look a bit deeper into this, we now have a couple of 
fields in the job structure which have no obvious use.


I think we could pass a job structure to ib_schedule even for direct 
submit now.



Are you sure? I see a lot of activity in amdgpu_ib_schedule
depending on the presence of vm and fence_ctx, which are set if the job
pointer argument != NULL; might this have a negative impact on direct
submit?


Not 100% sure, but we did tons of workarounds because we didn't have a
job pointer for direct submit.


But this was before we embedded the IBs at the end of the job.

It's quite likely that this should be possible now, it's just that 
somebody needs to double check.


Christian.



Looking more, I see stuff like amdgpu_vm_flush and
amdgpu_ring_emit_cntxcntl, emit_frame_cntl that are conditioned on the job
argument; doesn't look to me like this is relevant to direct submit?


I also noticed that direct submit passes back the created fence to its
caller while freeing the job immediately. Using an embedded job here will
increase the time the job object hangs around in memory
without any use, as long as its fence is referenced. The job object is much
larger than a single fence.


Andrey






Andrey




Regards,
Christian.



Andrey






Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index 10aa073600d4..58568fdde2d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -152,8 +152,10 @@ static void amdgpu_job_free_cb(struct 
drm_sched_job *s_job)

  /* only put the hw fence if has embedded fence */
  if (job->hw_fence.ops != NULL)
  dma_fence_put(&job->hw_fence);
-    else
+    else {


When one side of the if uses {} the other side should use {} as 
well, e.g. use } else { here.


Christian.


+ dma_fence_put(job->external_hw_fence);
  kfree(job);
+    }
  }
    void amdgpu_job_free(struct amdgpu_job *job)
@@ -165,8 +167,10 @@ void amdgpu_job_free(struct amdgpu_job *job)
  /* only put the hw fence if has embedded fence */
  if (job->hw_fence.ops != NULL)
  dma_fence_put(&job->hw_fence);
-    else
+    else {
+    dma_fence_put(job->external_hw_fence);
  kfree(job);
+    }
  }
    int amdgpu_job_submit(struct amdgpu_job *job, struct 
drm_sched_entity *entity,








Re: [PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-23 Thread Andrey Grodzovsky



On 2022-06-23 01:52, Christian König wrote:

Am 22.06.22 um 19:19 schrieb Andrey Grodzovsky:


On 2022-06-22 03:17, Christian König wrote:

Am 21.06.22 um 22:00 schrieb Andrey Grodzovsky:


On 2022-06-21 03:28, Christian König wrote:

Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky:

Align refcount behaviour for amdgpu_job embedded HW fence with
classic pointer style HW fences by increasing refcount each
time emit is called so amdgpu code doesn't need to make workarounds
using amdgpu_job.job_run_counter to keep the HW fence refcount 
balanced.


Could we now also remove job_run_counter?

Christian.



I am afraid not, the job counter is needed since at all times the
refcount on the embedded fence cannot drop to zero, because this would
free the job itself before the end of its life cycle. We have to be able
to differentiate in amdgpu_fence_emit between the first ever call, where
we init the embedded fence's refcount from scratch using kref_init, and
repeated calls when the refcount is already > 0 and we just fake an
increase of the refcount to align
behavior with pointer style fences in other drivers.
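
A hedged sketch of the distinction described above, loosely following
amdgpu_fence_emit() (names as in the driver, flow simplified):

	if (job && job->job_run_counter) {
		/* resubmitted job: the embedded fence is already initialized,
		 * just refresh the sequence number and rebalance the refcount
		 */
		fence->seqno = seq;
		dma_fence_get(fence);
	} else {
		/* first ever emit: initialize the embedded fence from scratch;
		 * the kref_init inside dma_fence_init starts the refcount at 1
		 */
		dma_fence_init(fence, &amdgpu_job_fence_ops,
			       &ring->fence_drv.lock,
			       adev->fence_context + ring->idx, seq);
	}
	if (job)
		job->job_run_counter++;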


Well what we should probably rather do is move the init out of emit 
instead.


The only downside I can see is that the sequence number isn't known
on initial init and so needs to be zero or something like that.


Regards,
Christian.



Not sure how this helps; the problem is not there but in
amdgpu_job_run: for an embedded fence and a resubmitted job in the pending
list, amdgpu_job_run will be called twice or even 3 times with the
recheck-guilty-job sequence. I am supposed to do dma_fence_init on the
embedded HW fence only on the first call, while on the second and
third only update the sequence number and increase the refcount. How can I
differentiate between the first and subsequent calls without
job_run_counter?


Yeah, good point. We should really stop re-submitting jobs altogether 
in the kernel and move that whole functionality into userspace.


Christian.



So I guess we keep this for now and see how to move the resubmit
functionality to user space, as a separate task?


Andrey






Andrey






I guess we could assume that the embedded fence is all zeroes before the
first dma_fence_init, assuming the job itself was allocated using kzalloc,
and so you can look at dma_fence_ops == NULL or maybe seqno == 0 as a hint
whether it is the first call or not, but it's a risky assumption in my
opinion.


Andrey






Also, since in the previous patch we resumed setting s_fence->parent
to NULL in drm_sched_stop, switch to directly checking if job->hw_fence is
signaled to short circuit the reset if it is already signaled.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Yiqing Yao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23 
--

  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  7 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  4 
  4 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

index 513c57f839d8..447bd92c4856 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -684,6 +684,8 @@ int amdgpu_amdkfd_submit_ib(struct 
amdgpu_device *adev,

  goto err_ib_sched;
  }
  +    /* Drop the initial kref_init count (see drm_sched_main as 
example) */

+    dma_fence_put(f);
  ret = dma_fence_wait(f, false);
    err_ib_sched:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index c99541685804..f9718119834f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5009,16 +5009,28 @@ static void 
amdgpu_device_recheck_guilty_jobs(
    /* clear job's guilty and depend the folowing step to 
decide the real one */

  drm_sched_reset_karma(s_job);
-    /* for the real bad job, it will be resubmitted twice, 
adding a dma_fence_get

- * to make sure fence is balanced */
-    dma_fence_get(s_job->s_fence->parent);
drm_sched_resubmit_jobs_ext(&ring->sched, 1);
  +    if (!s_job->s_fence->parent) {
+    DRM_WARN("Failed to get a HW fence for job!");
+    continue;
+    }
+
  ret = dma_fence_wait_timeout(s_job->s_fence->parent, 
false, ring->sched.timeout);

  if (ret == 0) { /* timeout */
  DRM_ERROR("Found the real bad job! ring:%s, 
job_id:%llx\n",

  ring->sched.name, s_job->id);
  +
+    /* Clear this failed job from fence array */
+    amdgpu_fence_driver_clear_job_fences(ring);
+
+    /* Since the job won't signal and we go for
+ * another resubmit drop this parent pointer
+ */
+ dma_fence_put(s_job->s_fence->parent);
+    s_job->s_fence->parent = NULL;
+
  /* set guilty */
  drm_

Re: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-22 Thread Andrey Grodzovsky

Just a ping

Andrey

On 2022-06-21 15:45, Andrey Grodzovsky wrote:


On 2022-06-21 03:25, Christian König wrote:

Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky:

Problem:
After we start handling timed out jobs we assume their fences won't be
signaled, but we cannot be sure and sometimes they fire late. We need
to prevent concurrent accesses to fence array from
amdgpu_fence_driver_clear_job_fences during GPU reset and 
amdgpu_fence_process

from a late EOP interrupt.

Fix:
Before accessing fence array in GPU disable EOP interrupt and flush
all pending interrupt handlers for amdgpu device's interrupt line.




Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 26 
++

  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
  3 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 2b92281dd0c1..c99541685804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4605,6 +4605,8 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

  amdgpu_virt_fini_data_exchange(adev);
  }
  +    amdgpu_fence_driver_isr_toggle(adev, true);
+
  /* block all schedulers and reset given job's ring */
  for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
  struct amdgpu_ring *ring = adev->rings[i];
@@ -4620,6 +4622,8 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

  amdgpu_fence_driver_force_completion(ring);
  }
  +    amdgpu_fence_driver_isr_toggle(adev, false);
+
  if (job && job->vm)
  drm_sched_increase_karma(&job->base);
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index a9ae3beaa1d3..d6d54ba4c185 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -532,6 +532,32 @@ void amdgpu_fence_driver_hw_fini(struct 
amdgpu_device *adev)

  }
  }
  +void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, 
bool stop)

+{
+    int i;
+
+    for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+    struct amdgpu_ring *ring = adev->rings[i];
+
+    if (!ring || !ring->fence_drv.initialized || 
!ring->fence_drv.irq_src)

+    continue;
+
+    if (stop)
+    amdgpu_irq_put(adev, ring->fence_drv.irq_src,
+   ring->fence_drv.irq_type);
+    else
+    amdgpu_irq_get(adev, ring->fence_drv.irq_src,
+    ring->fence_drv.irq_type);


That won't work like this. This increments/decrements the reference 
count for the IRQ, but doesn't guarantee in any way that they are 
stopped/started.



I understand that, I just assumed that the fence driver is the only
holder of this interrupt source (e.g. regCP_INT_CNTL_RING0)?
I can disable the amdgpu interrupt line totally using disable_irq - would
this be better?







+    }
+
+    /* TODO Only waits for irq handlers on other CPUs, maybe
+     * local_irq_save/local_irq_restore are needed here for local
+     * interrupts ?
+     */


Well that comment made me smile. Think for a moment what the local 
CPU would be doing if an interrupt would run :)



No, I understand this of course, I am OK with being interrupted by an
interrupt handler at this point; what I am trying to do
is to prevent amdgpu_fence_process from running concurrently with
amdgpu_fence_driver_clear_job_fences - that is what this
function is trying to prevent - I disable and flush pending EOP ISR
handlers before the call to clear fences and re-enable them after.
I guess we could also introduce a spinlock to serialize them? Yiqing
reported seeing a race between them so we have to do something.


Andrey




Cheers,
Christian.


+    if (stop)
+    synchronize_irq(adev->irq.irq);
+}
+
  void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
  {
  unsigned int i, j;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h

index 7d89a52091c0..82c178a9033a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -143,6 +143,7 @@ signed long amdgpu_fence_wait_polling(struct 
amdgpu_ring *ring,

    uint32_t wait_seq,
    signed long timeout);
  unsigned amdgpu_fence_count_emitted(struct amdgpu_ring *ring);
+void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, 
bool stop);

    /*
   * Rings.




Re: [PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-22 Thread Andrey Grodzovsky



On 2022-06-22 03:17, Christian König wrote:

Am 21.06.22 um 22:00 schrieb Andrey Grodzovsky:


On 2022-06-21 03:28, Christian König wrote:

Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky:

Align refcount behaviour for amdgpu_job embedded HW fence with
classic pointer style HW fences by increasing refcount each
time emit is called so amdgpu code doesn't need to make workarounds
using amdgpu_job.job_run_counter to keep the HW fence refcount 
balanced.


Could we now also remove job_run_counter?

Christian.



I am afraid not, the job counter is needed since at all times the
refcount on the embedded fence cannot drop to zero, because this would
free the job itself before the end of its life cycle. We have to be able
to differentiate in amdgpu_fence_emit between the first ever call, where
we init the embedded fence's refcount from scratch using kref_init, and
repeated calls when the refcount is already > 0 and we just fake an
increase of the refcount to align
behavior with pointer style fences in other drivers.


Well what we should probably rather do is move the init out of emit 
instead.


The only downside I can see is that the sequence number isn't known on
initial init and so needs to be zero or something like that.


Regards,
Christian.



Not sure how this helps; the problem is not there but in amdgpu_job_run:
for an embedded fence and a resubmitted job in the pending list, amdgpu_job_run
will be called twice or even 3 times with the recheck-guilty-job sequence. I
am supposed to do dma_fence_init on the embedded HW fence only on the first
call, while on the second and third only update the sequence number and
increase the refcount. How can I differentiate between the first and
subsequent calls without job_run_counter?


Andrey






I guess we could assume that the embedded fence is all zeroes before the
first dma_fence_init, assuming the job itself was allocated using kzalloc,
and so you can look at dma_fence_ops == NULL or maybe seqno == 0 as a hint
whether it is the first call or not, but it's a risky assumption in my
opinion.


Andrey






Also, since in the previous patch we resumed setting s_fence->parent
to NULL in drm_sched_stop, switch to directly checking if job->hw_fence is
signaled to short circuit the reset if it is already signaled.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Yiqing Yao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23 
--

  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  7 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  4 
  4 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

index 513c57f839d8..447bd92c4856 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -684,6 +684,8 @@ int amdgpu_amdkfd_submit_ib(struct 
amdgpu_device *adev,

  goto err_ib_sched;
  }
  +    /* Drop the initial kref_init count (see drm_sched_main as 
example) */

+    dma_fence_put(f);
  ret = dma_fence_wait(f, false);
    err_ib_sched:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index c99541685804..f9718119834f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5009,16 +5009,28 @@ static void amdgpu_device_recheck_guilty_jobs(
    /* clear job's guilty and depend the folowing step to 
decide the real one */

  drm_sched_reset_karma(s_job);
-    /* for the real bad job, it will be resubmitted twice, 
adding a dma_fence_get

- * to make sure fence is balanced */
-    dma_fence_get(s_job->s_fence->parent);
  drm_sched_resubmit_jobs_ext(&ring->sched, 1);
  +    if (!s_job->s_fence->parent) {
+    DRM_WARN("Failed to get a HW fence for job!");
+    continue;
+    }
+
  ret = dma_fence_wait_timeout(s_job->s_fence->parent, 
false, ring->sched.timeout);

  if (ret == 0) { /* timeout */
  DRM_ERROR("Found the real bad job! ring:%s, 
job_id:%llx\n",

  ring->sched.name, s_job->id);
  +
+    /* Clear this failed job from fence array */
+    amdgpu_fence_driver_clear_job_fences(ring);
+
+    /* Since the job won't signal and we go for
+ * another resubmit drop this parent pointer
+ */
+    dma_fence_put(s_job->s_fence->parent);
+    s_job->s_fence->parent = NULL;
+
  /* set guilty */
  drm_sched_increase_karma(s_job);
  retry:
@@ -5047,7 +5059,6 @@ static void amdgpu_device_recheck_guilty_jobs(
    /* got the hw fence, signal finished fence */
  atomic_dec(ring->sched.score);
-    dma_fence_put(s_job->s_fence->parent);
dma_fence_get(&s_job->s_fence->finished);
dma_fence_signal(&s_job->s_fence->finished);
dma_fe

Re: [PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence

2022-06-22 Thread Andrey Grodzovsky



On 2022-06-22 05:00, Christian König wrote:

Am 21.06.22 um 21:34 schrieb Andrey Grodzovsky:

On 2022-06-21 03:19, Christian König wrote:


Am 21.06.22 um 00:02 schrieb Andrey Grodzovsky:

Problem:
In amdgpu_job_submit_direct - the refcount should drop by 2
but it drops only by 1.

amdgpu_ib_sched->emit -> refcount 1 from first fence init
dma_fence_get -> refcount 2
dma_fence_put -> refcount 1

Fix:
Add put for external_hw_fence in amdgpu_job_free/free_cb


Well what is the external_hw_fence good for in this construct?



As far as I understand, for direct submissions you don't want to pass
a job pointer to ib_schedule and so you can't use the embedded fence for this
case.


Can you please look a bit deeper into this, we now have a couple of 
fields in the job structure which have no obvious use.


I think we could pass a job structure to ib_schedule even for direct 
submit now.



Are you sure? I see a lot of activity in amdgpu_ib_schedule depending
on the presence of vm and fence_ctx, which are set if the job pointer
argument != NULL; might this have a negative impact on direct submit?


Andrey




Regards,
Christian.



Andrey






Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index 10aa073600d4..58568fdde2d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -152,8 +152,10 @@ static void amdgpu_job_free_cb(struct 
drm_sched_job *s_job)

  /* only put the hw fence if has embedded fence */
  if (job->hw_fence.ops != NULL)
  dma_fence_put(&job->hw_fence);
-    else
+    else {


When one side of the if uses {} the other side should use {} as 
well, e.g. use } else { here.


Christian.


+ dma_fence_put(job->external_hw_fence);
  kfree(job);
+    }
  }
    void amdgpu_job_free(struct amdgpu_job *job)
@@ -165,8 +167,10 @@ void amdgpu_job_free(struct amdgpu_job *job)
  /* only put the hw fence if has embedded fence */
  if (job->hw_fence.ops != NULL)
  dma_fence_put(&job->hw_fence);
-    else
+    else {
+    dma_fence_put(job->external_hw_fence);
  kfree(job);
+    }
  }
    int amdgpu_job_submit(struct amdgpu_job *job, struct 
drm_sched_entity *entity,






Re: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-21 Thread Andrey Grodzovsky
You have a job in the pending list which is marked as not finished in
drm_sched_stop
(https://elixir.bootlin.com/linux/v5.16/source/drivers/gpu/drm/scheduler/sched_main.c#L420),
its s_fence signal callback removed, and the job kept in the pending list.
Later you will try to manually clear the HW fence of this job in here
(https://elixir.bootlin.com/linux/v5.16/source/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c#L4492),
but the EOP interrupt can fire for that fence exactly at this
moment, and then you have concurrent access to the fence driver's fence
array from both amdgpu_fence_process and
amdgpu_fence_driver_clear_job_fences, which is not supposed to happen.
Yiqing reported to me a race during the debugging
we did of the original refcount bug, and it looked to me like this
scenario. Seems to me the EOP ISR handler should be prevented from
running during this time at least.
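
A hedged sketch of the race described above, in timeline form (assuming the
reset path and the late EOP interrupt run on different CPUs):

	/*
	 *   CPU A (GPU reset path)                 CPU B (late EOP interrupt)
	 *   ----------------------                 --------------------------
	 *   amdgpu_fence_driver_clear_job_fences()
	 *     walks ring->fence_drv.fences[]       amdgpu_fence_process()
	 *     RCU_INIT_POINTER(*ptr, NULL)           walks the same array
	 *
	 * Disabling the EOP interrupt and flushing its handlers before the
	 * walk takes CPU B out of the picture.
	 */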


Andrey

On 2022-06-21 21:47, VURDIGERENATARAJ, CHANDAN wrote:

Hi,

Is this a preventive fix or did you find errors/oops/hangs?
If you found errors/oops/hangs, can you please share the details?

BR,
Chandan V N



On 2022-06-21 03:25, Christian König wrote:

Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky:

Problem:
After we start handling timed out jobs we assume their fences won't
be signaled, but we cannot be sure and sometimes they fire late. We
need to prevent concurrent accesses to fence array from
amdgpu_fence_driver_clear_job_fences during GPU reset and
amdgpu_fence_process from a late EOP interrupt.

Fix:
Before accessing fence array in GPU disable EOP interrupt and flush
all pending interrupt handlers for amdgpu device's interrupt line.
Signed-off-by: Andrey Grodzovsky 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 
   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 26
++
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
   3 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2b92281dd0c1..c99541685804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4605,6 +4605,8 @@ int amdgpu_device_pre_asic_reset(struct
amdgpu_device *adev,
   amdgpu_virt_fini_data_exchange(adev);
   }
   +    amdgpu_fence_driver_isr_toggle(adev, true);
+
   /* block all schedulers and reset given job's ring */
   for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
   struct amdgpu_ring *ring = adev->rings[i]; @@ -4620,6
+4622,8 @@ int amdgpu_device_pre_asic_reset(struct
amdgpu_device *adev,
   amdgpu_fence_driver_force_completion(ring);
   }
   +    amdgpu_fence_driver_isr_toggle(adev, false);
+
   if (job && job->vm)
   drm_sched_increase_karma(&job->base);
   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index a9ae3beaa1d3..d6d54ba4c185 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -532,6 +532,32 @@ void amdgpu_fence_driver_hw_fini(struct
amdgpu_device *adev)
   }
   }
   +void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev,
bool stop)
+{
+    int i;
+
+    for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+    struct amdgpu_ring *ring = adev->rings[i];
+
+    if (!ring || !ring->fence_drv.initialized ||
!ring->fence_drv.irq_src)
+    continue;
+
+    if (stop)
+    amdgpu_irq_put(adev, ring->fence_drv.irq_src,
+   ring->fence_drv.irq_type);
+    else
+    amdgpu_irq_get(adev, ring->fence_drv.irq_src,
+    ring->fence_drv.irq_type);

That won't work like this. This increments/decrements the reference
count for the IRQ, but doesn't guarantee in any way that they are
stopped/started.


I understand that, I just assumed that the fence driver is the only holder of 
this interrupt source (e.g. regCP_INT_CNTL_RING0)?
I can disable the amdgpu interrupt line entirely using disable_irq - would this be 
better?





+    }
+
+    /* TODO Only waits for irq handlers on other CPUs, maybe
local_irq_save
+ * local_irq_local_irq_restore are needed here for local
interrupts ?
+ *
+ */

Well that comment made me smile. Think for a moment what the local CPU
would be doing if an interrupt would run :)


No, I understand this of course, I am OK with being interrupted by the interrupt 
handler at this point; what I am trying to do is to prevent amdgpu_fence_process 
from running concurrently with amdgpu_fence_driver_clear_job_fences - that is what 
this function is trying to prevent - I disable and flush pending EOP ISR handlers 
before the call to clear fences and re-enable after.
I guess we can also introduce a spinlock to serialize them? Yiqing reported 
seeing a race between them so we have to do something.

Andrey
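
For illustration, the spinlock alternative floated above could look roughly like the sketch below (hypothetical lock name and placement; the series ultimately went with disabling the EOP interrupt plus synchronize_irq instead):

    /* Hypothetical: a lock in struct amdgpu_fence_driver shared by the
     * two users of the fence array. spin_lock_irqsave() is required
     * because amdgpu_fence_process() runs in interrupt context. */
    void amdgpu_fence_process(struct amdgpu_ring *ring)
    {
            unsigned long flags;

            spin_lock_irqsave(&ring->fence_drv.fence_array_lock, flags);
            /* ... walk drv->fences[], signal and put each fence ... */
            spin_unlock_irqrestore(&ring->fence_drv.fence_array_lock, flags);
    }

    void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
    {
            unsigned long flags;

            spin_lock_irqsave(&ring->fence_drv.fence_array_lock, flags);
            /* ... NULL out the job fences and drop their references ... */
            spin_unlock_irqrestore(&ring->fence_drv.fence_array_lock, flags);
    }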



Cheers,
Christian.


+    if (stop)
+    synchronize_irq(adev->irq.irq);
+}
+
   void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)

Re: [PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-21 Thread Andrey Grodzovsky



On 2022-06-21 03:28, Christian König wrote:

Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky:

Align refcount behaviour for amdgpu_job embedded HW fence with
classic pointer style HW fences by increasing refcount each
time emit is called so amdgpu code doesn't need to make workarounds
using amdgpu_job.job_run_counter to keep the HW fence refcount balanced.


Could we now also remove job_run_counter?

Christian.



I am afraid not, the job counter is needed since at all times the refcount 
on the embedded fence cannot drop to zero, because that would free the job 
itself before the end of its life cycle. We have to be able to differentiate 
in amdgpu_fence_emit between the first ever call, where we init the embedded 
fence's refcount from scratch using kref_init, and repeated calls, when the 
refcount is already > 0 and we just fake-increase the refcount to align the 
behavior with pointer-style fences in other drivers.

I guess we could assume that the embedded fence is all zeroes before the 
first dma_fence_init, assuming the job itself was allocated using kzalloc, 
and so you could look at dma_fence_ops == NULL or maybe seqno == 0 as a hint 
for whether it's the first call or not, but that's a risky assumption in my 
opinion.


Andrey
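
Condensed from patch 5/5 later in this thread, the two paths being described look roughly like this inside amdgpu_fence_emit (abridged; see the full diff for context):

    if (job && job->job_run_counter) {
            /* Resubmitted job: the embedded fence was initialized on an
             * earlier emit, so only refresh the seqno and take an extra
             * reference to mimic pointer-style HW fences. */
            fence->seqno = seq;
            dma_fence_get(fence);
    } else {
            /* First emit for this job: dma_fence_init() performs the
             * kref_init() that sets the refcount up from scratch. */
            dma_fence_init(fence, &amdgpu_job_fence_ops,
                           &ring->fence_drv.lock,
                           adev->fence_context + ring->idx, seq);
    }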






Also, since in the previous patch we resumed setting s_fence->parent 
to NULL
in drm_sched_stop, switch to directly checking if job->hw_fence is
signaled, to short circuit the reset if it is already signaled.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Yiqing Yao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  7 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  4 
  4 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

index 513c57f839d8..447bd92c4856 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -684,6 +684,8 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device 
*adev,

  goto err_ib_sched;
  }
  +    /* Drop the initial kref_init count (see drm_sched_main as 
example) */

+    dma_fence_put(f);
  ret = dma_fence_wait(f, false);
    err_ib_sched:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index c99541685804..f9718119834f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5009,16 +5009,28 @@ static void amdgpu_device_recheck_guilty_jobs(
    /* clear job's guilty and depend the folowing step to 
decide the real one */

  drm_sched_reset_karma(s_job);
-    /* for the real bad job, it will be resubmitted twice, 
adding a dma_fence_get

- * to make sure fence is balanced */
-    dma_fence_get(s_job->s_fence->parent);
  drm_sched_resubmit_jobs_ext(&ring->sched, 1);
  +    if (!s_job->s_fence->parent) {
+    DRM_WARN("Failed to get a HW fence for job!");
+    continue;
+    }
+
  ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, 
ring->sched.timeout);

  if (ret == 0) { /* timeout */
  DRM_ERROR("Found the real bad job! ring:%s, 
job_id:%llx\n",

  ring->sched.name, s_job->id);
  +
+    /* Clear this failed job from fence array */
+    amdgpu_fence_driver_clear_job_fences(ring);
+
+    /* Since the job won't signal and we go for
+ * another resubmit drop this parent pointer
+ */
+    dma_fence_put(s_job->s_fence->parent);
+    s_job->s_fence->parent = NULL;
+
  /* set guilty */
  drm_sched_increase_karma(s_job);
  retry:
@@ -5047,7 +5059,6 @@ static void amdgpu_device_recheck_guilty_jobs(
    /* got the hw fence, signal finished fence */
  atomic_dec(ring->sched.score);
-    dma_fence_put(s_job->s_fence->parent);
  dma_fence_get(&s_job->s_fence->finished);
  dma_fence_signal(&s_job->s_fence->finished);
  dma_fence_put(&s_job->s_fence->finished);
@@ -5220,8 +5231,8 @@ int amdgpu_device_gpu_recover(struct 
amdgpu_device *adev,

   *
   * job->base holds a reference to parent fence
   */
-    if (job && job->base.s_fence->parent &&
-    dma_fence_is_signaled(job->base.s_fence->parent)) {
+    if (job && (job->hw_fence.ops != NULL) &&
+    dma_fence_is_signaled(&job->hw_fence)) {
  job_signaled = true;
  dev_info(adev->dev, "Guilty job already signaled, skipping 
HW reset");

  goto skip_hw_reset;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index d6d54ba4c185..9bd4e18212fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

Re: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-21 Thread Andrey Grodzovsky



On 2022-06-21 03:25, Christian König wrote:

Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky:

Problem:
After we start handling timed out jobs we assume their fences won't be
signaled, but we cannot be sure and sometimes they fire late. We need
to prevent concurrent accesses to the fence array from
amdgpu_fence_driver_clear_job_fences during GPU reset and from
amdgpu_fence_process on a late EOP interrupt.

Fix:
Before accessing the fence array during GPU reset, disable the EOP
interrupt and flush all pending interrupt handlers for the amdgpu
device's interrupt line.




Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 26 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
  3 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 2b92281dd0c1..c99541685804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4605,6 +4605,8 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

  amdgpu_virt_fini_data_exchange(adev);
  }
  +    amdgpu_fence_driver_isr_toggle(adev, true);
+
  /* block all schedulers and reset given job's ring */
  for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
  struct amdgpu_ring *ring = adev->rings[i];
@@ -4620,6 +4622,8 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

  amdgpu_fence_driver_force_completion(ring);
  }
  +    amdgpu_fence_driver_isr_toggle(adev, false);
+
  if (job && job->vm)
  drm_sched_increase_karma(&job->base);
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index a9ae3beaa1d3..d6d54ba4c185 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -532,6 +532,32 @@ void amdgpu_fence_driver_hw_fini(struct 
amdgpu_device *adev)

  }
  }
  +void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, 
bool stop)

+{
+    int i;
+
+    for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+    struct amdgpu_ring *ring = adev->rings[i];
+
+    if (!ring || !ring->fence_drv.initialized || 
!ring->fence_drv.irq_src)

+    continue;
+
+    if (stop)
+    amdgpu_irq_put(adev, ring->fence_drv.irq_src,
+   ring->fence_drv.irq_type);
+    else
+    amdgpu_irq_get(adev, ring->fence_drv.irq_src,
+    ring->fence_drv.irq_type);


That won't work like this. This increments/decrements the reference 
count for the IRQ, but doesn't guarantee in any way that they are 
stopped/started.



I understand that, I just assumed that the fence driver is the only 
holder of this interrupt source (e.g. regCP_INT_CNTL_RING0)?
I can disable the amdgpu interrupt line entirely using disable_irq - would 
this be better?







+    }
+
+    /* TODO Only waits for irq handlers on other CPUs, maybe 
local_irq_save
+ * local_irq_local_irq_restore are needed here for local 
interrupts ?

+ *
+ */


Well that comment made me smile. Think for a moment what the local CPU 
would be doing if an interrupt would run :)



No, I understand this of course, I am OK with being interrupted by the 
interrupt handler at this point; what I am trying to do
is to prevent amdgpu_fence_process from running concurrently with 
amdgpu_fence_driver_clear_job_fences - that is what this
function is trying to prevent - I disable and flush pending EOP ISR 
handlers before the call to clear fences and re-enable after.
I guess we can also introduce a spinlock to serialize them? Yiqing 
reported seeing a race between them so we have to do something.


Andrey




Cheers,
Christian.


+    if (stop)
+    synchronize_irq(adev->irq.irq);
+}
+
  void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
  {
  unsigned int i, j;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h

index 7d89a52091c0..82c178a9033a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -143,6 +143,7 @@ signed long amdgpu_fence_wait_polling(struct 
amdgpu_ring *ring,

    uint32_t wait_seq,
    signed long timeout);
  unsigned amdgpu_fence_count_emitted(struct amdgpu_ring *ring);
+void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool 
stop);

    /*
   * Rings.




Re: [PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence

2022-06-21 Thread Andrey Grodzovsky

On 2022-06-21 03:19, Christian König wrote:


Am 21.06.22 um 00:02 schrieb Andrey Grodzovsky:

Problem:
In amdgpu_job_submit_direct - The refcount should drop by 2
but it drops only by 1.

amdgpu_ib_sched->emit -> refcount 1 from first fence init
dma_fence_get -> refcount 2
dma_fence_put -> refcount 1

Fix:
Add put for external_hw_fence in amdgpu_job_free/free_cb


Well what is the external_hw_fence good for in this construct?



As far as I understand, for direct submissions you don't want to pass a job
pointer to ib_schedule, and so you can't use the embedded fence for this case.
Andrey
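
To make the two ownership models concrete, here is a condensed view of the release path (abridged from amdgpu_job_free_cb in the diff below; the embedded/external split is exactly what the fix touches):

    if (job->hw_fence.ops != NULL) {
            /* Embedded fence: the job is freed later from the fence's
             * release callback, so only drop the fence reference here. */
            dma_fence_put(&job->hw_fence);
    } else {
            /* External fence (direct submission): no initialized embedded
             * fence, so drop the external reference and free the job now. */
            dma_fence_put(job->external_hw_fence);
            kfree(job);
    }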






Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index 10aa073600d4..58568fdde2d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -152,8 +152,10 @@ static void amdgpu_job_free_cb(struct 
drm_sched_job *s_job)

  /* only put the hw fence if has embedded fence */
  if (job->hw_fence.ops != NULL)
  dma_fence_put(&job->hw_fence);
-    else
+    else {


When one side of the if uses {} the other side should use {} as well, 
e.g. use } else { here.


Christian.


+ dma_fence_put(job->external_hw_fence);
  kfree(job);
+    }
  }
    void amdgpu_job_free(struct amdgpu_job *job)
@@ -165,8 +167,10 @@ void amdgpu_job_free(struct amdgpu_job *job)
  /* only put the hw fence if has embedded fence */
  if (job->hw_fence.ops != NULL)
  dma_fence_put(&job->hw_fence);
-    else
+    else {
+    dma_fence_put(job->external_hw_fence);
  kfree(job);
+    }
  }
    int amdgpu_job_submit(struct amdgpu_job *job, struct 
drm_sched_entity *entity,




[PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-20 Thread Andrey Grodzovsky
Align refcount behaviour for amdgpu_job embedded HW fence with
classic pointer style HW fences by increasing refcount each
time emit is called so amdgpu code doesn't need to make workarounds
using amdgpu_job.job_run_counter to keep the HW fence refcount balanced.

Also, since in the previous patch we resumed setting s_fence->parent to NULL
in drm_sched_stop, switch to directly checking if job->hw_fence is
signaled, to short circuit the reset if it is already signaled.

Signed-off-by: Andrey Grodzovsky 
Tested-by: Yiqing Yao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  7 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  4 
 4 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 513c57f839d8..447bd92c4856 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -684,6 +684,8 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
goto err_ib_sched;
}
 
+   /* Drop the initial kref_init count (see drm_sched_main as example) */
+   dma_fence_put(f);
ret = dma_fence_wait(f, false);
 
 err_ib_sched:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c99541685804..f9718119834f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5009,16 +5009,28 @@ static void amdgpu_device_recheck_guilty_jobs(
 
/* clear job's guilty and depend the folowing step to decide 
the real one */
drm_sched_reset_karma(s_job);
-   /* for the real bad job, it will be resubmitted twice, adding a 
dma_fence_get
-* to make sure fence is balanced */
-   dma_fence_get(s_job->s_fence->parent);
	drm_sched_resubmit_jobs_ext(&ring->sched, 1);
 
+   if (!s_job->s_fence->parent) {
+   DRM_WARN("Failed to get a HW fence for job!");
+   continue;
+   }
+
ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, 
ring->sched.timeout);
if (ret == 0) { /* timeout */
DRM_ERROR("Found the real bad job! ring:%s, 
job_id:%llx\n",
ring->sched.name, s_job->id);
 
+
+   /* Clear this failed job from fence array */
+   amdgpu_fence_driver_clear_job_fences(ring);
+
+   /* Since the job won't signal and we go for
+* another resubmit drop this parent pointer
+*/
+   dma_fence_put(s_job->s_fence->parent);
+   s_job->s_fence->parent = NULL;
+
/* set guilty */
drm_sched_increase_karma(s_job);
 retry:
@@ -5047,7 +5059,6 @@ static void amdgpu_device_recheck_guilty_jobs(
 
/* got the hw fence, signal finished fence */
atomic_dec(ring->sched.score);
-   dma_fence_put(s_job->s_fence->parent);
	dma_fence_get(&s_job->s_fence->finished);
	dma_fence_signal(&s_job->s_fence->finished);
	dma_fence_put(&s_job->s_fence->finished);
@@ -5220,8 +5231,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 *
 * job->base holds a reference to parent fence
 */
-   if (job && job->base.s_fence->parent &&
-   dma_fence_is_signaled(job->base.s_fence->parent)) {
+   if (job && (job->hw_fence.ops != NULL) &&
+   dma_fence_is_signaled(&job->hw_fence)) {
job_signaled = true;
dev_info(adev->dev, "Guilty job already signaled, skipping HW 
reset");
goto skip_hw_reset;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d6d54ba4c185..9bd4e18212fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,11 +164,16 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct 
dma_fence **f, struct amd
if (job && job->job_run_counter) {
/* reinit seq for resubmitted jobs */
fence->seqno = seq;
+   /* TO be inline with external fence creation and other drivers 
*/
+   dma_fence_get(fence);
} else {
-   if (job)
+   if (job) {
	dma_fence_init(fence, &amdgpu_job_fence_ops,
	   &ring->fence_drv.lock,
  

[PATCH 4/5] drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'

2022-06-20 Thread Andrey Grodzovsky
Problem:
This patch caused a negative refcount as described in [1] because, for that
case, the parent fence did not signal by the time of drm_sched_stop and hence
was kept in the pending list. The assumption was that such fences will not
signal, and so a fence put was done to account for the s_fence->parent
refcount, but for amdgpu, which has an embedded HW fence (always the same
parent fence), drm_sched_fence_release_scheduled was always called and would
still drop the count for the parent fence once more. For jobs that never
signaled, this imbalance was masked by a refcount bug in
amdgpu_fence_driver_clear_job_fences that would not drop the refcount on the
fences that were removed from the fence driver's fences array (against the
previous get done when inserting into the array in amdgpu_fence_emit).

Fix:
Revert this patch and by setting s_job->s_fence->parent to NULL
as before prevent the extra refcount drop in amdgpu when
drm_sched_fence_release_scheduled is called on job release.

Also - align the behaviour in drm_sched_resubmit_jobs_ext with that of
drm_sched_main when submitting jobs - take a refcount for the
new parent fence pointer and drop the refcount from the original kref_init
at new HW fence creation (or fake new HW fence in amdgpu - see next patch).

[1] - 
https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4...@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3

Signed-off-by: Andrey Grodzovsky 
Tested-by: Yiqing Yao 
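
As a concrete trace of the balance this change restores (assuming run_job() returns a HW fence born with refcount 1 from its kref_init, as the diff below implements):

    fence = sched->ops->run_job(s_job);            /* refcount == 1 (kref_init) */
    s_job->s_fence->parent = dma_fence_get(fence); /* == 2, the scheduler's ref */
    dma_fence_put(fence);                          /* == 1, drop the birth ref  */
    /* ... the final dma_fence_put(parent), from drm_sched_stop() or
     * drm_sched_fence_release_scheduled(), later brings it back to 0. */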
---
 drivers/gpu/drm/scheduler/sched_main.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index b81fceb0b8a2..b38394f5694f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -419,6 +419,11 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
struct drm_sched_job *bad)
if (s_job->s_fence->parent &&
dma_fence_remove_callback(s_job->s_fence->parent,
  _job->cb)) {
+   /* Revert drm/sched: Keep s_fence->parent pointer, no
+* need anymore for amdgpu and creates only troubles
+*/
+   dma_fence_put(s_job->s_fence->parent);
+   s_job->s_fence->parent = NULL;
atomic_dec(>hw_rq_count);
} else {
/*
@@ -548,7 +553,6 @@ void drm_sched_resubmit_jobs_ext(struct drm_gpu_scheduler 
*sched, int max)
if (found_guilty && s_job->s_fence->scheduled.context == 
guilty_context)
	dma_fence_set_error(&s_job->s_fence->finished, -ECANCELED);
 
-   dma_fence_put(s_job->s_fence->parent);
fence = sched->ops->run_job(s_job);
i++;
 
@@ -558,7 +562,11 @@ void drm_sched_resubmit_jobs_ext(struct drm_gpu_scheduler 
*sched, int max)
 
s_job->s_fence->parent = NULL;
} else {
-   s_job->s_fence->parent = fence;
+
+   s_job->s_fence->parent = dma_fence_get(fence);
+
+   /* Drop for original kref_init */
+   dma_fence_put(fence);
}
}
 }
@@ -952,6 +960,9 @@ static int drm_sched_main(void *param)
 
if (!IS_ERR_OR_NULL(fence)) {
s_fence->parent = dma_fence_get(fence);
+   /* Drop for original kref_init of the fence */
+   dma_fence_put(fence);
+
	r = dma_fence_add_callback(fence, &s_job->cb,
   drm_sched_job_done_cb);
if (r == -ENOENT)
@@ -959,7 +970,6 @@ static int drm_sched_main(void *param)
else if (r)
DRM_DEV_ERROR(sched->dev, "fence add callback 
failed (%d)\n",
  r);
-   dma_fence_put(fence);
} else {
if (IS_ERR(fence))
	dma_fence_set_error(&s_fence->finished, 
PTR_ERR(fence));
-- 
2.25.1



[PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-20 Thread Andrey Grodzovsky
Problem:
After we start handling timed out jobs we assume their fences won't be
signaled, but we cannot be sure and sometimes they fire late. We need
to prevent concurrent accesses to the fence array from
amdgpu_fence_driver_clear_job_fences during GPU reset and from
amdgpu_fence_process on a late EOP interrupt.

Fix:
Before accessing the fence array during GPU reset, disable the EOP
interrupt and flush all pending interrupt handlers for the amdgpu
device's interrupt line.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 26 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
 3 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2b92281dd0c1..c99541685804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4605,6 +4605,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
*adev,
amdgpu_virt_fini_data_exchange(adev);
}
 
+   amdgpu_fence_driver_isr_toggle(adev, true);
+
/* block all schedulers and reset given job's ring */
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];
@@ -4620,6 +4622,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
*adev,
amdgpu_fence_driver_force_completion(ring);
}
 
+   amdgpu_fence_driver_isr_toggle(adev, false);
+
if (job && job->vm)
	drm_sched_increase_karma(&job->base);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index a9ae3beaa1d3..d6d54ba4c185 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -532,6 +532,32 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device 
*adev)
}
 }
 
+void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool stop)
+{
+   int i;
+
+   for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+   struct amdgpu_ring *ring = adev->rings[i];
+
+   if (!ring || !ring->fence_drv.initialized || 
!ring->fence_drv.irq_src)
+   continue;
+
+   if (stop)
+   amdgpu_irq_put(adev, ring->fence_drv.irq_src,
+  ring->fence_drv.irq_type);
+   else
+   amdgpu_irq_get(adev, ring->fence_drv.irq_src,
+   ring->fence_drv.irq_type);
+   }
+
+   /* TODO Only waits for irq handlers on other CPUs, maybe local_irq_save
+* local_irq_local_irq_restore are needed here for local interrupts ?
+*
+*/
+   if (stop)
+   synchronize_irq(adev->irq.irq);
+}
+
 void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
 {
unsigned int i, j;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7d89a52091c0..82c178a9033a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -143,6 +143,7 @@ signed long amdgpu_fence_wait_polling(struct amdgpu_ring 
*ring,
  uint32_t wait_seq,
  signed long timeout);
 unsigned amdgpu_fence_count_emitted(struct amdgpu_ring *ring);
+void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool stop);
 
 /*
  * Rings.
-- 
2.25.1



[PATCH 2/5] drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences

2022-06-20 Thread Andrey Grodzovsky
This function should drop the fence refcount when it extracts the
fence from the fence array, just as it's done in amdgpu_fence_process.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 957437a5558c..a9ae3beaa1d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -595,8 +595,10 @@ void amdgpu_fence_driver_clear_job_fences(struct 
amdgpu_ring *ring)
for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
	ptr = &ring->fence_drv.fences[i];
old = rcu_dereference_protected(*ptr, 1);
-   if (old && old->ops == &amdgpu_job_fence_ops)
+   if (old && old->ops == &amdgpu_job_fence_ops) {
RCU_INIT_POINTER(*ptr, NULL);
+   dma_fence_put(old);
+   }
}
 }
 
-- 
2.25.1



[PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence

2022-06-20 Thread Andrey Grodzovsky
Problem:
In amdgpu_job_submit_direct - The refcount should drop by 2
but it drops only by 1.

amdgpu_ib_sched->emit -> refcount 1 from first fence init
dma_fence_get -> refcount 2
dma_fence_put -> refcount 1

Fix:
Add put for external_hw_fence in amdgpu_job_free/free_cb

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 10aa073600d4..58568fdde2d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -152,8 +152,10 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
 /* only put the hw fence if has embedded fence */
if (job->hw_fence.ops != NULL)
dma_fence_put(>hw_fence);
-   else
+   else {
+   dma_fence_put(job->external_hw_fence);
kfree(job);
+   }
 }
 
 void amdgpu_job_free(struct amdgpu_job *job)
@@ -165,8 +167,10 @@ void amdgpu_job_free(struct amdgpu_job *job)
/* only put the hw fence if has embedded fence */
if (job->hw_fence.ops != NULL)
dma_fence_put(>hw_fence);
-   else
+   else {
+   dma_fence_put(job->external_hw_fence);
kfree(job);
+   }
 }
 
 int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
-- 
2.25.1



[PATCH 0/5] Rework amdgpu HW fence refcount and update scheduler parent fence refcount.

2022-06-20 Thread Andrey Grodzovsky
Yiqing raised a problem of negative fence refcount for resubmitted jobs
in amdgpu and suggested a workaround in [1]. I took a look myself and 
discovered some deeper problems, both in amdgpu and in the scheduler code.

Yiqing helped with testing the new code and also drew a detailed refcount and 
flow
tracing diagram for parent (HW) fence life cycle and refcount under various
cases for the proposed patchset at [2].

[1] - 
https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4...@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3
[2] - 
https://drive.google.com/file/d/1yEoeW6OQC9WnwmzFW6NBLhFP_jD0xcHm/view?usp=sharing

Andrey Grodzovsky (5):
  drm/amdgpu: Fix possible refcount leak for release of
external_hw_fence
  drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences
  drm/amdgpu: Prevent race between late signaled fences and GPU reset.
  drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'
  drm/amdgpu: Follow up change to previous drm scheduler change.

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 37 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 12 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
 drivers/gpu/drm/scheduler/sched_main.c | 16 --
 6 files changed, 78 insertions(+), 17 deletions(-)

-- 
2.25.1



Re: [PATCH] drm/amdgpu: fix refcount underflow in device reset

2022-06-06 Thread Andrey Grodzovsky



On 2022-06-06 03:43, Yiqing Yao wrote:

[why]
A gfx job may be processed but not finished when a reset begins from a
compute job timeout. drm_sched_resubmit_jobs_ext in sched_main
assumes the submitted job is unsignaled and always puts the parent fence.
Resubmission of such a job causes the underflow. This fix is done in
device reset to avoid changing drm sched_main.



Are we talking about amdgpu_fence_process sneaking in here just before 
you call

drm_sched_resubmit_jobs_ext->dma_fence_put(parent) and doing an extra put?


How about first removing the fence in question from drv->fences, like
it's done in




[how]
Check if the job to submit has signaled and avoid submission if
signaled in device reset for both advanced TDR and normal job
resume.



If what I said above is the problem, then this is a racy solution, no?
The fence can signal right after your check anyway.

How about first removing this fence from the drv->fences array and only then
checking whether it's signaled or not. If signaled, you can just skip 
resubmission, and
otherwise you can go ahead and call resubmit and not worry about the double 
put.


Andrey
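
A rough sketch of the remove-first ordering suggested above (illustrative only, using names from the surrounding patches; hypothetical control flow, not the final fix):

    /* 1. Take the job fences out of drv->fences first, so a late
     *    amdgpu_fence_process() can no longer find and put them ... */
    amdgpu_fence_driver_clear_job_fences(ring);

    /* 2. ... and only then decide, race-free, whether to resubmit. */
    if (dma_fence_is_signaled(s_job->s_fence->parent)) {
            /* already completed on the HW - skip resubmission */
    } else {
            drm_sched_resubmit_jobs_ext(&ring->sched, 1);
    }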




Signed-off-by: Yiqing Yao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 72 --
  1 file changed, 41 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f16f105a737b..29b307af97eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4980,39 +4980,43 @@ static void amdgpu_device_recheck_guilty_jobs(
/* for the real bad job, it will be resubmitted twice, adding a 
dma_fence_get
 * to make sure fence is balanced */
dma_fence_get(s_job->s_fence->parent);
-   drm_sched_resubmit_jobs_ext(>sched, 1);
  
-		ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, ring->sched.timeout);

-   if (ret == 0) { /* timeout */
-   DRM_ERROR("Found the real bad job! ring:%s, 
job_id:%llx\n",
-   ring->sched.name, s_job->id);
+   /* avoid submission for signaled hw fence */
+   if(!dma_fence_is_signaled(s_job->s_fence->parent)){
  
-			/* set guilty */

-   drm_sched_increase_karma(s_job);
+   drm_sched_resubmit_jobs_ext(>sched, 1);
+
+   ret = dma_fence_wait_timeout(s_job->s_fence->parent, 
false, ring->sched.timeout);
+   if (ret == 0) { /* timeout */
+   DRM_ERROR("Found the real bad job! ring:%s, 
job_id:%llx\n",
+   ring->sched.name, 
s_job->id);
+
+   /* set guilty */
+   drm_sched_increase_karma(s_job);
  retry:
-   /* do hw reset */
-   if (amdgpu_sriov_vf(adev)) {
-   amdgpu_virt_fini_data_exchange(adev);
-   r = amdgpu_device_reset_sriov(adev, false);
-   if (r)
-   adev->asic_reset_res = r;
-   } else {
-   clear_bit(AMDGPU_SKIP_HW_RESET,
- &reset_context->flags);
-   r = amdgpu_do_asic_reset(device_list_handle,
-reset_context);
-   if (r && r == -EAGAIN)
-   goto retry;
-   }
+   /* do hw reset */
+   if (amdgpu_sriov_vf(adev)) {
+   amdgpu_virt_fini_data_exchange(adev);
+   r = amdgpu_device_reset_sriov(adev, 
false);
+   if (r)
+   adev->asic_reset_res = r;
+   } else {
+   clear_bit(AMDGPU_SKIP_HW_RESET,
+   &reset_context->flags);
+   r = 
amdgpu_do_asic_reset(device_list_handle,
+   reset_context);
+   if (r && r == -EAGAIN)
+   goto retry;
+   }
  
-			/*

-* add reset counter so that the following
-* resubmitted job could flush vmid
-*/
-   atomic_inc(&adev->gpu_reset_counter);
-   continue;
+   /*
+   * add reset counter so that the following
+   * resubmitted job could flush vmid
+ 

Re: [PATCH v3 4/7] drm/amdgpu: Add work_struct for GPU reset from debugfs

2022-05-30 Thread Andrey Grodzovsky

+ Monk

On 2022-05-30 03:52, Christian König wrote:



Am 25.05.22 um 21:04 schrieb Andrey Grodzovsky:

We need to have a work_struct to cancel this reset if another is
already in progress.

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 19 +--
  2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 76df583663c7..8165ee5b0457 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1048,6 +1048,8 @@ struct amdgpu_device {
  bool    scpm_enabled;
  uint32_t    scpm_status;
+
+    struct work_struct    reset_work;
  };
  static inline struct amdgpu_device *drm_to_adev(struct drm_device 
*ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index d16c8c1f72db..b0498ffcf7c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -39,6 +39,7 @@
  #include 
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
+#include "amdgpu_reset.h"
  /*
   * Fences
@@ -798,7 +799,10 @@ static int gpu_recover_get(void *data, u64 *val)
  return 0;
  }
-    *val = amdgpu_device_gpu_recover(adev, NULL);
+    if (amdgpu_reset_domain_schedule(adev->reset_domain, 
>reset_work))

+    flush_work(>reset_work);
+
+    *val = atomic_read(>reset_domain->reset_res);
  pm_runtime_mark_last_busy(dev->dev);
  pm_runtime_put_autosuspend(dev->dev);
@@ -810,6 +814,14 @@ DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info);
  DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, 
gpu_recover_get, NULL,

   "%lld\n");
+static void amdgpu_debugfs_reset_work(struct work_struct *work)
+{
+    struct amdgpu_device *adev = container_of(work, struct 
amdgpu_device,

+  reset_work);
+
+    amdgpu_device_gpu_recover_imp(adev, NULL);
+}
+
  #endif
  void amdgpu_debugfs_fence_init(struct amdgpu_device *adev)
@@ -821,9 +833,12 @@ void amdgpu_debugfs_fence_init(struct 
amdgpu_device *adev)

  debugfs_create_file("amdgpu_fence_info", 0444, root, adev,
  &amdgpu_debugfs_fence_info_fops);
-    if (!amdgpu_sriov_vf(adev))
+    if (!amdgpu_sriov_vf(adev)) {


I think we should drop the check for amdgpu_sriov_vf() here. It's a 
valid requirement to be able to trigger a GPU reset for a VF as well.


But not topic of this patch, feel free to add an Reviewed-by: Christian 
König .


Regards,
Christian.


Monk - any idea why we prevent creation of the debugfs GPU reset entry for VFs?

Andrey




+
+    INIT_WORK(&adev->reset_work, amdgpu_debugfs_reset_work);
  debugfs_create_file("amdgpu_gpu_recover", 0444, root, adev,
  &amdgpu_debugfs_gpu_recover_fops);
+    }
  #endif
  }




[PATCH v3 6/7] drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover

2022-05-25 Thread Andrey Grodzovsky
We removed the wrapper that was queueing the recover function
into the reset domain queue, which was using this name.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c  | 2 +-
 9 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8165ee5b0457..664ed0a6deab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1244,7 +1244,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device 
*adev);
 bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev);
 int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  struct amdgpu_job* job);
-int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  struct amdgpu_job *job);
 void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
 int amdgpu_device_pci_reset(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index a23abc0e86e7..513c57f839d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -129,7 +129,7 @@ static void amdgpu_amdkfd_reset_work(struct work_struct 
*work)
struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
  kfd.reset_work);
 
-   amdgpu_device_gpu_recover_imp(adev, NULL);
+   amdgpu_device_gpu_recover(adev, NULL);
 }
 
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e3e2a5d17cc2..424571e46cf5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5065,7 +5065,7 @@ static void amdgpu_device_recheck_guilty_jobs(
  * Returns 0 for success or an error on failure.
  */
 
-int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  struct amdgpu_job *job)
 {
struct list_head device_list, *device_list_handle =  NULL;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index b0498ffcf7c3..957437a5558c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -819,7 +819,7 @@ static void amdgpu_debugfs_reset_work(struct work_struct 
*work)
struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
  reset_work);
 
-   amdgpu_device_gpu_recover_imp(adev, NULL);
+   amdgpu_device_gpu_recover(adev, NULL);
 }
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index dfe7f2b8f0aa..10aa073600d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -64,7 +64,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
  ti.process_name, ti.tgid, ti.task_name, ti.pid);
 
if (amdgpu_device_should_recover_gpu(ring->adev)) {
-   r = amdgpu_device_gpu_recover_imp(ring->adev, job);
+   r = amdgpu_device_gpu_recover(ring->adev, job);
if (r)
DRM_ERROR("GPU Recovery Failed: %d\n", r);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index a439c04223b5..bc0049308207 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1922,7 +1922,7 @@ static void amdgpu_ras_do_recovery(struct work_struct 
*work)
}
 
if (amdgpu_device_should_recover_gpu(ras->adev))
-   amdgpu_device_gpu_recover_imp(ras->adev, NULL);
+   amdgpu_device_gpu_recover(ras->adev, NULL);
atomic_set(>in_recovery, 0);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index b81acf59870c..7ec5b5cf4bb9 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -284,7 +284,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct 
*work)
if (amdgpu_device_should_recover_gpu(adev)
&& (!amdgpu_device_has_job_running(adev) ||
ade

[PATCH v3 7/7] drm/amdgpu: Stop any pending reset if another in progress.

2022-05-25 Thread Andrey Grodzovsky
We skip reset requests if another one is already in progress.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 ++
 1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 424571e46cf5..e1f7ee604ea4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5054,6 +5054,27 @@ static void amdgpu_device_recheck_guilty_jobs(
}
 }
 
+static inline void amdgpu_device_stop_pending_resets(struct amdgpu_device *adev)
+{
+   struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
+
+#if defined(CONFIG_DEBUG_FS)
+   if (!amdgpu_sriov_vf(adev))
cancel_work(&adev->reset_work);
+#endif
+
+   if (adev->kfd.dev)
cancel_work(&adev->kfd.reset_work);
+
+   if (amdgpu_sriov_vf(adev))
cancel_work(&adev->virt.flr_work);
+
+   if (con && adev->ras_enabled)
cancel_work(&con->recovery_work);
+
+}
+
+
 /**
  * amdgpu_device_gpu_recover - reset the asic and recover scheduler
  *
@@ -5209,6 +5230,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  r, adev_to_drm(tmp_adev)->unique);
tmp_adev->asic_reset_res = r;
}
+
+   /*
+* Drop all pending non scheduler resets. Scheduler resets
+* were already dropped during drm_sched_stop
+*/
+   amdgpu_device_stop_pending_resets(tmp_adev);
}
 
tmp_vram_lost_counter = atomic_read(&((adev)->vram_lost_counter));
-- 
2.25.1



[PATCH v3 5/7] drm/amdgpu: Add work_struct for GPU reset from kfd.

2022-05-25 Thread Andrey Grodzovsky
We need to have a work_struct to cancel this reset if another is
already in progress.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 --
 3 files changed, 15 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 1f8161cd507f..a23abc0e86e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -33,6 +33,7 @@
 #include 
 #include "amdgpu_ras.h"
 #include "amdgpu_umc.h"
+#include "amdgpu_reset.h"
 
 /* Total memory size in system memory and all GPU VRAM. Used to
  * estimate worst case amount of memory to reserve for page tables
@@ -122,6 +123,15 @@ static void amdgpu_doorbell_get_kfd_info(struct 
amdgpu_device *adev,
}
 }
 
+
+static void amdgpu_amdkfd_reset_work(struct work_struct *work)
+{
+   struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
+ kfd.reset_work);
+
+   amdgpu_device_gpu_recover_imp(adev, NULL);
+}
+
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 {
int i;
@@ -180,6 +190,8 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 
adev->kfd.init_complete = kgd2kfd_device_init(adev->kfd.dev,
adev_to_drm(adev), 
&gpu_resources);
+
+   INIT_WORK(>kfd.reset_work, amdgpu_amdkfd_reset_work);
}
 }
 
@@ -247,7 +259,8 @@ int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev)
 void amdgpu_amdkfd_gpu_reset(struct amdgpu_device *adev)
 {
if (amdgpu_device_should_recover_gpu(adev))
-   amdgpu_device_gpu_recover(adev, NULL);
+   amdgpu_reset_domain_schedule(adev->reset_domain,
+&adev->kfd.reset_work);
 }
 
 int amdgpu_amdkfd_alloc_gtt_mem(struct amdgpu_device *adev, size_t size,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index f8b9f27adcf5..e0709af5a326 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -96,6 +96,7 @@ struct amdgpu_kfd_dev {
struct kfd_dev *dev;
uint64_t vram_used;
bool init_complete;
+   struct work_struct reset_work;
 };
 
 enum kgd_engine_type {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bfdd8883089a..e3e2a5d17cc2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5312,37 +5312,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
return r;
 }
 
-struct amdgpu_recover_work_struct {
-   struct work_struct base;
-   struct amdgpu_device *adev;
-   struct amdgpu_job *job;
-   int ret;
-};
-
-static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work)
-{
-   struct amdgpu_recover_work_struct *recover_work = container_of(work, 
struct amdgpu_recover_work_struct, base);
-
-   amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
-}
-/*
- * Serialize gpu recover into reset domain single threaded wq
- */
-int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
-   struct amdgpu_job *job)
-{
-   struct amdgpu_recover_work_struct work = {.adev = adev, .job = job};
-
-   INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
-
-   if (!amdgpu_reset_domain_schedule(adev->reset_domain, &work.base))
-   return -EAGAIN;
-
-   flush_work(&work.base);
-
-   return atomic_read(&adev->reset_domain->reset_res);
-}
-
 /**
  * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
  *
-- 
2.25.1



[PATCH v3 4/7] drm/amdgpu: Add work_struct for GPU reset from debugfs

2022-05-25 Thread Andrey Grodzovsky
We need to have a work_struct to cancel this reset if another is
already in progress.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 19 +--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 76df583663c7..8165ee5b0457 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1048,6 +1048,8 @@ struct amdgpu_device {
 
boolscpm_enabled;
uint32_tscpm_status;
+
+   struct work_struct  reset_work;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d16c8c1f72db..b0498ffcf7c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -39,6 +39,7 @@
 #include 
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_reset.h"
 
 /*
  * Fences
@@ -798,7 +799,10 @@ static int gpu_recover_get(void *data, u64 *val)
return 0;
}
 
-   *val = amdgpu_device_gpu_recover(adev, NULL);
+   if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->reset_work))
+   flush_work(&adev->reset_work);
+
+   *val = atomic_read(&adev->reset_domain->reset_res);
 
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);
@@ -810,6 +814,14 @@ DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info);
 DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, 
NULL,
 "%lld\n");
 
+static void amdgpu_debugfs_reset_work(struct work_struct *work)
+{
+   struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
+ reset_work);
+
+   amdgpu_device_gpu_recover_imp(adev, NULL);
+}
+
 #endif
 
 void amdgpu_debugfs_fence_init(struct amdgpu_device *adev)
@@ -821,9 +833,12 @@ void amdgpu_debugfs_fence_init(struct amdgpu_device *adev)
debugfs_create_file("amdgpu_fence_info", 0444, root, adev,
&amdgpu_debugfs_fence_info_fops);
 
-   if (!amdgpu_sriov_vf(adev))
+   if (!amdgpu_sriov_vf(adev)) {
+
+   INIT_WORK(&adev->reset_work, amdgpu_debugfs_reset_work);
debugfs_create_file("amdgpu_gpu_recover", 0444, root, adev,
&amdgpu_debugfs_gpu_recover_fops);
+   }
 #endif
 }
 
-- 
2.25.1



[PATCH v3 2/7] drm/amdgpu: Cache result of last reset at reset domain level.

2022-05-25 Thread Andrey Grodzovsky
The cached result will be read by executors of async reset, like debugfs.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 4daa0e893965..bfdd8883089a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5307,6 +5307,8 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device 
*adev,
 
if (r)
dev_info(adev->dev, "GPU reset end with ret = %d\n", r);
+
+   atomic_set(&adev->reset_domain->reset_res, r);
return r;
 }
 
@@ -5321,7 +5323,7 @@ static void amdgpu_device_queue_gpu_recover_work(struct 
work_struct *work)
 {
struct amdgpu_recover_work_struct *recover_work = container_of(work, 
struct amdgpu_recover_work_struct, base);
 
-   recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, 
recover_work->job);
+   amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
 }
 /*
  * Serialize gpu recover into reset domain single threaded wq
@@ -5338,7 +5340,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
	flush_work(&work.base);
 
-   return work.ret;
+   return atomic_read(&adev->reset_domain->reset_res);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index c80af0889773..32c86a0b145c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -132,6 +132,7 @@ struct amdgpu_reset_domain 
*amdgpu_reset_create_reset_domain(enum amdgpu_reset_d
}
 
	atomic_set(&reset_domain->in_gpu_reset, 0);
+   atomic_set(&reset_domain->reset_res, 0);
	init_rwsem(&reset_domain->sem);
 
return reset_domain;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index 1949dbe28a86..9e55a5d7a825 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -82,6 +82,7 @@ struct amdgpu_reset_domain {
enum amdgpu_reset_domain_type type;
struct rw_semaphore sem;
atomic_t in_gpu_reset;
+   atomic_t reset_res;
 };
 
 
-- 
2.25.1



[PATCH v3 3/7] drm/amdgpu: Serialize RAS recovery work directly into reset domain queue.

2022-05-25 Thread Andrey Grodzovsky
Saves an extra useless work schedule.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 31207f7eec02..a439c04223b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -35,6 +35,8 @@
 #include "amdgpu_xgmi.h"
 #include "ivsrcid/nbio/irqsrcs_nbif_7_4.h"
 #include "atom.h"
+#include "amdgpu_reset.h"
+
 #ifdef CONFIG_X86_MCE_AMD
 #include 
 
@@ -1920,7 +1922,7 @@ static void amdgpu_ras_do_recovery(struct work_struct 
*work)
}
 
if (amdgpu_device_should_recover_gpu(ras->adev))
-   amdgpu_device_gpu_recover(ras->adev, NULL);
+   amdgpu_device_gpu_recover_imp(ras->adev, NULL);
	atomic_set(&ras->in_recovery, 0);
 }
 
@@ -2928,7 +2930,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
 
	if (atomic_cmpxchg(&ras->in_recovery, 0, 1) == 0)
-   schedule_work(&ras->recovery_work);
+   amdgpu_reset_domain_schedule(ras->adev->reset_domain, 
&ras->recovery_work);
return 0;
 }
 
-- 
2.25.1



[PATCH v3 1/7] Revert "workqueue: remove unused cancel_work()"

2022-05-25 Thread Andrey Grodzovsky
This reverts commit 6417250d3f894e66a68ba1cd93676143f2376a6f.

amdgpu needs this function in order to prematurely stop pending
reset works when another reset work is already in progress.

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Lai Jiangshan
Reviewed-by: Christian König 
---
 include/linux/workqueue.h | 1 +
 kernel/workqueue.c| 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..9e41e1226193 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -453,6 +453,7 @@ extern int schedule_on_each_cpu(work_func_t func);
 int execute_in_process_context(work_func_t fn, struct execute_work *);
 
 extern bool flush_work(struct work_struct *work);
+extern bool cancel_work(struct work_struct *work);
 extern bool cancel_work_sync(struct work_struct *work);
 
 extern bool flush_delayed_work(struct delayed_work *dwork);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 613917bbc4e7..f94b596ebffd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3267,6 +3267,15 @@ static bool __cancel_work(struct work_struct *work, bool 
is_dwork)
return ret;
 }
 
+/*
+ * See cancel_delayed_work()
+ */
+bool cancel_work(struct work_struct *work)
+{
+   return __cancel_work(work, false);
+}
+EXPORT_SYMBOL(cancel_work);
+
 /**
  * cancel_delayed_work - cancel a delayed work
  * @dwork: delayed_work to cancel
-- 
2.25.1



[PATCH v3 0/7] Fix multiple GPU resets in XGMI hive.

2022-05-25 Thread Andrey Grodzovsky
Problem:
During a hive reset caused by a command timing out on a ring,
extra resets are triggered by KFD, which is
unable to access registers on the resetting ASIC.

Fix: Rework GPU reset to actively stop any pending reset
works while another is in progress.

v2: Switch from the generic list as was in v1 [1] to explicit 
stopping of each reset request from each reset source,
per request submitter.

v3: Switch back to work_struct from delayed_work (Christian)

[1] - 
https://lore.kernel.org/all/20220504161841.24669-1-andrey.grodzov...@amd.com/

Andrey Grodzovsky (7):
  Revert "workqueue: remove unused cancel_work()"
  drm/amdgpu: Cache result of last reset at reset domain level.
  drm/amdgpu: Serialize RAS recovery work directly into reset domain
queue.
  drm/amdgpu: Add work_struct for GPU reset from debugfs
  drm/amdgpu: Add work_struct for GPU reset from kfd.
  drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to
amdgpu_device_gpu_recover
  drm/amdgpu: Stop any pending reset if another in progress.

 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 62 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 19 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c|  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c  |  2 +-
 include/linux/workqueue.h  |  1 +
 kernel/workqueue.c |  9 
 14 files changed, 84 insertions(+), 41 deletions(-)

-- 
2.25.1



Re: [PATCH] Revert "workqueue: remove unused cancel_work()"

2022-05-20 Thread Andrey Grodzovsky



On 2022-05-20 03:52, Tejun Heo wrote:

On Fri, May 20, 2022 at 08:22:39AM +0200, Christian König wrote:

Am 20.05.22 um 02:47 schrieb Lai Jiangshan:

On Thu, May 19, 2022 at 11:04 PM Andrey Grodzovsky
 wrote:

See this patch-set https://www.spinics.net/lists/amd-gfx/msg78514.html,
specifically the patch
'drm/amdgpu: Switch to delayed work from work_struct'.

I will just reiterate here -

We need to be able to do a non-blocking cancel of pending reset works
from within GPU reset. Currently the kernel API allows this only
for delayed_work and not for work_struct.
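
In brief, the API gap being described (these signatures are from include/linux/workqueue.h, as shown in the patch below):

    /* Blocking: waits for a running instance to finish. Unusable from
     * inside the reset path, since the pending works live on the same
     * single-threaded reset domain queue, so waiting on a sibling work
     * from a running reset work would deadlock. */
    bool cancel_work_sync(struct work_struct *work);

    /* Non-blocking, but it only exists for delayed work: */
    bool cancel_delayed_work(struct delayed_work *dwork);

    /* The reverted-back helper fills the gap for plain work items: */
    bool cancel_work(struct work_struct *work);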


I'm OK with the change.

With an updated changelog:

Reviewed-by: Lai Jiangshan


Good morning guys,

for the patch itself Reviewed-by: Christian König 

And just for the record: We plan to push this upstream through the drm
branches, if anybody has any objections to that please speak up.


Andrey, care to resend with updated description?

Thanks



Just adding it here as an attachment since only the description changed.

Andrey

From 78df30cc97f10c885f5159a293e6afe2348aa60c Mon Sep 17 00:00:00 2001
From: Andrey Grodzovsky 
Date: Thu, 19 May 2022 09:47:28 -0400
Subject: Revert "workqueue: remove unused cancel_work()"

This reverts commit 6417250d3f894e66a68ba1cd93676143f2376a6f.

amdgpu needs this function in order to prematurely stop pending
reset works when another reset work is already in progress.

Signed-off-by: Andrey Grodzovsky 
---
 include/linux/workqueue.h | 1 +
 kernel/workqueue.c| 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..9e41e1226193 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -453,6 +453,7 @@ extern int schedule_on_each_cpu(work_func_t func);
 int execute_in_process_context(work_func_t fn, struct execute_work *);
 
 extern bool flush_work(struct work_struct *work);
+extern bool cancel_work(struct work_struct *work);
 extern bool cancel_work_sync(struct work_struct *work);
 
 extern bool flush_delayed_work(struct delayed_work *dwork);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 613917bbc4e7..f94b596ebffd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3267,6 +3267,15 @@ static bool __cancel_work(struct work_struct *work, bool is_dwork)
 	return ret;
 }
 
+/*
+ * See cancel_delayed_work()
+ */
+bool cancel_work(struct work_struct *work)
+{
+	return __cancel_work(work, false);
+}
+EXPORT_SYMBOL(cancel_work);
+
 /**
  * cancel_delayed_work - cancel a delayed work
  * @dwork: delayed_work to cancel
-- 
2.25.1



Re: [PATCH] Revert "workqueue: remove unused cancel_work()"

2022-05-19 Thread Andrey Grodzovsky
See this patch-set https://www.spinics.net/lists/amd-gfx/msg78514.html, 
specifically patch

'drm/amdgpu: Switch to delayed work from work_struct.

I will just reiterate here -

We need to be able to do non blocking cancel pending reset works
from within GPU reset. Currently kernel API allows this only
for delayed_work and not for work_struct.

Andrey

On 2022-05-19 10:52, Lai Jiangshan wrote:

On Thu, May 19, 2022 at 9:57 PM Andrey Grodzovsky
  wrote:

This reverts commit 6417250d3f894e66a68ba1cd93676143f2376a6f
and exports the function.

We need this funtion in amdgpu driver to fix a bug.

Hello,

Could you specify the reason why it is needed in amdgpu driver
rather than "fix a bug", please.

And there is a typo: "funtion".

And please avoid using "we" in the changelog.  For example, the
sentence can be changed to:

The amdgpu driver needs this function to cancel a work item
in blabla context/situation or for blabla reason.
(I'm not good at Engish, this is just an example of not
using "we".  No need to use the sentence.)

Thanks
Lai


Signed-off-by: Andrey Grodzovsky
---
  include/linux/workqueue.h | 1 +
  kernel/workqueue.c | 9 +++++++++
  2 files changed, 10 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..9e41e1226193 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -453,6 +453,7 @@ extern int schedule_on_each_cpu(work_func_t func);
  int execute_in_process_context(work_func_t fn, struct execute_work *);

  extern bool flush_work(struct work_struct *work);
+extern bool cancel_work(struct work_struct *work);
  extern bool cancel_work_sync(struct work_struct *work);

  extern bool flush_delayed_work(struct delayed_work *dwork);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 613917bbc4e7..f94b596ebffd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3267,6 +3267,15 @@ static bool __cancel_work(struct work_struct *work, bool is_dwork)
 return ret;
  }

+/*
+ * See cancel_delayed_work()
+ */
+bool cancel_work(struct work_struct *work)
+{
+   return __cancel_work(work, false);
+}
+EXPORT_SYMBOL(cancel_work);
+
  /**
   * cancel_delayed_work - cancel a delayed work
   * @dwork: delayed_work to cancel
--
2.25.1


[PATCH] Revert "workqueue: remove unused cancel_work()"

2022-05-19 Thread Andrey Grodzovsky
This reverts commit 6417250d3f894e66a68ba1cd93676143f2376a6f
and exports the function. 

We need this funtion in amdgpu driver to fix a bug.

Signed-off-by: Andrey Grodzovsky 
---
 include/linux/workqueue.h | 1 +
 kernel/workqueue.c | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..9e41e1226193 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -453,6 +453,7 @@ extern int schedule_on_each_cpu(work_func_t func);
 int execute_in_process_context(work_func_t fn, struct execute_work *);
 
 extern bool flush_work(struct work_struct *work);
+extern bool cancel_work(struct work_struct *work);
 extern bool cancel_work_sync(struct work_struct *work);
 
 extern bool flush_delayed_work(struct delayed_work *dwork);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 613917bbc4e7..f94b596ebffd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3267,6 +3267,15 @@ static bool __cancel_work(struct work_struct *work, bool is_dwork)
return ret;
 }
 
+/*
+ * See cancel_delayed_work()
+ */
+bool cancel_work(struct work_struct *work)
+{
+   return __cancel_work(work, false);
+}
+EXPORT_SYMBOL(cancel_work);
+
 /**
  * cancel_delayed_work - cancel a delayed work
  * @dwork: delayed_work to cancel
-- 
2.25.1



Re: [PATCH v2 0/7] Fix multiple GPU resets in XGMI hive.

2022-05-19 Thread Andrey Grodzovsky


On 2022-05-19 03:58, Christian König wrote:



On 2022-05-18 16:24, Andrey Grodzovsky wrote:



On 2022-05-18 02:07, Christian König wrote:

On 2022-05-17 21:20, Andrey Grodzovsky wrote:

Problem:
During a hive reset caused by a command timing out on a ring,
extra resets are triggered by KFD, which is unable to access
registers on the resetting ASIC.

Fix: Rework GPU reset to actively stop any pending reset
works while another is in progress.

v2: Switch from the generic list used in v1[1] to explicit
stopping of each reset request from each reset source,
per request submitter.


Looks mostly good to me.

Apart from the naming nitpick on patch #1, the only thing I couldn't
offhand figure out is why you are using a delayed work everywhere
instead of just a work item.


That needs a bit further explanation what's happening here.

Christian.



Check APIs for cancelling work vs. delayed work -

For work_struct the only public API is this - 
https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3214 
- blocking cancel.


For delayed_work we have both blocking and non-blocking public APIs -

https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3295

I prefer not to get into convincing core kernel people to expose
another interface for our own sake - from my past experience, API
changes in core code have slim chances and cost a lot of time in
back-and-forth arguments.
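
To make the trade-off concrete, here is a hedged sketch of the
workaround this series takes - a delayed_work queued with zero delay
purely to gain access to the non-blocking cancel API. The identifiers
are illustrative, not the actual amdgpu code:

#include <linux/printk.h>
#include <linux/workqueue.h>

/* Hypothetical per-source reset state. */
struct reset_source {
	struct delayed_work reset_work;
};

static void do_reset(struct work_struct *work)
{
	struct reset_source *src =
		container_of(work, struct reset_source, reset_work.work);

	pr_info("recovering source %p\n", src);	/* stand-in for real recovery */
}

static void reset_source_init(struct reset_source *src)
{
	INIT_DELAYED_WORK(&src->reset_work, do_reset);
}

static void reset_source_trigger(struct reset_source *src)
{
	/* Zero delay: the item is queued immediately, like a plain work item. */
	queue_delayed_work(system_wq, &src->reset_work, 0);
}

static void reset_source_drop_pending(struct reset_source *src)
{
	/* Non-blocking cancel -- the whole point of using delayed_work. */
	cancel_delayed_work(&src->reset_work);
}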


"If the mountain will not come to Muhammad, then Muhammad must go to 
the mountain" ;)*

*



Ah, good point. The cancel_work() function was removed a few years ago:

commit 6417250d3f894e66a68ba1cd93676143f2376a6f
Author: Stephen Hemminger 
Date:   Tue Mar 6 19:34:42 2018 -0800

    workqueue: remove unused cancel_work()

    Found this by accident.
    There are no usages of bare cancel_work() in current kernel source.

    Signed-off-by: Stephen Hemminger 
    Signed-off-by: Tejun Heo 


Maybe just revert that patch, export the function and use it. I think 
there is plenty of justification for this.


Thanks,
Christian.



Ok - I will send them a patch - let's see what they say.

Andrey






Andrey





[1] - 
https://lore.kernel.org/all/20220504161841.24669-1-andrey.grodzov...@amd.com/


Andrey Grodzovsky (7):
   drm/amdgpu: Cache result of last reset at reset domain level.
   drm/amdgpu: Switch to delayed work from work_struct.
   drm/amdgpu: Serialize RAS recovery work directly into reset domain
 queue.
   drm/amdgpu: Add delayed work for GPU reset from debugfs
   drm/amdgpu: Add delayed work for GPU reset from kfd.
   drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to
 amdgpu_device_gpu_recover
   drm/amdgpu: Stop any pending reset if another in progress.

  drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  4 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 62 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 19 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    | 10 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h    |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  |  5 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  2 +-
  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c  |  6 +--
  drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c  |  6 +--
  drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c  |  6 +--
  14 files changed, 87 insertions(+), 54 deletions(-)





Re: [PATCH v2 0/7] Fix multiple GPU resets in XGMI hive.

2022-05-18 Thread Andrey Grodzovsky


On 2022-05-18 02:07, Christian König wrote:

On 2022-05-17 21:20, Andrey Grodzovsky wrote:

Problem:
During a hive reset caused by a command timing out on a ring,
extra resets are triggered by KFD, which is unable to access
registers on the resetting ASIC.

Fix: Rework GPU reset to actively stop any pending reset
works while another is in progress.

v2: Switch from the generic list used in v1[1] to explicit
stopping of each reset request from each reset source,
per request submitter.


Looks mostly good to me.

Apart from the naming nitpick on patch #1, the only thing I couldn't
offhand figure out is why you are using a delayed work everywhere
instead of just a work item.


That needs a bit further explanation what's happening here.

Christian.



Check APIs for cancelling work vs. delayed work -

For work_struct the only public API is this - 
https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3214 
- blocking cancel.


For delayed_work we have both blocking and non-blocking public APIs -

https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3295

I prefer not to get into convincing core kernel people to expose
another interface for our own sake - from my past experience, API
changes in core code have slim chances and cost a lot of time in
back-and-forth arguments.


"If the mountain will not come to Muhammad, then Muhammad must go to the 
mountain" ;)*

*

Andrey





[1] - 
https://lore.kernel.org/all/20220504161841.24669-1-andrey.grodzov...@amd.com/


Andrey Grodzovsky (7):
   drm/amdgpu: Cache result of last reset at reset domain level.
   drm/amdgpu: Switch to delayed work from work_struct.
   drm/amdgpu: Serialize RAS recovery work directly into reset domain
 queue.
   drm/amdgpu: Add delayed work for GPU reset from debugfs
   drm/amdgpu: Add delayed work for GPU reset from kfd.
   drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to
 amdgpu_device_gpu_recover
   drm/amdgpu: Stop any pending reset if another in progress.

  drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  4 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 62 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 19 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    | 10 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h    |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  |  5 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  2 +-
  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c  |  6 +--
  drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c  |  6 +--
  drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c  |  6 +--
  14 files changed, 87 insertions(+), 54 deletions(-)



[PATCH v2 6/7] drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover

2022-05-17 Thread Andrey Grodzovsky
We removed the wrapper that was queueing the recover function
into the reset domain queue, which was using this name.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c  | 2 +-
 9 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4ef17c6d1a50..ee668f253c7a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1244,7 +1244,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev);
 bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev);
 int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  struct amdgpu_job* job);
-int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  struct amdgpu_job *job);
 void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
 int amdgpu_device_pci_reset(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 4cc846341394..434053a9e027 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -129,7 +129,7 @@ static void amdgpu_amdkfd_reset_work(struct work_struct *work)
struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
  kfd.reset_work.work);
 
-   amdgpu_device_gpu_recover_imp(adev, NULL);
+   amdgpu_device_gpu_recover(adev, NULL);
 }
 
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ae4c37c89ac7..65f738fd4761 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5065,7 +5065,7 @@ static void amdgpu_device_recheck_guilty_jobs(
  * Returns 0 for success or an error on failure.
  */
 
-int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  struct amdgpu_job *job)
 {
struct list_head device_list, *device_list_handle =  NULL;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index f980f1501c48..7954ebf16885 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -819,7 +819,7 @@ static void amdgpu_debugfs_reset_work(struct work_struct *work)
struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
  reset_work.work);
 
-   amdgpu_device_gpu_recover_imp(adev, NULL);
+   amdgpu_device_gpu_recover(adev, NULL);
 }
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index dfe7f2b8f0aa..10aa073600d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -64,7 +64,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
  ti.process_name, ti.tgid, ti.task_name, ti.pid);
 
if (amdgpu_device_should_recover_gpu(ring->adev)) {
-   r = amdgpu_device_gpu_recover_imp(ring->adev, job);
+   r = amdgpu_device_gpu_recover(ring->adev, job);
if (r)
DRM_ERROR("GPU Recovery Failed: %d\n", r);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 7e8c7bcc7303..221d24feb8c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1918,7 +1918,7 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
}
 
if (amdgpu_device_should_recover_gpu(ras->adev))
-   amdgpu_device_gpu_recover_imp(ras->adev, NULL);
+   amdgpu_device_gpu_recover(ras->adev, NULL);
atomic_set(&ras->in_recovery, 0);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index aa5f6d6ea1e3..3b7d9f171793 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -284,7 +284,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
if (amdgpu_device_should_recover_gpu(adev)
&& (!amdgpu_device_has_job_running(adev) ||
ade

[PATCH v2 7/7] drm/amdgpu: Stop any pending reset if another in progress.

2022-05-17 Thread Andrey Grodzovsky
We skip reset requests if another one is already in progress.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 ++
 1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 65f738fd4761..43af5ea3eee5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5054,6 +5054,27 @@ static void amdgpu_device_recheck_guilty_jobs(
}
 }
 
+static inline void amdgpu_device_stop_pending_resets(struct amdgpu_device *adev)
+{
+   struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
+
+#if defined(CONFIG_DEBUG_FS)
+   if (!amdgpu_sriov_vf(adev))
+   cancel_delayed_work(&adev->reset_work);
+#endif
+
+   if (adev->kfd.dev)
+   cancel_delayed_work(&adev->kfd.reset_work);
+
+   if (amdgpu_sriov_vf(adev))
+   cancel_delayed_work(&adev->virt.flr_work);
+
+   if (con && adev->ras_enabled)
+   cancel_delayed_work(&con->recovery_work);
+
+}
+
+
 /**
  * amdgpu_device_gpu_recover - reset the asic and recover scheduler
  *
@@ -5209,6 +5230,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  r, adev_to_drm(tmp_adev)->unique);
tmp_adev->asic_reset_res = r;
}
+
+   /*
+* Drop all pending non scheduler resets. Scheduler resets
+* were already dropped during drm_sched_stop
+*/
+   amdgpu_device_stop_pending_resets(tmp_adev);
}
 
tmp_vram_lost_counter = atomic_read(&((adev)->vram_lost_counter));
-- 
2.25.1



[PATCH v2 5/7] drm/amdgpu: Add delayed work for GPU reset from kfd.

2022-05-17 Thread Andrey Grodzovsky
We need a delayed work so that we can cancel this reset if another
one is already in progress.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 --
 3 files changed, 15 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 1f8161cd507f..4cc846341394 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -33,6 +33,7 @@
 #include 
 #include "amdgpu_ras.h"
 #include "amdgpu_umc.h"
+#include "amdgpu_reset.h"
 
 /* Total memory size in system memory and all GPU VRAM. Used to
  * estimate worst case amount of memory to reserve for page tables
@@ -122,6 +123,15 @@ static void amdgpu_doorbell_get_kfd_info(struct amdgpu_device *adev,
}
 }
 
+
+static void amdgpu_amdkfd_reset_work(struct work_struct *work)
+{
+   struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
+ kfd.reset_work.work);
+
+   amdgpu_device_gpu_recover_imp(adev, NULL);
+}
+
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 {
int i;
@@ -180,6 +190,8 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 
adev->kfd.init_complete = kgd2kfd_device_init(adev->kfd.dev,
adev_to_drm(adev), &gpu_resources);
+
+   INIT_DELAYED_WORK(&adev->kfd.reset_work, amdgpu_amdkfd_reset_work);
}
 }
 
@@ -247,7 +259,8 @@ int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev)
 void amdgpu_amdkfd_gpu_reset(struct amdgpu_device *adev)
 {
if (amdgpu_device_should_recover_gpu(adev))
-   amdgpu_device_gpu_recover(adev, NULL);
+   amdgpu_reset_domain_schedule(adev->reset_domain,
+   &adev->kfd.reset_work);
 }
 
 int amdgpu_amdkfd_alloc_gtt_mem(struct amdgpu_device *adev, size_t size,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index f8b9f27adcf5..5e04dba8c7f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -96,6 +96,7 @@ struct amdgpu_kfd_dev {
struct kfd_dev *dev;
uint64_t vram_used;
bool init_complete;
+   struct delayed_work reset_work;
 };
 
 enum kgd_engine_type {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ea41edf52a6f..ae4c37c89ac7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5308,37 +5308,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
return r;
 }
 
-struct amdgpu_recover_work_struct {
-   struct delayed_work base;
-   struct amdgpu_device *adev;
-   struct amdgpu_job *job;
-   int ret;
-};
-
-static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work)
-{
-   struct amdgpu_recover_work_struct *recover_work = container_of(work, struct amdgpu_recover_work_struct, base.work);
-
-   amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
-}
-/*
- * Serialize gpu recover into reset domain single threaded wq
- */
-int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
-   struct amdgpu_job *job)
-{
-   struct amdgpu_recover_work_struct work = {.adev = adev, .job = job};
-
-   INIT_DELAYED_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
-
-   if (!amdgpu_reset_domain_schedule(adev->reset_domain, &work.base))
-   return -EAGAIN;
-
-   flush_delayed_work(&work.base);
-
-   return atomic_read(&adev->reset_domain->reset_res);
-}
-
 /**
  * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
  *
-- 
2.25.1



[PATCH v2 4/7] drm/amdgpu: Add delayed work for GPU reset from debugfs

2022-05-17 Thread Andrey Grodzovsky
We need a delayed work so that we can cancel this reset if another
one is already in progress.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 19 +--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 3c20c2eadf4e..4ef17c6d1a50 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1048,6 +1048,8 @@ struct amdgpu_device {
 
boolscpm_enabled;
uint32_tscpm_status;
+
+   struct delayed_work reset_work;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d16c8c1f72db..f980f1501c48 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -39,6 +39,7 @@
 #include 
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_reset.h"
 
 /*
  * Fences
@@ -798,7 +799,10 @@ static int gpu_recover_get(void *data, u64 *val)
return 0;
}
 
-   *val = amdgpu_device_gpu_recover(adev, NULL);
+   if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->reset_work))
+   flush_delayed_work(&adev->reset_work);
+
+   *val = atomic_read(&adev->reset_domain->reset_res);
 
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);
@@ -810,6 +814,14 @@ DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info);
DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL,
 "%lld\n");
 
+static void amdgpu_debugfs_reset_work(struct work_struct *work)
+{
+   struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
+ reset_work.work);
+
+   amdgpu_device_gpu_recover_imp(adev, NULL);
+}
+
 #endif
 
 void amdgpu_debugfs_fence_init(struct amdgpu_device *adev)
@@ -821,9 +833,12 @@ void amdgpu_debugfs_fence_init(struct amdgpu_device *adev)
debugfs_create_file("amdgpu_fence_info", 0444, root, adev,
    &amdgpu_debugfs_fence_info_fops);
 
-   if (!amdgpu_sriov_vf(adev))
+   if (!amdgpu_sriov_vf(adev)) {
+
+   INIT_DELAYED_WORK(&adev->reset_work, amdgpu_debugfs_reset_work);
debugfs_create_file("amdgpu_gpu_recover", 0444, root, adev,
    &amdgpu_debugfs_gpu_recover_fops);
+   }
 #endif
 }
 
-- 
2.25.1


