Re: [PATCH v8] drm/sched: Add FIFO sched policy to run queue
Thanks for helping with review and good improvement ideas. Pushed to drm-misc-next.

Andrey

On 2022-09-30 00:12, Luben Tuikov wrote:

From: Andrey Grodzovsky

When many entities are competing for the same run queue on the same scheduler, we observe unusually long wait times and some jobs get starved. This has been observed on GPUVis.

The issue is due to the Round Robin policy used by schedulers to pick up the next entity's job queue for execution. Under stress of many entities and long job queues within an entity, some jobs could be stuck for a very long time in their entity's queue before being popped from the queue and executed, while for other entities with smaller job queues a job might execute earlier even though that job arrived later than the job in the long queue.

Fix: Add a FIFO selection policy for entities in the run queue: choose the next entity on the run queue in such an order that if a job on one entity arrived earlier than a job on another entity, the first job will start executing earlier, regardless of the length of the entity's job queue.

v2: Switch to an rb-tree structure for entities, keyed by the timestamp of the oldest job waiting in the job queue of an entity. Improves next-entity extraction to O(1). Entity timestamp update is O(log N), where N is the number of entities in the run queue. Drop the default option in the module control parameter.
v3: Various cosmetic fixes and minor refactoring of the FIFO update function. (Luben)
v4: Switch drm_sched_rq_select_entity_fifo to an in-order search. (Luben)
v5: Fix up the drm_sched_rq_select_entity_fifo loop. (Luben)
v6: Add missing drm_sched_rq_remove_fifo_locked.
v7: Fix ts sampling bug and more cosmetic stuff. (Luben)
v8: Fix module parameter string. (Luben)

Cc: Luben Tuikov
Cc: Christian König
Cc: Direct Rendering Infrastructure - Development
Cc: AMD Graphics
Signed-off-by: Andrey Grodzovsky
Tested-by: Yunxiang Li (Teddy)
Signed-off-by: Luben Tuikov
Reviewed-by: Luben Tuikov
---
 drivers/gpu/drm/scheduler/sched_entity.c | 20 +-
 drivers/gpu/drm/scheduler/sched_main.c   | 96 +++-
 include/drm/gpu_scheduler.h              | 32 +
 3 files changed, 145 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a308..7060e4ed5a3148 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	entity->priority = priority;
 	entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
 	entity->last_scheduled = NULL;
+	RB_CLEAR_NODE(&entity->rb_tree_node);
 
 	if(num_sched_list)
 		entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -443,6 +444,19 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	smp_wmb();
 
 	spsc_queue_pop(&entity->job_queue);
+
+	/*
+	 * Update the entity's location in the min heap according to
+	 * the timestamp of the next job, if any.
+	 */
+	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
+		struct drm_sched_job *next;
+
+		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
+		if (next)
+			drm_sched_rq_update_fifo(entity, next->submit_ts);
+	}
+
 	return sched_job;
 }
 
@@ -507,6 +521,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 	atomic_inc(entity->rq->sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+	sched_job->submit_ts = ktime_get();
 
 	/* first job wakes up scheduler */
 	if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 			DRM_ERROR("Trying to push to a killed entity\n");
 			return;
 		}
+
 		drm_sched_rq_add_entity(entity->rq, entity);
 		spin_unlock(&entity->rq_lock);
+
+		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+			drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
+
 		drm_sched_wakeup(entity->rq->sched);
 	}
 }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 4f2395d1a79182..ce86b03e838699 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -62,6 +62,55 @@
 #define to_drm_sched_job(sched_job) \
 	container_of((sched_job), struct drm_sched_job, queue_node)
 
+int drm_sched_policy = DRM_SCHED_POLICY_RR;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entities scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy, "Specify schedule policy for entities on a runqueue, " __stringify(DRM_SC
[PATCH v5] drm/sched: Add FIFO sched policy to run queue
When many entities are competing for the same run queue on the same scheduler, we observe unusually long wait times and some jobs get starved. This has been observed on GPUVis.

The issue is due to the Round Robin policy used by schedulers to pick up the next entity's job queue for execution. Under stress of many entities and long job queues within an entity, some jobs could be stuck for a very long time in their entity's queue before being popped from the queue and executed, while for other entities with smaller job queues a job might execute earlier even though that job arrived later than the job in the long queue.

Fix: Add a FIFO selection policy for entities in the run queue: choose the next entity on the run queue in such an order that if a job on one entity arrived earlier than a job on another entity, the first job will start executing earlier, regardless of the length of the entity's job queue.

v2: Switch to an rb-tree structure for entities, keyed by the timestamp of the oldest job waiting in the job queue of an entity. Improves next-entity extraction to O(1). Entity timestamp update is O(log N), where N is the number of entities in the run queue. Drop the default option in the module control parameter.
v3: Various cosmetic fixes and minor refactoring of the FIFO update function. (Luben)
v4: Switch drm_sched_rq_select_entity_fifo to an in-order search. (Luben)
v5: Fix up the drm_sched_rq_select_entity_fifo loop.

Signed-off-by: Andrey Grodzovsky
Tested-by: Li Yunxiang (Teddy)
---
 drivers/gpu/drm/scheduler/sched_entity.c | 26 ++-
 drivers/gpu/drm/scheduler/sched_main.c   | 99 +++-
 include/drm/gpu_scheduler.h              | 32 +
 3 files changed, 151 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..f3ffce3c9304 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	entity->priority = priority;
 	entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
 	entity->last_scheduled = NULL;
+	RB_CLEAR_NODE(&entity->rb_tree_node);
 
 	if(num_sched_list)
 		entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
 	if (!sched_job)
-		return NULL;
+		goto skip;
 
 	while ((entity->dependency =
 			drm_sched_job_dependency(sched_job, entity))) {
 		trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
 
-		if (drm_sched_entity_add_dependency_cb(entity))
-			return NULL;
+		if (drm_sched_entity_add_dependency_cb(entity)) {
+			sched_job = NULL;
+			goto skip;
+		}
 	}
 
 	/* skip jobs from entity that marked guilty */
@@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	smp_wmb();
 
 	spsc_queue_pop(&entity->job_queue);
+
+	/*
+	 * It's when the head job is extracted that we can access the next job
+	 * (or empty) queue and update the entity location in the min heap
+	 * accordingly.
+	 */
+skip:
+	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+		drm_sched_rq_update_fifo(entity,
+					 (sched_job ? sched_job->submit_ts : ktime_get()));
+
 	return sched_job;
 }
 
@@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 {
 	struct drm_sched_entity *entity = sched_job->entity;
 	bool first;
+	ktime_t ts = ktime_get();
 
 	trace_drm_sched_job(sched_job, entity);
 	atomic_inc(entity->rq->sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+	sched_job->submit_ts = ts;
 
 	/* first job wakes up scheduler */
 	if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 			DRM_ERROR("Trying to push to a killed entity\n");
 			return;
 		}
+
 		drm_sched_rq_add_entity(entity->rq, entity);
 		spin_unlock(&entity->rq_lock);
+
+		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+			drm_sched_rq_update_fifo(entity, ts);
+
 		drm_sched_wakeup(entity->rq->sched);
 	}
 }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 4f2395d1a791..5349fc049384 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -62,6 +62,58 @@
Re: [PATCH v4] drm/sched: Add FIFO sched policy to run queue v3
Hey, I have problems with my git-send today, so I just attached v5 as a patch here.

Andrey

On 2022-09-27 19:56, Luben Tuikov wrote:
> Inlined:
>
> On 2022-09-22 12:15, Andrey Grodzovsky wrote:
> [...]
Re: [PATCH v4] drm/sched: Add FIFO sched policy to run queue v3
Ping

Andrey

On 2022-09-22 12:15, Andrey Grodzovsky wrote:
> On 2022-09-22 11:03, Luben Tuikov wrote:
> [...]
Re: [PATCH v4] drm/sched: Add FIFO sched policy to run queue v3
On 2022-09-22 11:03, Luben Tuikov wrote:

The title of this patch has "v3", but "v4" in the title prefix. If you're using "-v" with git-format-patch, please remove the "v3" from the title.

Inlined:

On 2022-09-21 14:28, Andrey Grodzovsky wrote:
> When many entities competing for same run queue on the same scheduler
> have unacceptably long wait time for some jobs waiting stuck in the
> run queue before being picked up are observed (seen using GPUVis).

Use this as your opening: "When many entities are competing for the same run queue on the same scheduler, we observe unusually long wait times and some jobs get starved. This has been observed on GPUVis."

> The issue is due to the Round Robin policy used by schedulers to pick
> up the next entity's job queue for execution. Under stress of many
> entities and long job queues within entity some jobs could be stack
> for very long time in it's entity's

"stuck", not "stack".

> queue before being popped from the queue and executed while for other
> entities with smaller job queues a job might execute earlier even
> though that job arrived later then the job in the long queue.
>
> Fix: Add FIFO selection policy to entities in run queue, chose next
> entity on run queue in such order that if job on one entity arrived
> earlier then job on another entity the first job will start executing
> earlier regardless of the length of the entity's job queue.
>
> v2: Switch to rb tree structure for entities based on TS of oldest job
>     waiting in the job queue of an entity. Improves next entity
>     extraction to O(1). Entity TS update O(log N) where N is the
>     number of entities in the run-queue.
>     Drop default option in module control parameter.
> v3: Various cosmetical fixes and minor refactoring of fifo update
>     function. (Luben)
> v4: Switch drm_sched_rq_select_entity_fifo to in order search (Luben)
>
> Signed-off-by: Andrey Grodzovsky
> Tested-by: Li Yunxiang (Teddy)
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c |  26 +-
>  drivers/gpu/drm/scheduler/sched_main.c   | 107 ++-
>  include/drm/gpu_scheduler.h              |  32 +++
>  3 files changed, 159 insertions(+), 6 deletions(-)
>
> [...]
[PATCH v4] drm/sched: Add FIFO sched policy to run queue v3
When many entities competing for same run queue on the same scheduler have unacceptably long wait time for some jobs waiting stuck in the run queue before being picked up are observed (seen using GPUVis).

The issue is due to the Round Robin policy used by schedulers to pick up the next entity's job queue for execution. Under stress of many entities and long job queues within entity some jobs could be stack for very long time in it's entity's queue before being popped from the queue and executed while for other entities with smaller job queues a job might execute earlier even though that job arrived later then the job in the long queue.

Fix: Add FIFO selection policy to entities in run queue, chose next entity on run queue in such order that if job on one entity arrived earlier then job on another entity the first job will start executing earlier regardless of the length of the entity's job queue.

v2: Switch to rb tree structure for entities based on TS of oldest job waiting in the job queue of an entity. Improves next entity extraction to O(1). Entity TS update O(log N) where N is the number of entities in the run-queue. Drop default option in module control parameter.
v3: Various cosmetical fixes and minor refactoring of fifo update function. (Luben)
v4: Switch drm_sched_rq_select_entity_fifo to in order search (Luben)

Signed-off-by: Andrey Grodzovsky
Tested-by: Li Yunxiang (Teddy)
---
 drivers/gpu/drm/scheduler/sched_entity.c |  26 +-
 drivers/gpu/drm/scheduler/sched_main.c   | 107 ++-
 include/drm/gpu_scheduler.h              |  32 +++
 3 files changed, 159 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..f3ffce3c9304 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	entity->priority = priority;
 	entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
 	entity->last_scheduled = NULL;
+	RB_CLEAR_NODE(&entity->rb_tree_node);
 
 	if(num_sched_list)
 		entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
 	if (!sched_job)
-		return NULL;
+		goto skip;
 
 	while ((entity->dependency =
 			drm_sched_job_dependency(sched_job, entity))) {
 		trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
 
-		if (drm_sched_entity_add_dependency_cb(entity))
-			return NULL;
+		if (drm_sched_entity_add_dependency_cb(entity)) {
+			sched_job = NULL;
+			goto skip;
+		}
 	}
 
 	/* skip jobs from entity that marked guilty */
@@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	smp_wmb();
 
 	spsc_queue_pop(&entity->job_queue);
+
+	/*
+	 * It's when head job is extracted we can access the next job (or empty)
+	 * queue and update the entity location in the min heap accordingly.
+	 */
+skip:
+	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+		drm_sched_rq_update_fifo(entity,
+					 (sched_job ? sched_job->submit_ts : ktime_get()));
+
 	return sched_job;
 }
 
@@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 {
 	struct drm_sched_entity *entity = sched_job->entity;
 	bool first;
+	ktime_t ts = ktime_get();
 
 	trace_drm_sched_job(sched_job, entity);
 	atomic_inc(entity->rq->sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+	sched_job->submit_ts = ts;
 
 	/* first job wakes up scheduler */
 	if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 			DRM_ERROR("Trying to push to a killed entity\n");
 			return;
 		}
+
 		drm_sched_rq_add_entity(entity->rq, entity);
 		spin_unlock(&entity->rq_lock);
+
+		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+			drm_sched_rq_update_fifo(entity, ts);
+
 		drm_sched_wakeup(entity->rq->sched);
 	}
 }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 4f2395d1a791..565707a1c5c7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -62,6 +62,64 @@
Re: [PATCH v3] drm/sched: Add FIFO sched policy to run queue v3
On 2022-09-19 23:11, Luben Tuikov wrote:

Please run this patch through checkpatch.pl, as it shows 12 warnings with it. Use these command line options: "--strict --show-types".

Inlined:

On 2022-09-13 16:40, Andrey Grodzovsky wrote:
> Given many entities competing for same run queue on the same scheduler
> and unacceptably long wait time for some jobs waiting stuck in the run
> queue before being picked up are observed (seen using GPUVis).

Since the second part of this sentence is the result of the first, I'd say something like "When many entities ... we see unacceptably long ...".

> The issue is due to the Round Robin policy used by schedulers to pick
> up the next entity's job queue for execution. Under stress of many
> entities and long job queus within entity some

Spelling: "queues".

> jobs could be stack for very long time in it's entity's

"stuck", not "stack".

> queue before being popped from the queue and executed while for other
> entities with smaller job queues a job might execute earlier even
> though that job arrived later then the job in the long queue.

"than".

> Fix: Add FIFO selection policy to entities in run queue, chose next
> entity on run queue in such order that if job on one entity arrived
> earlier then job on another entity the first job will start executing
> earlier regardless of the length of the entity's job queue.
>
> v2: Switch to rb tree structure for entities based on TS of oldest job
>     waiting in the job queue of an entity. Improves next entity
>     extraction to O(1). Entity TS update O(log N) where N is the
>     number of entities in the run-queue.
>     Drop default option in module control parameter.
> v3: Various cosmetical fixes and minor refactoring of fifo update
>     function.
>
> Signed-off-by: Andrey Grodzovsky
> Tested-by: Li Yunxiang (Teddy)
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c |  26 +-
>  drivers/gpu/drm/scheduler/sched_main.c   | 132 ++-
>  include/drm/gpu_scheduler.h              |  35 ++
>  3 files changed, 187 insertions(+), 6 deletions(-)
>
> [...]
Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow
I don't know if issue still exist but it worth checking with Christian who wrote this patch. Andrey On 2022-09-16 23:31, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Andrey, Yes, moving irq disable can fix the issue. Change in amdgpu_fence_process is just want to make sure driver can correct itself from an overflow situation. Didn’t know about the previous issue there. Do you know if the issue still exists? Or is it on VCE only? Thanks, Victor -Original Message- From: Grodzovsky, Andrey Sent: Friday, September 16, 2022 9:50 PM To: Koenig, Christian ; Zhao, Victor ; amd-gfx@lists.freedesktop.org Cc: Deng, Emily Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow On 2022-09-16 01:18, Christian König wrote: Am 15.09.22 um 22:37 schrieb Andrey Grodzovsky: On 2022-09-15 15:26, Christian König wrote: Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky: On 2022-09-15 06:09, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Christian, The test sequence is executing a compute engine hang while running a lot of containers submitting gfx jobs. We have advanced tdr mode and mode2 reset enabled on driver. When a compute hang job timeout happens, the 2 jobs on the gfx pending list maybe signaled after drm_sched_stop. So they will not be removed from pending list but have the DMA_FENCE_FLAG_SIGNALED_BIT set. At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and removed from pending list. At the resubmit setp, the second job (with signaled bit) will be resubmitted. Since it still has signaled bit, drm_sched_job_done will be called directly. This decrease the hw_rq_count which allows more jobs emitted but did not clean fence_drv rcu ptr. This results in an overflow in the fence_drv. Since we will use num_fences_mask in amdgpu_fence_process, when overflow happens, the signal of some job will be skipped which result in an infinite wait for the fence_drv rcu ptr. 
So closing the irq before sched_stop could avoid signaling jobs after drm_sched_stop. And signaling jobs one by one in fence_process instead of using a mask will handle the overflow situation. Another fix could be to skip submitting jobs which already signaled during the resubmit stage, which may look cleaner. Please help give some advice. How about the code below instead? The real problem is that we reuse a dma fence twice, which is not according to the dma fence design, so maybe this can help? diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 8adeb7469f1e..033f0ae16784 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd if (job && job->job_run_counter) { /* reinit seq for resubmitted jobs */ fence->seqno = seq; + + /* For resubmitted jobs clear the signaled bit */ + clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); + Upstream will pretty much kill you for that. Re-setting a fence from a signaled to an unsignaled state is a massive no-go. Christian. Is it worse than doing fence->seqno = seq; ? This is already a huge hack, no? No, it's equally bad. I don't think we can do either. Christian. And all those ugly hacks are there because we reuse a dma_fence (the hw_fence embedded into the job), and correct me if I am wrong, but I don't think a dma_fence is ever supposed to be reused. So maybe, like Victor suggested, we should move closing and flushing the irq before sched_stop - this in my opinion should solve the issue, but Victor - why do you then still need the change in amdgpu_fence_process? You will not have the overflow situation, because by moving irq_disable before stop, any job that signaled will be removed from the scheduler pending list anyway. Also note that this change reverts 'drm/amdgpu: sanitize fence numbers' and could reintroduce that bug.
Andrey Andrey /* To be in line with external fence creation and other drivers */ dma_fence_get(fence); } else { Andrey Thanks, Victor -Original Message- From: Koenig, Christian Sent: Thursday, September 15, 2022 2:32 PM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey Cc: Deng, Emily Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow Am 15.09.22 um 06:02 schrieb Zhao, Victor: [AMD Official Use Only - General] Ping. Hi @Koenig, Christian and @Grodzovsky, Andrey, We found some reset related issues during a stress test on the sequence. Please help give some comments. Thanks, Victor -Original Message- From: Victor Zhao Sent: Wednesday, September 14, 2022 6:10 PM To: amd-gfx@lists.freedesktop.org Cc: Deng, Emily ; Grodzovsky, Andrey ; Zhao, Victor Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow [background] For a
Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow
On 2022-09-16 01:18, Christian König wrote: Am 15.09.22 um 22:37 schrieb Andrey Grodzovsky: On 2022-09-15 15:26, Christian König wrote: Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky: On 2022-09-15 06:09, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Christian, The test sequence is executing a compute engine hang while running a lot of containers submitting gfx jobs. We have advanced tdr mode and mode2 reset enabled on the driver. When a compute hang job timeout happens, the 2 jobs on the gfx pending list may be signaled after drm_sched_stop. So they will not be removed from the pending list but have the DMA_FENCE_FLAG_SIGNALED_BIT set. At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and removed from the pending list. At the resubmit step, the second job (with the signaled bit) will be resubmitted. Since it still has the signaled bit, drm_sched_job_done will be called directly. This decreases the hw_rq_count, which allows more jobs to be emitted, but does not clean the fence_drv rcu ptr. This results in an overflow in the fence_drv. Since we will use num_fences_mask in amdgpu_fence_process, when overflow happens, the signal of some jobs will be skipped, which results in an infinite wait for the fence_drv rcu ptr. So closing the irq before sched_stop could avoid signaling jobs after drm_sched_stop. And signaling jobs one by one in fence_process instead of using a mask will handle the overflow situation. Another fix could be to skip submitting jobs which already signaled during the resubmit stage, which may look cleaner. Please help give some advice. How about the code below instead? The real problem is that we reuse a dma fence twice, which is not according to the dma fence design, so maybe this can help?
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 8adeb7469f1e..033f0ae16784 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd if (job && job->job_run_counter) { /* reinit seq for resubmitted jobs */ fence->seqno = seq; + + /* For resubmitted jobs clear the signaled bit */ + clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); + Upstream will pretty much kill you for that. Re-setting a fence from a signaled to an unsignaled state is a massive no-go. Christian. Is it worse than doing fence->seqno = seq; ? This is already a huge hack, no? No, it's equally bad. I don't think we can do either. Christian. And all those ugly hacks are there because we reuse a dma_fence (the hw_fence embedded into the job), and correct me if I am wrong, but I don't think a dma_fence is ever supposed to be reused. So maybe, like Victor suggested, we should move closing and flushing the irq before sched_stop - this in my opinion should solve the issue, but Victor - why do you then still need the change in amdgpu_fence_process? You will not have the overflow situation, because by moving irq_disable before stop, any job that signaled will be removed from the scheduler pending list anyway. Also note that this change reverts 'drm/amdgpu: sanitize fence numbers' and could reintroduce that bug. Andrey Andrey /* To be in line with external fence creation and other drivers */ dma_fence_get(fence); } else { Andrey Thanks, Victor -Original Message- From: Koenig, Christian Sent: Thursday, September 15, 2022 2:32 PM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey Cc: Deng, Emily Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow Am 15.09.22 um 06:02 schrieb Zhao, Victor: [AMD Official Use Only - General] Ping.
Hi @Koenig, Christian and @Grodzovsky, Andrey, We found some reset related issues during a stress test on the sequence. Please help give some comments. Thanks, Victor -Original Message- From: Victor Zhao Sent: Wednesday, September 14, 2022 6:10 PM To: amd-gfx@lists.freedesktop.org Cc: Deng, Emily ; Grodzovsky, Andrey ; Zhao, Victor Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow [background] For a gpu recovery caused by a hang on one ring (e.g. compute), jobs from another ring (e.g. gfx) may continue signaling during the drm_sched_stop stage. The signal bit will not be cleared. At the resubmit stage after recovery, the job with the hw fence signaled bit set will call job done directly instead of going through fence process. This makes the hw_rq_count decrease but the rcu fence pointer is not cleared yet. Then overflow happens in the fence driver slots and some jobs may be skipped, leaving the rcu pointer not cleared, which makes an infinite wait for the slot on the next fence emitted. This infinite wait causes a job timeout on the emitting job. And the driver will be stuck at the
Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow
On 2022-09-15 15:26, Christian König wrote: Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky: On 2022-09-15 06:09, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Christian, The test sequence is executing a compute engine hang while running a lot of containers submitting gfx jobs. We have advanced tdr mode and mode2 reset enabled on the driver. When a compute hang job timeout happens, the 2 jobs on the gfx pending list may be signaled after drm_sched_stop. So they will not be removed from the pending list but have the DMA_FENCE_FLAG_SIGNALED_BIT set. At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and removed from the pending list. At the resubmit step, the second job (with the signaled bit) will be resubmitted. Since it still has the signaled bit, drm_sched_job_done will be called directly. This decreases the hw_rq_count, which allows more jobs to be emitted, but does not clean the fence_drv rcu ptr. This results in an overflow in the fence_drv. Since we will use num_fences_mask in amdgpu_fence_process, when overflow happens, the signal of some jobs will be skipped, which results in an infinite wait for the fence_drv rcu ptr. So closing the irq before sched_stop could avoid signaling jobs after drm_sched_stop. And signaling jobs one by one in fence_process instead of using a mask will handle the overflow situation. Another fix could be to skip submitting jobs which already signaled during the resubmit stage, which may look cleaner. Please help give some advice. How about the code below instead? The real problem is that we reuse a dma fence twice, which is not according to the dma fence design, so maybe this can help?
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 8adeb7469f1e..033f0ae16784 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd if (job && job->job_run_counter) { /* reinit seq for resubmitted jobs */ fence->seqno = seq; + + /* For resubmitted jobs clear the signaled bit */ + clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); + Upstream will pretty much kill you for that. Re-setting a fence from a signaled to an unsignaled state is a massive no-go. Christian. Is it worse than doing fence->seqno = seq; ? This is already a huge hack, no? Andrey /* To be in line with external fence creation and other drivers */ dma_fence_get(fence); } else { Andrey Thanks, Victor -Original Message- From: Koenig, Christian Sent: Thursday, September 15, 2022 2:32 PM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey Cc: Deng, Emily Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow Am 15.09.22 um 06:02 schrieb Zhao, Victor: [AMD Official Use Only - General] Ping. Hi @Koenig, Christian and @Grodzovsky, Andrey, We found some reset related issues during a stress test on the sequence. Please help give some comments. Thanks, Victor -Original Message- From: Victor Zhao Sent: Wednesday, September 14, 2022 6:10 PM To: amd-gfx@lists.freedesktop.org Cc: Deng, Emily ; Grodzovsky, Andrey ; Zhao, Victor Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow [background] For a gpu recovery caused by a hang on one ring (e.g. compute), jobs from another ring (e.g. gfx) may continue signaling during the drm_sched_stop stage. The signal bit will not be cleared. At the resubmit stage after recovery, the job with the hw fence signaled bit set will call job done directly instead of going through fence process.
This makes the hw_rq_count decrease but the rcu fence pointer is not cleared yet. Then overflow happens in the fence driver slots and some jobs may be skipped, leaving the rcu pointer not cleared, which makes an infinite wait for the slot on the next fence emitted. This infinite wait causes a job timeout on the emitting job. And the driver will be stuck at its sched stop step because kthread_park cannot be done. [how] 1. move amdgpu_fence_driver_isr_toggle earlier to close interrupts before drm sched stop 2. handle all fences in fence process to avoid skipping when overflow happens Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 6 +- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 943c9e750575..c0cfae52f12b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, amdgpu_virt_fini_data_exchange(adev); } - amdgpu_fen
Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow
Had a typo - see below On 2022-09-15 14:29, Andrey Grodzovsky wrote: On 2022-09-15 06:09, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Christian, The test sequence is executing a compute engine hang while running a lot of containers submitting gfx jobs. We have advanced tdr mode and mode2 reset enabled on the driver. When a compute hang job timeout happens, the 2 jobs on the gfx pending list may be signaled after drm_sched_stop. So they will not be removed from the pending list but have the DMA_FENCE_FLAG_SIGNALED_BIT set. At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and removed from the pending list. At the resubmit step, the second job (with the signaled bit) will be resubmitted. Since it still has the signaled bit, drm_sched_job_done will be called directly. This decreases the hw_rq_count, which allows more jobs to be emitted, but does not clean the fence_drv rcu ptr. This results in an overflow in the fence_drv. Since we will use num_fences_mask in amdgpu_fence_process, when overflow happens, the signal of some jobs will be skipped, which results in an infinite wait for the fence_drv rcu ptr. So closing the irq before sched_stop could avoid signaling jobs after drm_sched_stop. And signaling jobs one by one in fence_process instead of using a mask will handle the overflow situation. Another fix could be to skip submitting jobs which already signaled during the resubmit stage, which may look cleaner. Please help give some advice. How about the code below instead? The real problem is that we reuse a dma fence twice, which is not according to the dma fence design, so maybe this can help?
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 8adeb7469f1e..033f0ae16784 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd if (job && job->job_run_counter) { /* reinit seq for resubmitted jobs */ fence->seqno = seq; + + /* For resubmitted jobs clear the signaled bit */ + clear_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags); + /* To be in line with external fence creation and other drivers */ dma_fence_get(fence); } else { Andrey Thanks, Victor -Original Message- From: Koenig, Christian Sent: Thursday, September 15, 2022 2:32 PM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey Cc: Deng, Emily Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow Am 15.09.22 um 06:02 schrieb Zhao, Victor: [AMD Official Use Only - General] Ping. Hi @Koenig, Christian and @Grodzovsky, Andrey, We found some reset related issues during a stress test on the sequence. Please help give some comments. Thanks, Victor -Original Message- From: Victor Zhao Sent: Wednesday, September 14, 2022 6:10 PM To: amd-gfx@lists.freedesktop.org Cc: Deng, Emily ; Grodzovsky, Andrey ; Zhao, Victor Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow [background] For a gpu recovery caused by a hang on one ring (e.g. compute), jobs from another ring (e.g. gfx) may continue signaling during the drm_sched_stop stage. The signal bit will not be cleared. At the resubmit stage after recovery, the job with the hw fence signaled bit set will call job done directly instead of going through fence process. This makes the hw_rq_count decrease but the rcu fence pointer is not cleared yet. Then overflow happens in the fence driver slots and some jobs may be skipped, leaving the rcu pointer not cleared, which makes an infinite wait for the slot on the next fence emitted.
This infinite wait causes a job timeout on the emitting job. And the driver will be stuck at its sched stop step because kthread_park cannot be done. [how] 1. move amdgpu_fence_driver_isr_toggle earlier to close interrupts before drm sched stop 2. handle all fences in fence process to avoid skipping when overflow happens Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 6 +- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 943c9e750575..c0cfae52f12b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, amdgpu_virt_fini_data_exchange(adev); } - amdgpu_fence_driver_isr_toggle(adev, true); - /* block all schedulers and reset given job's ring */ for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { struct amdgpu_ring *ring = adev->rings[i]; @@ -5214,6 +5212,8 @@ int amdgpu_device_gpu_recover(str
Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow
On 2022-09-15 06:09, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Christian, The test sequence is executing a compute engine hang while running a lot of containers submitting gfx jobs. We have advanced tdr mode and mode2 reset enabled on the driver. When a compute hang job timeout happens, the 2 jobs on the gfx pending list may be signaled after drm_sched_stop. So they will not be removed from the pending list but have the DMA_FENCE_FLAG_SIGNALED_BIT set. At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and removed from the pending list. At the resubmit step, the second job (with the signaled bit) will be resubmitted. Since it still has the signaled bit, drm_sched_job_done will be called directly. This decreases the hw_rq_count, which allows more jobs to be emitted, but does not clean the fence_drv rcu ptr. This results in an overflow in the fence_drv. Since we will use num_fences_mask in amdgpu_fence_process, when overflow happens, the signal of some jobs will be skipped, which results in an infinite wait for the fence_drv rcu ptr. So closing the irq before sched_stop could avoid signaling jobs after drm_sched_stop. And signaling jobs one by one in fence_process instead of using a mask will handle the overflow situation. Another fix could be to skip submitting jobs which already signaled during the resubmit stage, which may look cleaner. Please help give some advice. How about the code below instead? The real problem is that we reuse a dma fence twice, which is not according to the dma fence design, so maybe this can help?
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 8adeb7469f1e..033f0ae16784 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd if (job && job->job_run_counter) { /* reinit seq for resubmitted jobs */ fence->seqno = seq; + + /* For resubmitted jobs clear the signaled bit */ + clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); + /* To be in line with external fence creation and other drivers */ dma_fence_get(fence); } else { Andrey Thanks, Victor -Original Message- From: Koenig, Christian Sent: Thursday, September 15, 2022 2:32 PM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey Cc: Deng, Emily Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow Am 15.09.22 um 06:02 schrieb Zhao, Victor: [AMD Official Use Only - General] Ping. Hi @Koenig, Christian and @Grodzovsky, Andrey, We found some reset related issues during a stress test on the sequence. Please help give some comments. Thanks, Victor -Original Message- From: Victor Zhao Sent: Wednesday, September 14, 2022 6:10 PM To: amd-gfx@lists.freedesktop.org Cc: Deng, Emily ; Grodzovsky, Andrey ; Zhao, Victor Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow [background] For a gpu recovery caused by a hang on one ring (e.g. compute), jobs from another ring (e.g. gfx) may continue signaling during the drm_sched_stop stage. The signal bit will not be cleared. At the resubmit stage after recovery, the job with the hw fence signaled bit set will call job done directly instead of going through fence process. This makes the hw_rq_count decrease but the rcu fence pointer is not cleared yet. Then overflow happens in the fence driver slots and some jobs may be skipped, leaving the rcu pointer not cleared, which makes an infinite wait for the slot on the next fence emitted.
This infinite wait causes a job timeout on the emitting job. And the driver will be stuck at its sched stop step because kthread_park cannot be done. [how] 1. move amdgpu_fence_driver_isr_toggle earlier to close interrupts before drm sched stop 2. handle all fences in fence process to avoid skipping when overflow happens Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 6 +- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 943c9e750575..c0cfae52f12b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, amdgpu_virt_fini_data_exchange(adev); } - amdgpu_fence_driver_isr_toggle(adev, true); - /* block all schedulers and reset given job's ring */ for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { struct amdgpu_ring *ring = adev->rings[i]; @@ -5214,6 +5212,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, amdgpu_device_ip_need_full_reset(tmp_adev))
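To make the overflow mechanism discussed in this thread concrete, here is a toy userspace model - illustrative names only, not the amdgpu code - of fence slots indexed by `seqno & num_fences_mask`: if more fences are emitted than there are slots, without the older ones being processed and their slots cleared, a new fence lands on a still-occupied slot, which is the overflow that leaves an rcu pointer uncleared and a waiter stuck:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the fence_drv slot array: num_fences power-of-two slots
 * indexed by (seqno & mask). 0 means free; a non-zero value stands in
 * for the pending rcu fence pointer. */
#define TOY_NUM_FENCES 4u
#define TOY_MASK (TOY_NUM_FENCES - 1u)

struct toy_fence_drv {
	uint32_t slots[TOY_NUM_FENCES];	/* 0 == free, else pending seqno */
	uint32_t sync_seq;		/* last emitted seqno */
};

/* Returns 0 on success, -1 when the target slot is still occupied,
 * i.e. the overflow condition described in the thread. */
static int toy_fence_emit(struct toy_fence_drv *drv)
{
	uint32_t seq = ++drv->sync_seq;

	if (drv->slots[seq & TOY_MASK] != 0)
		return -1;	/* would overwrite an unprocessed fence */
	drv->slots[seq & TOY_MASK] = seq;
	return 0;
}

/* Signal every fence up to 'upto', clearing its slot (the step that
 * drm_sched_stop-signaled-but-unprocessed jobs skip in the bug). */
static void toy_fence_process(struct toy_fence_drv *drv, uint32_t upto)
{
	for (uint32_t s = 1; s <= upto; s++)
		drv->slots[s & TOY_MASK] = 0;
}
```

With 4 slots, a 5th emission without any processing collides with the slot of seqno 1; processing the backlog frees the slots and emission succeeds again.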
[PATCH v3] drm/sched: Add FIFO sched policy to run queue v3
When many entities compete for the same run queue on the same scheduler, unacceptably long wait times are observed for some jobs stuck in the run queue before being picked up (seen using GPUVis). The issue is due to the Round Robin policy used by schedulers to pick the next entity's job queue for execution. Under stress of many entities and long job queues within an entity, some jobs could be stuck for a very long time in their entity's queue before being popped from the queue and executed, while for other entities with smaller job queues a job might execute earlier even though that job arrived later than the job in the long queue. Fix: Add a FIFO selection policy for entities in the run queue: choose the next entity on the run queue in such an order that if a job on one entity arrived earlier than a job on another entity, the first job will start executing earlier, regardless of the length of the entity's job queue. v2: Switch to an rb tree structure for entities, based on the TS of the oldest job waiting in the job queue of an entity. Improves next-entity extraction to O(1). Entity TS update is O(log N), where N is the number of entities in the run-queue. Drop the default option in the module control parameter. v3: Various cosmetic fixes and minor refactoring of the fifo update function. Signed-off-by: Andrey Grodzovsky Tested-by: Li Yunxiang (Teddy) --- drivers/gpu/drm/scheduler/sched_entity.c | 26 - drivers/gpu/drm/scheduler/sched_main.c | 132 ++- include/drm/gpu_scheduler.h | 35 ++ 3 files changed, 187 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 6b25b2f4f5a3..f3ffce3c9304 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, entity->priority = priority; entity->sched_list = num_sched_list > 1 ?
sched_list : NULL; entity->last_scheduled = NULL; + RB_CLEAR_NODE(&entity->rb_tree_node); if(num_sched_list) entity->rq = &sched_list[0]->sched_rq[entity->priority]; @@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (!sched_job) - return NULL; + goto skip; while ((entity->dependency = drm_sched_job_dependency(sched_job, entity))) { trace_drm_sched_job_wait_dep(sched_job, entity->dependency); - if (drm_sched_entity_add_dependency_cb(entity)) - return NULL; + if (drm_sched_entity_add_dependency_cb(entity)) { + sched_job = NULL; + goto skip; + } } /* skip jobs from entity that marked guilty */ @@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) smp_wmb(); spsc_queue_pop(&entity->job_queue); + + /* +* It's when the head job is extracted that we can access the next job (or empty) +* queue and update the entity location in the min heap accordingly. +*/ +skip: + if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) + drm_sched_rq_update_fifo(entity, +(sched_job ?
sched_job->submit_ts : ktime_get())); + return sched_job; } @@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) { struct drm_sched_entity *entity = sched_job->entity; bool first; + ktime_t ts = ktime_get(); trace_drm_sched_job(sched_job, entity); atomic_inc(entity->rq->sched->score); WRITE_ONCE(entity->last_user, current->group_leader); first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); + sched_job->submit_ts = ts; /* first job wakes up scheduler */ if (first) { @@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) DRM_ERROR("Trying to push to a killed entity\n"); return; } + drm_sched_rq_add_entity(entity->rq, entity); spin_unlock(&entity->rq_lock); + + if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) + drm_sched_rq_update_fifo(entity, ts); + drm_sched_wakeup(entity->rq->sched); } } diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index e5a4ecde0063..72f7105e0b16 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -62,6 +62,65 @@ #define to_drm_sched_job(sched_job)\ container_of((sched_job), struct drm_sched_job,
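To illustrate the selection rule this patch introduces: the real code keeps entities in an rbtree ordered by the submit timestamp of each entity's oldest pending job (O(1) extraction, O(log N) update). This hedged sketch replaces the rbtree with a plain array scan to show only the rule itself - the entity whose head job arrived earliest wins, regardless of queue length (names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Toy entity: oldest_ts mirrors the submit_ts of the head job in the
 * entity's job queue; INT64_MAX marks an entity with an empty queue
 * (the real patch instead removes drained entities from the rbtree). */
struct toy_entity {
	int64_t oldest_ts;
};

/* FIFO pick: return the index of the entity whose oldest pending job
 * was submitted first, or -1 when every entity's queue is empty.
 * Round Robin would instead rotate through entities regardless of how
 * long each head job has been waiting. */
static int toy_rq_select_fifo(const struct toy_entity *e, size_t n)
{
	int best = -1;
	int64_t best_ts = INT64_MAX;

	for (size_t i = 0; i < n; i++) {
		if (e[i].oldest_ts < best_ts) {
			best_ts = e[i].oldest_ts;
			best = (int)i;
		}
	}
	return best;
}
```

The array scan is O(N) per pick; the rbtree keyed by `oldest_ts` is what turns extraction into O(1) (leftmost cached node) at the cost of an O(log N) reinsert whenever an entity's head-job timestamp changes.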
Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)
I guess, but this is a kind of implicit assumption which is not really documented and is easily overlooked. Anyway - for this code it's not directly relevant. Andrey On 2022-09-13 03:25, Christian König wrote: Am 13.09.22 um 04:00 schrieb Andrey Grodzovsky: [SNIP] You are right for scheduler-mediated submissions (executing through the drm_sched_backend_ops.run_job hook); I am talking about direct submissions without the gpu scheduler (using amdgpu_job_submit_direct). Andrey Direct submission is only used while initially testing the hardware, during a GPU reset/recovery or for handling page faults with the SDMA. In other words when we know that we have exclusive access to the hardware. Christian.
Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)
On 2022-09-12 21:44, Zhu, Jiadong wrote: [AMD Official Use Only - General] -Original Message- From: Grodzovsky, Andrey Sent: Tuesday, September 13, 2022 12:45 AM To: Christian König ; Zhu, Jiadong ; amd-gfx@lists.freedesktop.org Cc: Huang, Ray Subject: Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3) On 2022-09-12 12:22, Christian König wrote: Am 12.09.22 um 17:34 schrieb Andrey Grodzovsky: On 2022-09-12 09:27, Christian König wrote: Am 12.09.22 um 15:22 schrieb Andrey Grodzovsky: On 2022-09-12 06:20, Christian König wrote: Am 09.09.22 um 18:45 schrieb Andrey Grodzovsky: On 2022-09-08 21:50, jiadong@amd.com wrote: From: "Jiadong.Zhu" The software ring is created to support a priority context while there is only one hardware queue for gfx. Every software ring has its own fence driver and could be used as an ordinary ring for the gpu_scheduler. Multiple software rings are bound to a real ring with the ring muxer. The packets committed on the software ring are copied to the real ring. v2: use array to store software ring entry. v3: remove unnecessary prints.
Signed-off-by: Jiadong.Zhu --- drivers/gpu/drm/amd/amdgpu/Makefile | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 182 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h | 67 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c | 204 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h | 48 + 7 files changed, 509 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index 3e0e2eb7e235..85224bc81ce5 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \ amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o amdgpu_nbio.o \ amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \ amdgpu_fw_attestation.o amdgpu_securedisplay.o \ -amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o +amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o +\ +amdgpu_sw_ring.o amdgpu_ring_mux.o amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h index 53526ffb2ce1..0de8e3cd0f1c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h @@ -33,6 +33,7 @@ #include "amdgpu_imu.h" #include "soc15.h" #include "amdgpu_ras.h" +#include "amdgpu_ring_mux.h" /* GFX current status */ #define AMDGPU_GFX_NORMAL_MODE 0xL @@ -346,6 +347,8 @@ struct amdgpu_gfx { struct amdgpu_gfx_ras*ras; boolis_poweron; + +struct amdgpu_ring_muxmuxer; }; #define amdgpu_gfx_get_gpu_clock_counter(adev) (adev)->gfx.funcs->get_gpu_clock_counter((adev)) diff --git 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 7d89a52091c0..fe33a683bfba 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -278,6 +278,9 @@ struct amdgpu_ring { boolis_mes_queue; uint32_thw_queue_id; struct amdgpu_mes_ctx_data *mes_ctx; + +boolis_sw_ring; + }; #define amdgpu_ring_parse_cs(r, p, job, ib) ((r)->funcs->parse_cs((p), (job), (ib))) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c new file mode 100644 index ..ea4a3c66119a --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c @@ -0,0 +1,182 @@ +/* + * Copyright 2022 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PA
Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)
On 2022-09-12 12:22, Christian König wrote: Am 12.09.22 um 17:34 schrieb Andrey Grodzovsky: On 2022-09-12 09:27, Christian König wrote: Am 12.09.22 um 15:22 schrieb Andrey Grodzovsky: On 2022-09-12 06:20, Christian König wrote: Am 09.09.22 um 18:45 schrieb Andrey Grodzovsky: On 2022-09-08 21:50, jiadong@amd.com wrote: From: "Jiadong.Zhu" The software ring is created to support a priority context while there is only one hardware queue for gfx. Every software ring has its own fence driver and could be used as an ordinary ring for the gpu_scheduler. Multiple software rings are bound to a real ring with the ring muxer. The packets committed on the software ring are copied to the real ring. v2: use array to store software ring entry. v3: remove unnecessary prints. Signed-off-by: Jiadong.Zhu --- drivers/gpu/drm/amd/amdgpu/Makefile | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 182 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h | 67 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c | 204 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h | 48 + 7 files changed, 509 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index 3e0e2eb7e235..85224bc81ce5 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \ amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o amdgpu_nbio.o \ amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \ amdgpu_fw_attestation.o amdgpu_securedisplay.o \ - amdgpu_eeprom.o amdgpu_mca.o
amdgpu_psp_ta.o amdgpu_lsdma.o \ + amdgpu_sw_ring.o amdgpu_ring_mux.o amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h index 53526ffb2ce1..0de8e3cd0f1c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h @@ -33,6 +33,7 @@ #include "amdgpu_imu.h" #include "soc15.h" #include "amdgpu_ras.h" +#include "amdgpu_ring_mux.h" /* GFX current status */ #define AMDGPU_GFX_NORMAL_MODE 0xL @@ -346,6 +347,8 @@ struct amdgpu_gfx { struct amdgpu_gfx_ras *ras; bool is_poweron; + + struct amdgpu_ring_mux muxer; }; #define amdgpu_gfx_get_gpu_clock_counter(adev) (adev)->gfx.funcs->get_gpu_clock_counter((adev)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 7d89a52091c0..fe33a683bfba 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -278,6 +278,9 @@ struct amdgpu_ring { bool is_mes_queue; uint32_t hw_queue_id; struct amdgpu_mes_ctx_data *mes_ctx; + + bool is_sw_ring; + }; #define amdgpu_ring_parse_cs(r, p, job, ib) ((r)->funcs->parse_cs((p), (job), (ib))) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c new file mode 100644 index ..ea4a3c66119a --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c @@ -0,0 +1,182 @@ +/* + * Copyright 2022 Advanced Micro Devices, Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */
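The central idea of the patch above — several software rings whose committed packets are copied into one real hardware ring — can be sketched in a few lines. This is a standalone illustration, not the amdgpu code: the `toy_ring` struct, `RING_SIZE`, and helper names are invented for the example; real amdgpu rings use the same `wptr & buf_mask` indexing into a power-of-two buffer.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 16 /* power of two, like real amdgpu ring buffers */

struct toy_ring {
	uint32_t buf[RING_SIZE];
	uint64_t wptr;     /* monotonically increasing write pointer */
	uint64_t buf_mask; /* RING_SIZE - 1 */
};

/* Append one dword; the mask maps the growing wptr onto the buffer. */
static void ring_write(struct toy_ring *r, uint32_t v)
{
	r->buf[r->wptr & r->buf_mask] = v;
	r->wptr++;
}

/* Copy the packets a software ring committed between write-pointer
 * positions s_start and s_end into the real ring, handling wraparound.
 * Only valid while the span still fits in the software ring, i.e.
 * s_end - s_start <= RING_SIZE. */
static void copy_pkt_to_real_ring(struct toy_ring *real,
				  const struct toy_ring *sw,
				  uint64_t s_start, uint64_t s_end)
{
	for (uint64_t p = s_start; p < s_end; p++)
		ring_write(real, sw->buf[p & sw->buf_mask]);
}
```

A muxer in this model would remember, per software ring, the span of write-pointer positions not yet copied, and replay each span onto the real ring in submission order.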
Re: [PATCH 4/4] drm/amdgpu: Implement OS triggered MCBP(v2)
On 2022-09-08 21:50, jiadong@amd.com wrote:
From: "Jiadong.Zhu"

Trigger MCBP according to the priority of the software rings and the hw fence signaling condition. The muxer records the latest locations from the software ring, which are used to resubmit packets in preemption scenarios.

v2: update comment style.

Signed-off-by: Jiadong.Zhu
---
 drivers/gpu/drm/amd/amdgpu/Makefile          |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c       |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c     | 101 
 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h     |  29 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c     |  12 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h     |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 163 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  26 +++
 9 files changed, 351 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 85224bc81ce5..24c5aa19bbf2 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -59,7 +59,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
 	amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
 	amdgpu_fw_attestation.o amdgpu_securedisplay.o \
 	amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
-	amdgpu_sw_ring.o amdgpu_ring_mux.o
+	amdgpu_sw_ring.o amdgpu_ring_mux.o amdgpu_mcbp.o

 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 258cffe3c06a..af86d87e2f3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		}
 	}

+	amdgpu_ring_ib_begin(ring);
 	if (job && ring->funcs->init_cond_exec)
 		patch_offset = amdgpu_ring_init_cond_exec(ring);
@@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	    ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
 		ring->funcs->emit_wave_limit(ring, false);

+	amdgpu_ring_ib_end(ring);
 	amdgpu_ring_commit(ring);
 	return 0;
 }

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
new file mode 100644
index ..2a12101a7699
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
@@ -0,0 +1,101 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "amdgpu.h"
+#include "amdgpu_mcbp.h"
+#include "amdgpu_ring.h"
+
+/* trigger mcbp and find if we need resubmit */
+int amdgpu_mcbp_trigger_preempt(struct amdgpu_ring_mux *mux)
+{
+	struct amdgpu_mux_entry *e;
+	struct amdgpu_ring *ring = NULL;
+	int i;
+
+	DRM_INFO("%s in\n", __func__);
+
+	spin_lock(&mux->lock);

[Review comment:] Same comment/question about locking as in patch 1.

+
+	amdgpu_ring_preempt_ib(mux->real_ring);
+
+	ring = NULL;
+	for (i = 0; i < mux->num_ring_entries; i++) {
+		e = &mux->ring_entries[i];
+		if (e->ring->hw_prio <= AMDGPU_RING_PRIO_DEFAULT) {
+			ring = e->ring;
+			break;
+		}
+	}
+
+	if (!ring) {
+		DRM_ERROR("cannot find low priority ring\n");
+		return -ENOENT;
+	}
+
+	amdgpu_fence_process(ring);

[Review comment:] What's the role of fence signaling here (sorry, I am not very knowledgeable about how exactly MCBP works)?

+
+	DRM_INFO("after preempted
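The entry scan in the preemption path above — after preempting the real ring, find the low-priority software ring whose remaining packets must be resubmitted — reduces to a first-match search over the muxer's entry array. A minimal standalone sketch (the `toy_entry` struct, `TOY_RING_PRIO_DEFAULT`, and `TOY_ENOENT` are invented stand-ins, not amdgpu definitions):

```c
#include <assert.h>

#define TOY_RING_PRIO_DEFAULT 1 /* stand-in for AMDGPU_RING_PRIO_DEFAULT */
#define TOY_ENOENT 2            /* stand-in errno value */

struct toy_entry {
	int hw_prio; /* higher value = higher priority in this sketch */
};

/* After the real ring is preempted, find the first registered ring at or
 * below default priority: that is the one whose unfinished packets get
 * resubmitted behind the high-priority work. Returns the entry index,
 * or -TOY_ENOENT when no low-priority ring is registered (mirroring the
 * -ENOENT error path in the patch). */
static int find_low_prio_entry(const struct toy_entry *entries, int n)
{
	for (int i = 0; i < n; i++)
		if (entries[i].hw_prio <= TOY_RING_PRIO_DEFAULT)
			return i;
	return -TOY_ENOENT;
}
```

Note this returns the first match in registration order; if several low-priority rings existed, some policy (oldest pending packet, round robin) would be needed instead.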
Re: [PATCH 3/4] drm/amdgpu: Modify unmap_queue format for gfx9(v2)
Really can't say too much here, as I am not really familiar with queue map/unmap...

Andrey

On 2022-09-08 21:50, jiadong@amd.com wrote:
From: "Jiadong.Zhu"

1. Modify the unmap_queue package on gfx9. Add a trailing fence to track when the preemption is done.
2. Modify the emit_ce_meta and emit_de_meta functions for the resumed ibs.

v2: restyle code not to use the ternary operator.

Signed-off-by: Jiadong.Zhu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 181 +++
 drivers/gpu/drm/amd/amdgpu/soc15d.h      |   2 +
 3 files changed, 155 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index ba6d8c753f7e..d3155dc86c07 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -60,6 +60,7 @@ enum amdgpu_ring_priority_level {
 #define AMDGPU_FENCE_FLAG_64BIT		(1 << 0)
 #define AMDGPU_FENCE_FLAG_INT		(1 << 1)
 #define AMDGPU_FENCE_FLAG_TC_WB_ONLY	(1 << 2)
+#define AMDGPU_FENCE_FLAG_EXEC		(1 << 3)

 #define to_amdgpu_ring(s) container_of((s), struct amdgpu_ring, sched)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 774e44e1074a..89a5c45b1006 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -753,7 +753,7 @@ static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
 static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
 				struct amdgpu_cu_info *cu_info);
 static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev);
-static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring);
+static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume);
 static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);
 static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
 					   void *ras_error_status);
@@ -826,9 +826,10 @@ static void gfx_v9_0_kiq_unmap_queues(struct amdgpu_ring *kiq_ring,
 			PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));

 	if (action == PREEMPT_QUEUES_NO_UNMAP) {
-		amdgpu_ring_write(kiq_ring, lower_32_bits(gpu_addr));
-		amdgpu_ring_write(kiq_ring, upper_32_bits(gpu_addr));
-		amdgpu_ring_write(kiq_ring, seq);
+		amdgpu_ring_write(kiq_ring, lower_32_bits(ring->wptr & ring->buf_mask));
+		amdgpu_ring_write(kiq_ring, 0);
+		amdgpu_ring_write(kiq_ring, 0);
+
 	} else {
 		amdgpu_ring_write(kiq_ring, 0);
 		amdgpu_ring_write(kiq_ring, 0);
@@ -5356,11 +5357,16 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
 	control |= ib->length_dw | (vmid << 24);

-	if (amdgpu_sriov_vf(ring->adev) && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {
+	if ((amdgpu_sriov_vf(ring->adev) || amdgpu_mcbp) && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {
 		control |= INDIRECT_BUFFER_PRE_ENB(1);
+		if (flags & AMDGPU_IB_PREEMPTED)
+			control |= INDIRECT_BUFFER_PRE_RESUME(1);
+
 		if (!(ib->flags & AMDGPU_IB_FLAG_CE) && vmid)
-			gfx_v9_0_ring_emit_de_meta(ring);
+			gfx_v9_0_ring_emit_de_meta(ring,
+						   (!amdgpu_sriov_vf(ring->adev) && flags & AMDGPU_IB_PREEMPTED) ?
+						   true : false);
 	}

 	amdgpu_ring_write(ring, header);
@@ -5415,17 +5421,23 @@ static void gfx_v9_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
 	bool write64bit = flags & AMDGPU_FENCE_FLAG_64BIT;
 	bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
 	bool writeback = flags & AMDGPU_FENCE_FLAG_TC_WB_ONLY;
+	bool exec = flags & AMDGPU_FENCE_FLAG_EXEC;
+	uint32_t dw2 = 0;

 	/* RELEASE_MEM - flush caches, send int */
 	amdgpu_ring_write(ring, PACKET3(PACKET3_RELEASE_MEM, 6));
-	amdgpu_ring_write(ring, ((writeback ? (EOP_TC_WB_ACTION_EN |
-					       EOP_TC_NC_ACTION_EN) :
-					      (EOP_TCL1_ACTION_EN |
-					       EOP_TC_ACTION_EN |
-					       EOP_TC_WB_ACTION_EN |
-					       EOP_TC_MD_ACTION_EN)) |
-				 EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) |
-				 EVENT_INDEX(5)));
+
+	if (writeback) {
+		dw2 = EOP_TC_WB_ACTION_EN | EOP_TC_NC_ACTION_EN;
+	} else {
+		dw2 = EOP_TCL1_ACTION_EN | EOP_TC_ACTION_EN |
+		      EOP_TC_WB_ACTION_EN | EOP_TC_MD_ACTION_EN;
+	}
+	dw2 |= EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5);
+
Re: [PATCH 2/4] drm/amdgpu: Add software ring callbacks for gfx9(v3)
Acked-by: Andrey Grodzovsky

Andrey

On 2022-09-08 21:50, jiadong@amd.com wrote:
From: "Jiadong.Zhu"

Set ring functions with software ring callbacks on gfx9. The software ring can be tested by the debugfs_test_ib case.

v2: set sw_ring 2 to enable the software ring by default.
v3: remove the parameter for software ring enablement.

Signed-off-by: Jiadong.Zhu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  16 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 116 +--
 5 files changed, 128 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 96d058c4cd4b..525df0b4d55f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -207,6 +207,7 @@ extern bool amdgpu_ignore_bad_page_threshold;
 extern struct amdgpu_watchdog_timer amdgpu_watchdog_timer;
 extern int amdgpu_async_gfx_ring;
 extern int amdgpu_mcbp;
+extern int amdgpu_sw_ring;
 extern int amdgpu_discovery;
 extern int amdgpu_mes;
 extern int amdgpu_mes_kiq;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 0de8e3cd0f1c..5eec82014f0a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -348,6 +348,8 @@ struct amdgpu_gfx {
 	bool	is_poweron;

+	/* software ring */
+	unsigned	num_sw_gfx_rings;
 	struct amdgpu_ring_mux	muxer;
 };

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 13db99d653bd..5b70a2c36d81 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -33,6 +33,7 @@
 #include

 #include "amdgpu.h"
+#include "amdgpu_sw_ring.h"
 #include "atom.h"

 /*
@@ -121,6 +122,11 @@ void amdgpu_ring_commit(struct amdgpu_ring *ring)
 {
 	uint32_t count;

+	if (ring->is_sw_ring) {
+		amdgpu_sw_ring_commit(ring);
+		return;
+	}
+
 	/* We pad to match fetch size */
 	count = ring->funcs->align_mask + 1 -
 		(ring->wptr & ring->funcs->align_mask);
@@ -183,6 +189,11 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 	u32 *num_sched;
 	u32 hw_ip;

+	if (adev->gfx.num_sw_gfx_rings > 0 && ring->is_sw_ring) {
+		return amdgpu_sw_ring_init(adev, ring, max_dw, irq_src, irq_type,
+					   hw_prio, sched_score);
+	}
+
 	/* Set the hw submission limit higher for KIQ because
 	 * it's used for a number of gfx/compute tasks by both
 	 * KFD and KGD which may have outstanding fences and
@@ -343,7 +354,10 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
  */
 void amdgpu_ring_fini(struct amdgpu_ring *ring)
 {
-
+	if (ring->is_sw_ring) {
+		amdgpu_sw_ring_fini(ring);
+		return;
+	}
 	/* Not to finish a ring which is not initialized */
 	if (!(ring->adev) ||
 	    (!ring->is_mes_queue && !(ring->adev->rings[ring->idx])))

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index fe33a683bfba..ba6d8c753f7e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -38,7 +38,8 @@ struct amdgpu_vm;
 /* max number of rings */
 #define AMDGPU_MAX_RINGS		28
 #define AMDGPU_MAX_HWIP_RINGS		8
-#define AMDGPU_MAX_GFX_RINGS		2
+/* 2 software rings and 1 real ring */
+#define AMDGPU_MAX_GFX_RINGS		3
 #define AMDGPU_MAX_COMPUTE_RINGS	8
 #define AMDGPU_MAX_VCE_RINGS		3
 #define AMDGPU_MAX_UVD_ENC_RINGS	2

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 5349ca4d19e3..774e44e1074a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -47,6 +47,7 @@
 #include "amdgpu_ras.h"

+#include "amdgpu_sw_ring.h"
 #include "gfx_v9_4.h"
 #include "gfx_v9_0.h"
 #include "gfx_v9_4_2.h"
@@ -55,7 +56,8 @@
 #include "asic_reg/pwr/pwr_10_0_sh_mask.h"
 #include "asic_reg/gc/gc_9_0_default.h"

-#define GFX9_NUM_GFX_RINGS	1
+#define GFX9_NUM_GFX_RINGS	3
+#define GFX9_NUM_SW_GFX_RINGS	2
 #define GFX9_MEC_HPD_SIZE	4096
 #define RLCG_UCODE_LOADING_START_ADDRESS	0x2000L
 #define RLC_SAVE_RESTORE_ADDR_STARTING_OFFSET	0xL
@@ -2270,6 +2272,7 @@ static int gfx_v9_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
 static int gfx
Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best
Please send everything together, because otherwise it's not clear why we need this.

Andrey

On 2022-09-08 11:09, James Zhu wrote:
Yes, it is for NPI design. I will send out patches for review soon. Thanks!
James

On 2022-09-08 11:05 a.m., Andrey Grodzovsky wrote:
So this is the real need of this patch set, but the explanation doesn't appear anywhere in the description. It's always good to add a short 0 RFC patch that describes the intention of the patch set if the code is not self-explanatory. And I still don't understand the need: I don't see anything in amdgpu_ctx_fini_entity regarding ring tracking. Is it new code you plan to add that is not included in this patch set? Did I miss an earlier patch, maybe?

Andrey

On 2022-09-08 10:45, James Zhu wrote:
Saving lines is not the purpose. I also want to use entity->sched_list to track which ring is used in this ctx in amdgpu_ctx_fini_entity.
Best Regards!
James

On 2022-09-08 10:38 a.m., Andrey Grodzovsky wrote:
I guess it's an option, but I don't really see what the added value is. You saved a few lines in this patch but added a few lines in another; in total it seems to me there is not much difference.

Andrey

On 2022-09-08 10:17, James Zhu wrote:
Hi Andrey,
Basically this entire patch set is derived from patch [3/4]:

entity->sched_list = num_sched_list > 1 ? sched_list : NULL;

I think there is no special reason to treat single and multiple schedule lists differently here.
Best Regards!
James

On 2022-09-08 10:08 a.m., Andrey Grodzovsky wrote:
What's the reason for this entire patch set?

Andrey

On 2022-09-07 16:57, James Zhu wrote:
drm_sched_pick_best returns struct drm_gpu_scheduler ** instead of struct drm_gpu_scheduler *.

Signed-off-by: James Zhu
---
 include/drm/gpu_scheduler.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 0fca8f38bee4..011f70a43397 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -529,7 +529,7 @@ void drm_sched_fence_finished(struct drm_sched_fence *fence);
 unsigned long drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched);
 void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
 		unsigned long remaining);
-struct drm_gpu_scheduler *
+struct drm_gpu_scheduler **
 drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
 		unsigned int num_sched_list);
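The point of the signature change under discussion — returning a pointer *into* the caller's scheduler array rather than the scheduler itself — is that the caller can recover which slot was picked, not just which scheduler. A generic standalone illustration (the `toy_sched` struct and `pick_best` name are invented stand-ins for `drm_gpu_scheduler` and `drm_sched_pick_best`; the real function compares scheduler load, here reduced to a `score` field):

```c
#include <assert.h>
#include <stddef.h>

struct toy_sched {
	unsigned int score; /* stand-in for a scheduler's current load */
};

/* Pick the least-loaded scheduler. Returning &list[i] (a double pointer)
 * instead of list[i] lets the caller compute the chosen index as
 * (ret - list), e.g. to keep per-slot bookkeeping in the same array. */
static struct toy_sched **pick_best(struct toy_sched **list, unsigned int n)
{
	struct toy_sched **best = NULL;
	unsigned int best_score = ~0u;

	for (unsigned int i = 0; i < n; i++) {
		if (list[i]->score < best_score) {
			best_score = list[i]->score;
			best = &list[i];
		}
	}
	return best; /* NULL when n == 0 */
}
```

With the single-pointer return, the position information is lost unless the caller re-scans the list; the double-pointer return makes it available for free, which is what the follow-up patches in the set appear to rely on for entity->sched_list tracking.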
Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best
So this is the real need of this patch-set, but this explanation doesn't appear anywhere in the description. It's always good to add a short 0 RFC patch which describes the intention of the patchset if the code is not self explanatory. And I still don't understand the need - i don't see anything in amdgpu_ctx_fini_entity regarding rings tracking ? Is it a new code you plan to add and not included in this patcheset ? Did i miss an earlier patch maybe ? Andrey On 2022-09-08 10:45, James Zhu wrote: To save lines is not the purpose. Also I want to use entity->sched_list to track ring which is used in this ctx in amdgpu_ctx_fini_entity Best Regards! James On 2022-09-08 10:38 a.m., Andrey Grodzovsky wrote: I guess it's an option but i don't really see what's the added value ? You saved a few lines in this patch but added a few lines in another. In total seems to me no to much difference ? Andrey On 2022-09-08 10:17, James Zhu wrote: Hi Andrey Basically this entire patch set are derived from patch [3/4]: entity->sched_list = num_sched_list > 1 ? sched_list : NULL; I think no special reason to treat single and multiple schedule list here. Best Regards! James On 2022-09-08 10:08 a.m., Andrey Grodzovsky wrote: What's the reason for this entire patch set ? 
Andrey On 2022-09-07 16:57, James Zhu wrote: drm_sched_pick_best returns struct drm_gpu_scheduler ** instead of struct drm_gpu_scheduler * Signed-off-by: James Zhu --- include/drm/gpu_scheduler.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 0fca8f38bee4..011f70a43397 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -529,7 +529,7 @@ void drm_sched_fence_finished(struct drm_sched_fence *fence); unsigned long drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched); void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched, unsigned long remaining); -struct drm_gpu_scheduler * +struct drm_gpu_scheduler ** drm_sched_pick_best(struct drm_gpu_scheduler **sched_list, unsigned int num_sched_list);
Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best
I guess it's an option, but I don't really see the added value? You saved a few lines in this patch but added a few lines in another; in total it seems to me there's not much difference. Andrey On 2022-09-08 10:17, James Zhu wrote: Hi Andrey Basically this entire patch set is derived from patch [3/4]: entity->sched_list = num_sched_list > 1 ? sched_list : NULL; I think there's no special reason to treat single and multiple schedule lists differently here. Best Regards! James On 2022-09-08 10:08 a.m., Andrey Grodzovsky wrote: What's the reason for this entire patch set? Andrey On 2022-09-07 16:57, James Zhu wrote: drm_sched_pick_best returns struct drm_gpu_scheduler ** instead of struct drm_gpu_scheduler * Signed-off-by: James Zhu --- include/drm/gpu_scheduler.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 0fca8f38bee4..011f70a43397 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -529,7 +529,7 @@ void drm_sched_fence_finished(struct drm_sched_fence *fence); unsigned long drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched); void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched, unsigned long remaining); -struct drm_gpu_scheduler * +struct drm_gpu_scheduler ** drm_sched_pick_best(struct drm_gpu_scheduler **sched_list, unsigned int num_sched_list);
Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best
What's the reason for this entire patch set ? Andrey On 2022-09-07 16:57, James Zhu wrote: drm_sched_pick_best returns struct drm_gpu_scheduler ** instead of struct drm_gpu_scheduler * Signed-off-by: James Zhu --- include/drm/gpu_scheduler.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 0fca8f38bee4..011f70a43397 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -529,7 +529,7 @@ void drm_sched_fence_finished(struct drm_sched_fence *fence); unsigned long drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched); void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched, unsigned long remaining); -struct drm_gpu_scheduler * +struct drm_gpu_scheduler ** drm_sched_pick_best(struct drm_gpu_scheduler **sched_list, unsigned int num_sched_list);
Re: [PATCH v2] drm/sched: Add FIFO sched policy to rq
Luben, just a ping, whenever you have time. Andrey On 2022-09-05 01:57, Christian König wrote: Am 03.09.22 um 04:48 schrieb Andrey Grodzovsky: Problem: Given many entities competing for the same rq on the same scheduler, an unacceptably long wait time for some jobs stuck in the rq before being picked up is observed (seen using GPUVis). The issue is due to the Round Robin policy used by the scheduler to pick the next entity for execution. Under stress of many entities and long job queues within an entity, some jobs could be stuck for a very long time in their entity's queue before being popped from the queue and executed, while for other entities with smaller job queues a job might execute earlier even though that job arrived later than the job in the long queue. Fix: Add a FIFO selection policy for entities in the rq; choose the next entity on the rq in such an order that if a job on one entity arrived earlier than a job on another entity, the first job will start executing earlier regardless of the length of the entity's job queue. v2: Switch to an rb-tree structure for entities, keyed by the TS of the oldest job waiting in the entity's job queue. Improves next-entity extraction to O(1). Entity TS update is O(log(number of entities in rq)). Drop the default option in the module control parameter. Signed-off-by: Andrey Grodzovsky Tested-by: Li Yunxiang (Teddy) [SNIP] /** @@ -313,6 +330,14 @@ struct drm_sched_job { /** @last_dependency: tracks @dependencies as they signal */ unsigned long last_dependency; + + + /** + * @submit_ts: + * + * Marks job submit time Maybe write something like "When the job was pushed into the entity queue." Apart from that I leave it to Luben and you to get this stuff upstream. Thanks, Christian. 
+ */ + ktime_t submit_ts; }; static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, @@ -501,6 +526,10 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq, void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, struct drm_sched_entity *entity); +void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts, + bool remove_only); + + int drm_sched_entity_init(struct drm_sched_entity *entity, enum drm_sched_priority priority, struct drm_gpu_scheduler **sched_list,
[PATCH v2] drm/sched: Add FIFO sched policy to rq
Problem: Given many entities competing for the same rq on the same scheduler, an unacceptably long wait time for some jobs stuck in the rq before being picked up is observed (seen using GPUVis). The issue is due to the Round Robin policy used by the scheduler to pick the next entity for execution. Under stress of many entities and long job queues within an entity, some jobs could be stuck for a very long time in their entity's queue before being popped from the queue and executed, while for other entities with smaller job queues a job might execute earlier even though that job arrived later than the job in the long queue. Fix: Add a FIFO selection policy for entities in the rq; choose the next entity on the rq in such an order that if a job on one entity arrived earlier than a job on another entity, the first job will start executing earlier regardless of the length of the entity's job queue. v2: Switch to an rb-tree structure for entities, keyed by the TS of the oldest job waiting in the entity's job queue. Improves next-entity extraction to O(1). Entity TS update is O(log(number of entities in rq)). Drop the default option in the module control parameter. Signed-off-by: Andrey Grodzovsky Tested-by: Li Yunxiang (Teddy) --- drivers/gpu/drm/scheduler/sched_entity.c | 29 - drivers/gpu/drm/scheduler/sched_main.c | 131 ++- include/drm/gpu_scheduler.h | 29 + 3 files changed, 183 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 191c56064f19..65ae4be2248b 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -33,6 +33,8 @@ #define to_drm_sched_job(sched_job)\ container_of((sched_job), struct drm_sched_job, queue_node) +extern int drm_sched_policy; + /** * drm_sched_entity_init - Init a context entity used by scheduler when * submit to HW ring. @@ -73,6 +75,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, entity->priority = priority; entity->sched_list = num_sched_list > 1 ? 
sched_list : NULL; entity->last_scheduled = NULL; + RB_CLEAR_NODE(&entity->rb_tree_node); if(num_sched_list) entity->rq = &sched_list[0]->sched_rq[entity->priority]; @@ -417,14 +420,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (!sched_job) - return NULL; + goto skip; while ((entity->dependency = drm_sched_job_dependency(sched_job, entity))) { trace_drm_sched_job_wait_dep(sched_job, entity->dependency); - if (drm_sched_entity_add_dependency_cb(entity)) - return NULL; + if (drm_sched_entity_add_dependency_cb(entity)) { + sched_job = NULL; + goto skip; + } } /* skip jobs from entity that marked guilty */ @@ -443,6 +448,17 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) smp_wmb(); spsc_queue_pop(&entity->job_queue); + + /* +* It's when head job is extracted we can access the next job (or empty) +* queue and update the entity location in the min heap accordingly. +*/ +skip: + if (drm_sched_policy == 1) + drm_sched_rq_update_fifo(entity, +(sched_job ? 
sched_job->submit_ts : ktime_get()), +false); + return sched_job; } @@ -502,11 +518,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) { struct drm_sched_entity *entity = sched_job->entity; bool first; + ktime_t ts = ktime_get(); trace_drm_sched_job(sched_job, entity); atomic_inc(entity->rq->sched->score); WRITE_ONCE(entity->last_user, current->group_leader); first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); + sched_job->submit_ts = ts; /* first job wakes up scheduler */ if (first) { @@ -518,8 +536,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) DRM_ERROR("Trying to push to a killed entity\n"); return; } + drm_sched_rq_add_entity(entity->rq, entity); spin_unlock(&entity->rq_lock); + + if (drm_sched_policy == 1) + drm_sched_rq_update_fifo(entity, ts, false); + drm_sched_wakeup(entity->rq->sched); } } diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index c5437ee03e3f..4d2450b3f5bd 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drive
Re: [PATCH] drm/sched: Add FIFO policy for scheduler rq
On 2022-08-24 22:29, Luben Tuikov wrote: Inlined: On 2022-08-24 12:21, Andrey Grodzovsky wrote: On 2022-08-23 17:37, Luben Tuikov wrote: On 2022-08-23 14:57, Andrey Grodzovsky wrote: On 2022-08-23 14:30, Luben Tuikov wrote: On 2022-08-23 14:13, Andrey Grodzovsky wrote: On 2022-08-23 12:58, Luben Tuikov wrote: Inlined: On 2022-08-22 16:09, Andrey Grodzovsky wrote: Poblem: Given many entities competing for same rq on ^Problem same scheduler an uncceptabliy long wait time for some ^unacceptably jobs waiting stuck in rq before being picked up are observed (seen using GPUVis). The issue is due to Round Robin policy used by scheduler to pick up the next entity for execution. Under stress of many entities and long job queus within entity some ^queues jobs could be stack for very long time in it's entity's queue before being popped from the queue and executed while for other entites with samller job queues a job ^entities; smaller might execute ealier even though that job arrived later ^earlier then the job in the long queue. Fix: Add FIFO selection policy to entites in RQ, chose next enitity on rq in such order that if job on one entity arrived ealrier then job on another entity the first job will start executing ealier regardless of the length of the entity's job queue. 
Signed-off-by: Andrey Grodzovsky Tested-by: Li Yunxiang (Teddy) --- drivers/gpu/drm/scheduler/sched_entity.c | 2 + drivers/gpu/drm/scheduler/sched_main.c | 65 ++-- include/drm/gpu_scheduler.h | 8 +++ 3 files changed, 71 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 6b25b2f4f5a3..3bb7f69306ef 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) atomic_inc(entity->rq->sched->score); WRITE_ONCE(entity->last_user, current->group_leader); first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); + sched_job->submit_ts = ktime_get(); + /* first job wakes up scheduler */ if (first) { diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 68317d3a7a27..c123aa120d06 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -59,6 +59,19 @@ #define CREATE_TRACE_POINTS #include "gpu_scheduler_trace.h" + + +int drm_sched_policy = -1; + +/** + * DOC: sched_policy (int) + * Used to override default entites scheduling policy in a run queue. + */ +MODULE_PARM_DESC(sched_policy, + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); +module_param_named(sched_policy, drm_sched_policy, int, 0444); As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, say the RR. I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. Christian is not against it, just against adding 'auto' here - like the default. Exactly what I said. Also, I still think an O(1) scheduling (picking next to run) should be what we strive for in such a FIFO patch implementation. A FIFO mechanism is by its nature an O(1) mechanism for picking the next element. 
Regards, Luben The only solution i see for this now is keeping a global per rq jobs list parallel to SPCP queue per entity - we use this list when we switch to FIFO scheduling, we can even start building it ONLY when we switch to FIFO building it gradually as more jobs come. Do you have other solution in mind ? The idea is to "sort" on insertion, not on picking the next one to run. cont'd below: Andrey + + #define to_drm_sched_job(sched_job)\ container_of((sched_job), struct drm_sched_job, queue_node) @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, } /** - * drm_sched_rq_select_entity - Select an entity which could provide a job to run + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run * * @rq: scheduler run queue to check. * - * Try to find a ready entity, returns NULL if none found. + * Try to find a ready entity, in round robin manner. + * + * Returns NULL if none found. */ static struct drm_sched_entity * -drm_sched_rq_select_entity(struct drm_sched_rq *rq) +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) { struct drm_sched_entity *entity; @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) return NULL; } +/** + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run + * + * @rq: scheduler run queue t
Re: [PATCH] drm/sched: Add FIFO policy for scheduler rq
On 2022-08-23 17:37, Luben Tuikov wrote: On 2022-08-23 14:57, Andrey Grodzovsky wrote: On 2022-08-23 14:30, Luben Tuikov wrote: On 2022-08-23 14:13, Andrey Grodzovsky wrote: On 2022-08-23 12:58, Luben Tuikov wrote: Inlined: On 2022-08-22 16:09, Andrey Grodzovsky wrote: Poblem: Given many entities competing for same rq on ^Problem same scheduler an uncceptabliy long wait time for some ^unacceptably jobs waiting stuck in rq before being picked up are observed (seen using GPUVis). The issue is due to Round Robin policy used by scheduler to pick up the next entity for execution. Under stress of many entities and long job queus within entity some ^queues jobs could be stack for very long time in it's entity's queue before being popped from the queue and executed while for other entites with samller job queues a job ^entities; smaller might execute ealier even though that job arrived later ^earlier then the job in the long queue. Fix: Add FIFO selection policy to entites in RQ, chose next enitity on rq in such order that if job on one entity arrived ealrier then job on another entity the first job will start executing ealier regardless of the length of the entity's job queue. 
Signed-off-by: Andrey Grodzovsky Tested-by: Li Yunxiang (Teddy) --- drivers/gpu/drm/scheduler/sched_entity.c | 2 + drivers/gpu/drm/scheduler/sched_main.c | 65 ++-- include/drm/gpu_scheduler.h | 8 +++ 3 files changed, 71 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 6b25b2f4f5a3..3bb7f69306ef 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) atomic_inc(entity->rq->sched->score); WRITE_ONCE(entity->last_user, current->group_leader); first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); + sched_job->submit_ts = ktime_get(); + /* first job wakes up scheduler */ if (first) { diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 68317d3a7a27..c123aa120d06 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -59,6 +59,19 @@ #define CREATE_TRACE_POINTS #include "gpu_scheduler_trace.h" + + +int drm_sched_policy = -1; + +/** + * DOC: sched_policy (int) + * Used to override default entites scheduling policy in a run queue. + */ +MODULE_PARM_DESC(sched_policy, + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); +module_param_named(sched_policy, drm_sched_policy, int, 0444); As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, say the RR. I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. Christian is not against it, just against adding 'auto' here - like the default. Exactly what I said. Also, I still think an O(1) scheduling (picking next to run) should be what we strive for in such a FIFO patch implementation. A FIFO mechanism is by its nature an O(1) mechanism for picking the next element. 
Regards, Luben The only solution i see for this now is keeping a global per rq jobs list parallel to SPCP queue per entity - we use this list when we switch to FIFO scheduling, we can even start building it ONLY when we switch to FIFO building it gradually as more jobs come. Do you have other solution in mind ? The idea is to "sort" on insertion, not on picking the next one to run. cont'd below: Andrey + + #define to_drm_sched_job(sched_job) \ container_of((sched_job), struct drm_sched_job, queue_node) @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, } /** - * drm_sched_rq_select_entity - Select an entity which could provide a job to run + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run * * @rq: scheduler run queue to check. * - * Try to find a ready entity, returns NULL if none found. + * Try to find a ready entity, in round robin manner. + * + * Returns NULL if none found. */ static struct drm_sched_entity * -drm_sched_rq_select_entity(struct drm_sched_rq *rq) +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) { struct drm_sched_entity *entity; @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) return NULL; } +/** + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run + * + * @rq: scheduler run queue to check. + * + * Try to find a ready entity, based on FIFO order of jobs arrivals. + * + * Returns NULL if none found. + */ +s
Re: [PATCH] drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled
On 2022-08-17 10:01, Andrey Grodzovsky wrote: On 2022-08-17 09:44, Alex Deucher wrote: On Tue, Aug 16, 2022 at 10:54 PM Chai, Thomas wrote: [AMD Official Use Only - General] Hi Alex: When removing an amdgpu device, it may be difficult to change the order of psp_hw_fini calls. 1. The drm_dev_unplug call is at the beginning of the amdgpu_pci_remove function, which makes the gpu device inaccessible for userspace operations. If the call to psp_hw_fini was moved before drm_dev_unplug, userspace could access the gpu device while the psp might be mid-removal. It has unknown issues. +Andrey Grodzovsky We should fix the ordering in amdgpu_pci_remove() then I guess? There are lots of places where drm_dev_enter() is used to protect access to the hardware which could be similarly affected. Alex We probably can try to move drm_dev_unplug after amdgpu_driver_unload_kms. I don't remember now why drm_dev_unplug must be the first thing we do in amdgpu_pci_remove and what impact it will have, but maybe give it a try. Also see if you can run the libdrm hotplug test suite before and after the change. Andrey Thinking a bit more about this - I guess the main problem with this will be that in case of real hot unplug (which is hard to test unless you have a real GPU cage with an external GPU) this move will cause attempts to access HW registers or MMIO ranges from VRAM BOs when the HW is missing as you shut down the HW in the HW-fini IP-block-specific callbacks; this in turn will return garbage for reads (or all 1s maybe), which is what we were probably trying to avoid by putting drm_dev_unplug first. So it's probably a bit problematic. Andrey 2. psp_hw_fini is called by the .hw_fini iterator in amdgpu_device_ip_fini_early; referring to the code from amdgpu_pci_remove until .hw_fini is called, there are many preparatory operations before calling .hw_fini, which makes it very difficult to change the order of psp_hw_fini or all blocks' .hw_fini. 
So can we do a workaround in psp_cmd_submit_buf when removing amdgpu device? -Original Message- From: Alex Deucher Sent: Monday, August 15, 2022 10:22 PM To: Chai, Thomas Cc: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Chen, Guchun ; Chai, Thomas Subject: Re: [PATCH] drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled On Mon, Aug 15, 2022 at 3:06 AM YiPeng Chai wrote: The psp_cmd_submit_buf function is called by psp_hw_fini to send TA unload messages to psp to terminate ras, asd and tmr. But when amdgpu is uninstalled, drm_dev_unplug is called earlier than psp_hw_fini in amdgpu_pci_remove, the calling order as follows: static void amdgpu_pci_remove(struct pci_dev *pdev) { drm_dev_unplug .. amdgpu_driver_unload_kms->amdgpu_device_fini_hw->... ->.hw_fini->psp_hw_fini->... ->psp_ta_unload->psp_cmd_submit_buf .. } The program will return when calling drm_dev_enter in psp_cmd_submit_buf. So the call to drm_dev_enter in psp_cmd_submit_buf should be removed, so that the TA unload messages can be sent to the psp when amdgpu is uninstalled. Signed-off-by: YiPeng Chai --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index b067ce45d226..0578d8d094a7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -585,9 +585,6 @@ psp_cmd_submit_buf(struct psp_context *psp, if (psp->adev->no_hw_access) return 0; - if (!drm_dev_enter(adev_to_drm(psp->adev), &idx)) - return 0; - This check is to prevent the hardware from being accessed if the card is removed. I think we need to fix the ordering elsewhere. Alex memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE); memcpy(psp->cmd_buf_mem, cmd, sizeof(struct psp_gfx_cmd_resp)); @@ -651,7 +648,6 @@ psp_cmd_submit_buf(struct psp_context *psp, } exit: - drm_dev_exit(idx); return ret; } -- 2.25.1
Re: [PATCH] drm/sched: Add FIFO policy for scheduler rq
On 2022-08-24 04:29, Michel Dänzer wrote: On 2022-08-22 22:09, Andrey Grodzovsky wrote: Problem: Given many entities competing for the same rq on the same scheduler, an unacceptably long wait time for some jobs stuck in the rq before being picked up is observed (seen using GPUVis). The issue is due to the Round Robin policy used by the scheduler to pick the next entity for execution. Under stress of many entities and long job queues within an entity, some jobs could be stuck for a very long time in their entity's queue before being popped from the queue and executed, while for other entities with smaller job queues a job might execute earlier even though that job arrived later than the job in the long queue. Fix: Add a FIFO selection policy for entities in the rq; choose the next entity on the rq in such an order that if a job on one entity arrived earlier than a job on another entity, the first job will start executing earlier regardless of the length of the entity's job queue. Instead of ordering based on when jobs are added, might it be possible to order them based on when they become ready to run? Otherwise it seems possible to e.g. submit a large number of inter-dependent jobs at once, and they would all run before any jobs from another queue get a chance. While a job is not ready (i.e. it still has an unfulfilled dependency) it will not be chosen to run (see drm_sched_entity_is_ready). In this scenario, if an earlier job from entity E1 is not ready to run it will be skipped, and a later job from entity E2 (which is ready) will be chosen to run, so E1's job does not block E2's job. The moment E1's job does become ready, it seems logical to me to let it run ASAP, as by now it has spent the most time of anyone waiting for execution, and I don't think it matters that part of this time was spent waiting for a dependency job to complete its run. Andrey
Re: [PATCH] drm/sched: Add FIFO policy for scheduler rq
On 2022-08-23 14:30, Luben Tuikov wrote: On 2022-08-23 14:13, Andrey Grodzovsky wrote: On 2022-08-23 12:58, Luben Tuikov wrote: Inlined: On 2022-08-22 16:09, Andrey Grodzovsky wrote: Poblem: Given many entities competing for same rq on ^Problem same scheduler an uncceptabliy long wait time for some ^unacceptably jobs waiting stuck in rq before being picked up are observed (seen using GPUVis). The issue is due to Round Robin policy used by scheduler to pick up the next entity for execution. Under stress of many entities and long job queus within entity some ^queues jobs could be stack for very long time in it's entity's queue before being popped from the queue and executed while for other entites with samller job queues a job ^entities; smaller might execute ealier even though that job arrived later ^earlier then the job in the long queue. Fix: Add FIFO selection policy to entites in RQ, chose next enitity on rq in such order that if job on one entity arrived ealrier then job on another entity the first job will start executing ealier regardless of the length of the entity's job queue. 
Signed-off-by: Andrey Grodzovsky Tested-by: Li Yunxiang (Teddy) --- drivers/gpu/drm/scheduler/sched_entity.c | 2 + drivers/gpu/drm/scheduler/sched_main.c | 65 ++-- include/drm/gpu_scheduler.h | 8 +++ 3 files changed, 71 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 6b25b2f4f5a3..3bb7f69306ef 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) atomic_inc(entity->rq->sched->score); WRITE_ONCE(entity->last_user, current->group_leader); first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); + sched_job->submit_ts = ktime_get(); + /* first job wakes up scheduler */ if (first) { diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 68317d3a7a27..c123aa120d06 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -59,6 +59,19 @@ #define CREATE_TRACE_POINTS #include "gpu_scheduler_trace.h" + + +int drm_sched_policy = -1; + +/** + * DOC: sched_policy (int) + * Used to override default entites scheduling policy in a run queue. + */ +MODULE_PARM_DESC(sched_policy, + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); +module_param_named(sched_policy, drm_sched_policy, int, 0444); As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, say the RR. I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. Christian is not against it, just against adding 'auto' here - like the default. Exactly what I said. Also, I still think an O(1) scheduling (picking next to run) should be what we strive for in such a FIFO patch implementation. A FIFO mechanism is by its nature an O(1) mechanism for picking the next element. 
Regards,
Luben

The only solution I see for this now is keeping a global per-rq jobs
list parallel to the SPSC queue per entity - we use this list when we
switch to FIFO scheduling; we can even start building it ONLY when we
switch to FIFO, building it gradually as more jobs come. Do you have
another solution in mind?

Andrey

+
+
 #define to_drm_sched_job(sched_job)	\
 	container_of((sched_job), struct drm_sched_job, queue_node)

@@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 }

 /**
- * drm_sched_rq_select_entity - Select an entity which could provide a job to run
+ * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
  *
  * @rq: scheduler run queue to check.
  *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find a ready entity, in round robin manner.
+ *
+ * Returns NULL if none found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 {
 	struct drm_sched_entity *entity;

@@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
 	return NULL;
 }

+/**
+ * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run
+ *
+ * @rq: scheduler run queue to check.
+ *
+ * Try to find a ready entity, based on FIFO order of jobs arrivals.
+ *
+ * Returns NULL if none found.
+ */
+static struct drm_sched_entity *
+drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+{
+	struct drm_sched_entity *tmp, *entity = NULL;
+	ktime_t oldest_ts = KTIME_MAX;
+	stru
Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq
On 2022-08-23 12:58, Luben Tuikov wrote:
Inlined:

On 2022-08-22 16:09, Andrey Grodzovsky wrote:

Poblem: Given many entities competing for same rq on
^Problem
same scheduler an uncceptabliy long wait time for some
^unacceptably
jobs waiting stuck in rq before being picked up are
observed (seen using GPUVis).

The issue is due to Round Robin policy used by scheduler
to pick up the next entity for execution. Under stress
of many entities and long job queus within entity some
^queues
jobs could be stack for very long time in it's entity's
queue before being popped from the queue and executed
while for other entites with samller job queues a job
^entities; smaller
might execute ealier even though that job arrived later
^earlier
then the job in the long queue.

Fix: Add FIFO selection policy to entites in RQ, chose
next enitity on rq in such order that if job on one entity
arrived ealrier then job on another entity the first job
will start executing ealier regardless of the length of
the entity's job queue.
Signed-off-by: Andrey Grodzovsky
Tested-by: Li Yunxiang (Teddy)
---
 drivers/gpu/drm/scheduler/sched_entity.c |  2 +
 drivers/gpu/drm/scheduler/sched_main.c   | 65 ++--
 include/drm/gpu_scheduler.h              |  8 +++
 3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..3bb7f69306ef 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 	atomic_inc(entity->rq->sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+	sched_job->submit_ts = ktime_get();
+
 	/* first job wakes up scheduler */
 	if (first) {

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 68317d3a7a27..c123aa120d06 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -59,6 +59,19 @@
 #define CREATE_TRACE_POINTS
 #include "gpu_scheduler_trace.h"

+int drm_sched_policy = -1;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entites scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy,
+	"specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO");
+module_param_named(sched_policy, drm_sched_policy, int, 0444);

As per Christian's comments, you can drop the "auto" and perhaps leave
one as the default, say the RR.

I do think it is beneficial to have a module parameter control the
scheduling policy, as shown above.

Christian is not against it, just against adding 'auto' here - like the
default.
+
+
 #define to_drm_sched_job(sched_job)	\
 	container_of((sched_job), struct drm_sched_job, queue_node)

@@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 }

 /**
- * drm_sched_rq_select_entity - Select an entity which could provide a job to run
+ * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
  *
  * @rq: scheduler run queue to check.
  *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find a ready entity, in round robin manner.
+ *
+ * Returns NULL if none found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 {
 	struct drm_sched_entity *entity;

@@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
 	return NULL;
 }

+/**
+ * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run
+ *
+ * @rq: scheduler run queue to check.
+ *
+ * Try to find a ready entity, based on FIFO order of jobs arrivals.
+ *
+ * Returns NULL if none found.
+ */
+static struct drm_sched_entity *
+drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+{
+	struct drm_sched_entity *tmp, *entity = NULL;
+	ktime_t oldest_ts = KTIME_MAX;
+	struct drm_sched_job *sched_job;
+
+	spin_lock(&rq->lock);
+
+	list_for_each_entry(tmp, &rq->entities, list) {
+
+		if (drm_sched_entity_is_ready(tmp)) {
+			sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));
+
+			if (ktime_before(sched_job->submit_ts, oldest_ts)) {
+				oldest_ts = sched_job->submit_ts;
+				entity = tmp;
+			}
+		}
+	}

Here I think we need an O(1) lookup of the next job to pick out to run.
I see a number of optimizations, for instance keeping the
current/oldest timestamp in the rq
Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq
On 2022-08-23 08:15, Christian König wrote:
Am 22.08.22 um 22:09 schrieb Andrey Grodzovsky:

Problem: Given many entities competing for the same rq on the same
scheduler, an unacceptably long wait time for some jobs waiting stuck
in the rq before being picked up is observed (seen using GPUVis).

The issue is due to the Round Robin policy used by the scheduler to
pick up the next entity for execution. Under stress of many entities
and long job queues within an entity, some jobs could be stuck for a
very long time in their entity's queue before being popped from the
queue and executed, while for other entities with smaller job queues a
job might execute earlier even though that job arrived later than the
job in the long queue.

Fix: Add a FIFO selection policy for entities in the RQ; choose the
next entity on the rq in such an order that if a job on one entity
arrived earlier than a job on another entity, the first job will start
executing earlier regardless of the length of the entity's job queue.

Signed-off-by: Andrey Grodzovsky
Tested-by: Li Yunxiang (Teddy)
---
 drivers/gpu/drm/scheduler/sched_entity.c |  2 +
 drivers/gpu/drm/scheduler/sched_main.c   | 65 ++--
 include/drm/gpu_scheduler.h              |  8 +++
 3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..3bb7f69306ef 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 	atomic_inc(entity->rq->sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+	sched_job->submit_ts = ktime_get();
+
 	/* first job wakes up scheduler */
 	if (first) {

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 68317d3a7a27..c123aa120d06 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -59,6 +59,19 @@
 #define CREATE_TRACE_POINTS
 #include "gpu_scheduler_trace.h"

+int drm_sched_policy = -1;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entites scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy,
+	"specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO");

Well we don't really have an autodetect at the moment, so I would drop
that.

+module_param_named(sched_policy, drm_sched_policy, int, 0444);
+
+
 #define to_drm_sched_job(sched_job)	\
 	container_of((sched_job), struct drm_sched_job, queue_node)

@@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 }

 /**
- * drm_sched_rq_select_entity - Select an entity which could provide a job to run
+ * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
  *
  * @rq: scheduler run queue to check.
  *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find a ready entity, in round robin manner.
+ *
+ * Returns NULL if none found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 {
 	struct drm_sched_entity *entity;

@@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
 	return NULL;
 }

+/**
+ * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run
+ *
+ * @rq: scheduler run queue to check.
+ *
+ * Try to find a ready entity, based on FIFO order of jobs arrivals.
+ *
+ * Returns NULL if none found.
+ */
+static struct drm_sched_entity *
+drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+{
+	struct drm_sched_entity *tmp, *entity = NULL;
+	ktime_t oldest_ts = KTIME_MAX;
+	struct drm_sched_job *sched_job;
+
+	spin_lock(&rq->lock);
+
+	list_for_each_entry(tmp, &rq->entities, list) {
+
+		if (drm_sched_entity_is_ready(tmp)) {
+			sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));
+
+			if (ktime_before(sched_job->submit_ts, oldest_ts)) {
+				oldest_ts = sched_job->submit_ts;
+				entity = tmp;
+			}
+		}
+	}
+
+	if (entity) {
+		rq->current_entity = entity;
+		reinit_completion(&entity->entity_idle);
+	}

That should probably be a separate function or at least outside of this
here.

Apart from that totally straight forward implementation. Any idea how
much extra overhead that is?

Regards,
Christian.

Well, memory-wise you have the extra long for each job struct for the
time stamp, and then for each next-job extraction you have to iterate
the entire rq to find the next entity with the oldest job, so always
linear in the number of entities. Today the worst case is al
[PATCH] drm/sced: Add FIFO policy for scheduler rq
Problem: Given many entities competing for the same rq on the same
scheduler, an unacceptably long wait time for some jobs waiting stuck
in the rq before being picked up is observed (seen using GPUVis).

The issue is due to the Round Robin policy used by the scheduler to
pick up the next entity for execution. Under stress of many entities
and long job queues within an entity, some jobs could be stuck for a
very long time in their entity's queue before being popped from the
queue and executed, while for other entities with smaller job queues a
job might execute earlier even though that job arrived later than the
job in the long queue.

Fix: Add a FIFO selection policy for entities in the RQ; choose the
next entity on the rq in such an order that if a job on one entity
arrived earlier than a job on another entity, the first job will start
executing earlier regardless of the length of the entity's job queue.

Signed-off-by: Andrey Grodzovsky
Tested-by: Li Yunxiang (Teddy)
---
 drivers/gpu/drm/scheduler/sched_entity.c |  2 +
 drivers/gpu/drm/scheduler/sched_main.c   | 65 ++--
 include/drm/gpu_scheduler.h              |  8 +++
 3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..3bb7f69306ef 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 	atomic_inc(entity->rq->sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+	sched_job->submit_ts = ktime_get();
+
 	/* first job wakes up scheduler */
 	if (first) {

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 68317d3a7a27..c123aa120d06 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -59,6 +59,19 @@
 #define CREATE_TRACE_POINTS
 #include "gpu_scheduler_trace.h"

+
+
+int drm_sched_policy = -1;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entites scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy,
+	"specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO");
+module_param_named(sched_policy, drm_sched_policy, int, 0444);
+
+
 #define to_drm_sched_job(sched_job)	\
 	container_of((sched_job), struct drm_sched_job, queue_node)

@@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 }

 /**
- * drm_sched_rq_select_entity - Select an entity which could provide a job to run
+ * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
  *
  * @rq: scheduler run queue to check.
  *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find a ready entity, in round robin manner.
+ *
+ * Returns NULL if none found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 {
 	struct drm_sched_entity *entity;

@@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
 	return NULL;
 }

+/**
+ * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run
+ *
+ * @rq: scheduler run queue to check.
+ *
+ * Try to find a ready entity, based on FIFO order of jobs arrivals.
+ *
+ * Returns NULL if none found.
+ */
+static struct drm_sched_entity *
+drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+{
+	struct drm_sched_entity *tmp, *entity = NULL;
+	ktime_t oldest_ts = KTIME_MAX;
+	struct drm_sched_job *sched_job;
+
+	spin_lock(&rq->lock);
+
+	list_for_each_entry(tmp, &rq->entities, list) {
+
+		if (drm_sched_entity_is_ready(tmp)) {
+			sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));
+
+			if (ktime_before(sched_job->submit_ts, oldest_ts)) {
+				oldest_ts = sched_job->submit_ts;
+				entity = tmp;
+			}
+		}
+	}
+
+	if (entity) {
+		rq->current_entity = entity;
+		reinit_completion(&entity->entity_idle);
+	}
+
+	spin_unlock(&rq->lock);
+	return entity;
+}
+
 /**
  * drm_sched_job_done - complete a job
  * @s_job: pointer to the job which is done

@@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
 	/* Kernel run queue has higher priority than normal run queue*/
 	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
-		entity = drm_sched_rq_select_entity(&sched->sched_rq[i]);
+		entity = drm_sched_policy != 1 ?
Re: [PATCH] drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled
On 2022-08-17 09:44, Alex Deucher wrote:
On Tue, Aug 16, 2022 at 10:54 PM Chai, Thomas wrote:
[AMD Official Use Only - General]

Hi Alex:

When removing an amdgpu device, it may be difficult to change the order
of the psp_hw_fini calls.

1. The drm_dev_unplug call is at the beginning of the amdgpu_pci_remove
function, which makes the gpu device inaccessible for userspace
operations. If the call to psp_hw_fini were moved before drm_dev_unplug,
userspace could access the gpu device while the psp is being removed,
which has unknown issues.

+Andrey Grodzovsky

We should fix the ordering in amdgpu_pci_remove() then I guess? There
are lots of places where drm_dev_enter() is used to protect access to
the hardware which could be similarly affected.

Alex

We probably can try to move drm_dev_unplug after
amdgpu_driver_unload_kms. I don't remember now why drm_dev_unplug must
be the first thing we do in amdgpu_pci_remove and what impact it will
have, but maybe give it a try. Also see if you can run the libdrm
hotplug test suite before and after the change.

Andrey

2. psp_hw_fini is called by the .hw_fini iterator in
amdgpu_device_ip_fini_early. Looking at the code from amdgpu_pci_remove
up to where .hw_fini is called, there are many preparatory operations
before calling .hw_fini, which makes it very difficult to change the
order of psp_hw_fini or of all the blocks' .hw_fini. So can we do a
workaround in psp_cmd_submit_buf when removing the amdgpu device?

-----Original Message-----
From: Alex Deucher
Sent: Monday, August 15, 2022 10:22 PM
To: Chai, Thomas
Cc: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Chen, Guchun ; Chai, Thomas
Subject: Re: [PATCH] drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled

On Mon, Aug 15, 2022 at 3:06 AM YiPeng Chai wrote:

The psp_cmd_submit_buf function is called by psp_hw_fini to send TA
unload messages to psp to terminate ras, asd and tmr.
But when amdgpu is uninstalled, drm_dev_unplug is called earlier than
psp_hw_fini in amdgpu_pci_remove; the calling order is as follows:

static void amdgpu_pci_remove(struct pci_dev *pdev)
{
	drm_dev_unplug
	..
	amdgpu_driver_unload_kms->amdgpu_device_fini_hw->...
		->.hw_fini->psp_hw_fini->...
		->psp_ta_unload->psp_cmd_submit_buf
	..
}

The program will return when calling drm_dev_enter in
psp_cmd_submit_buf. So the call to drm_dev_enter in psp_cmd_submit_buf
should be removed, so that the TA unload messages can be sent to the
psp when amdgpu is uninstalled.

Signed-off-by: YiPeng Chai
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index b067ce45d226..0578d8d094a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -585,9 +585,6 @@ psp_cmd_submit_buf(struct psp_context *psp,
 	if (psp->adev->no_hw_access)
 		return 0;

-	if (!drm_dev_enter(adev_to_drm(psp->adev), &idx))
-		return 0;
-

This check is to prevent the hardware from being accessed if the card
is removed. I think we need to fix the ordering elsewhere.

Alex

 	memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE);
 	memcpy(psp->cmd_buf_mem, cmd, sizeof(struct psp_gfx_cmd_resp));

@@ -651,7 +648,6 @@ psp_cmd_submit_buf(struct psp_context *psp,
 	}

 exit:
-	drm_dev_exit(idx);
 	return ret;
 }
--
2.25.1
Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak
On 2022-08-12 14:38, Kim, Jonathan wrote:
[Public]

Hi Andrey,

Here's the load/unload stack trace. This is a 2 GPU xGMI system. I put
a dbg_xgmi_hive_get/put refcount print after the kobj get/put. It's
stuck at 2 on unload. If it's an 8 GPU system, it's stuck at 8.

e.g. of sysfs leak after driver unload:
atitest@atitest:/sys/devices/pci:80/:80:02.0/:81:00.0/:82:00.0/:83:00.0$ ls xgmi_hive_info/
xgmi_hive_id

Thanks,
Jon

I see the leak, but how is it related to amdgpu_reset_domain? How do
you think it is causing this?

Andrey

Driver load (get ref happens on both device add to hive and init per device):
[ 61.975900] amdkcl: loading out-of-tree module taints kernel.
[ 61.975973] amdkcl: module verification failed: signature and/or required key missing - tainting kernel
[ 62.065546] amdkcl: Warning: fail to get symbol cancel_work, replace it with kcl stub
[ 62.081920] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
[ 62.491119] [drm] amdgpu kernel modesetting enabled.
[ 62.491122] [drm] amdgpu version: 5.18.2
[ 62.491124] [drm] OS DRM version: 5.15.0
[ 62.491337] amdgpu: CRAT table not found
[ 62.491341] amdgpu: Virtual CRAT table created for CPU
[ 62.491360] amdgpu: Topology: Add CPU node
[ 62.603556] amdgpu: PeerDirect support was initialized successfully
[ 62.603847] amdgpu :83:00.0: enabling device (0100 -> 0102)
[ 62.603987] [drm] initializing kernel modesetting (VEGA20 0x1002:0x66A1 0x1002:0x0834 0x00).
[ 62.604023] [drm] register mmio base: 0xFBD0 [ 62.604026] [drm] register mmio size: 524288 [ 62.604171] [drm] add ip block number 0 [ 62.604175] [drm] add ip block number 1 [ 62.604177] [drm] add ip block number 2 [ 62.604180] [drm] add ip block number 3 [ 62.604182] [drm] add ip block number 4 [ 62.604185] [drm] add ip block number 5 [ 62.604187] [drm] add ip block number 6 [ 62.604190] [drm] add ip block number 7 [ 62.604192] [drm] add ip block number 8 [ 62.604194] [drm] add ip block number 9 [ 62.641771] amdgpu :83:00.0: amdgpu: Fetched VBIOS from ROM BAR [ 62.641777] amdgpu: ATOM BIOS: 113-D1630200-112 [ 62.713418] [drm] UVD(0) is enabled in VM mode [ 62.713423] [drm] UVD(1) is enabled in VM mode [ 62.713426] [drm] UVD(0) ENC is enabled in VM mode [ 62.713428] [drm] UVD(1) ENC is enabled in VM mode [ 62.713430] [drm] VCE enabled in VM mode [ 62.713433] amdgpu :83:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported [ 62.713472] [drm] GPU posting now... [ 62.713993] amdgpu :83:00.0: amdgpu: MEM ECC is active. [ 62.713995] amdgpu :83:00.0: amdgpu: SRAM ECC is active. [ 62.714006] amdgpu :83:00.0: amdgpu: RAS INFO: ras initialized successfully, hardware ability[7fff] ras_mask[7fff] [ 62.714018] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit [ 62.714026] amdgpu :83:00.0: amdgpu: VRAM: 32752M 0x0080 - 0x0087FEFF (32752M used) [ 62.714029] amdgpu :83:00.0: amdgpu: GART: 512M 0x - 0x1FFF [ 62.714032] amdgpu :83:00.0: amdgpu: AGP: 267845632M 0x0090 - 0x [ 62.714043] [drm] Detected VRAM RAM=32752M, BAR=32768M [ 62.714044] [drm] RAM width 4096bits HBM [ 62.714050] debugfs: Directory 'ttm' with parent '/' already present! [ 62.714146] [drm] amdgpu: 32752M of VRAM memory ready [ 62.714149] [drm] amdgpu: 40203M of GTT memory ready. [ 62.714170] [drm] GART: num cpu pages 131072, num gpu pages 131072 [ 62.714266] [drm] PCIE GART of 512M enabled. 
[ 62.714267] [drm] PTB located at 0x0080 [ 62.731067] amdgpu :83:00.0: amdgpu: PSP runtime database doesn't exist [ 62.731075] amdgpu :83:00.0: amdgpu: PSP runtime database doesn't exist [ 62.731449] amdgpu: [powerplay] hwmgr_sw_init smu backed is vega20_smu [ 62.743177] [drm] Found UVD firmware ENC: 1.2 DEC: .43 Family ID: 19 [ 62.743244] [drm] PSP loading UVD firmware [ 62.744525] [drm] Found VCE firmware Version: 57.6 Binary ID: 4 [ 62.744689] [drm] PSP loading VCE firmware [ 62.896804] [drm] reserve 0x40 from 0x87fec0 for PSP TMR [ 62.979421] amdgpu :83:00.0: amdgpu: HDCP: optional hdcp ta ucode is not available [ 62.979427] amdgpu :83:00.0: amdgpu: DTM: optional dtm ta ucode is not available [ 62.979430] amdgpu :83:00.0: amdgpu: RAP: optional rap ta ucode is not available [ 62.979432] amdgpu :83:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available [ 62.982386] [drm] Display Core initialized with v3.2.196! [ 62.984514] [drm] kiq ring mec 2 pipe 1 q 0 [ 63.026846] [drm] UVD and UVD ENC initialized successfully. [ 63.225760] [drm] VCE initialized successfully. [ 63.22] amdgpu: [dbg_xgmi_hive_get] ref_count 2 [ 63.28] CPU: 10 PID: 397 Comm: kworker/10:2 Tainted: G OE 5.15.0-46-generic #49~20.04.1-Ubuntu [
Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak
On 2022-08-11 11:34, Kim, Jonathan wrote:
[Public]

-----Original Message-----
From: Kuehling, Felix
Sent: August 11, 2022 11:19 AM
To: amd-gfx@lists.freedesktop.org; Kim, Jonathan
Subject: Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak

Am 2022-08-11 um 09:42 schrieb Jonathan Kim:

When an xgmi node is added to the hive, it takes another hive reference
for its reset domain. This extra reference was not dropped on device
removal from the hive, so drop it.

Signed-off-by: Jonathan Kim
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 1b108d03e785..560bf1c98f08 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -731,6 +731,9 @@ int amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
 	mutex_unlock(&adev->hive_lock);
 	amdgpu_put_xgmi_hive(hive);
+	/* device is removed from the hive so remove its reset domain reference */
+	if (adev->reset_domain && adev->reset_domain == hive->reset_domain)
+		amdgpu_put_xgmi_hive(hive);

This is some messed up reference counting. If you need an extra
reference from the reset_domain to the hive, that should be owned by
the reset_domain and dropped when the reset_domain is destroyed. And
it's only one reference for the reset_domain, not one reference per
adev in the reset_domain. Cc'ing Andrey.

What you're saying seems to make more sense to me, but what I got from
an offline conversation with Andrey was that the reset domain reference
per device was intentional. Maybe Andrey can comment here.

What you're doing here looks like every adev that's in a reset_domain
of its hive has two references to the hive. And if you're dropping the
extra reference here, it still leaves the reset_domain with a dangling
pointer to a hive that may no longer exist. So this extra reference is
kind of pointless.
reset_domain doesn't have any references to the hive; the hive has a
reference to the reset_domain.

Yes. Currently one reference is fetched from the device's lifetime on
the hive and the other is from the per-device reset domain. Snippet
from amdgpu_device_ip_init:

	/**
	 * In case of XGMI grab extra reference for reset domain for this device
	 */
	if (adev->gmc.xgmi.num_physical_nodes > 1) {
		if (amdgpu_xgmi_add_device(adev) == 0) {   <- [JK] reference is fetched here

amdgpu_xgmi_add_device calls amdgpu_get_xgmi_hive, and only the first
time amdgpu_get_xgmi_hive is called and the hive is actually allocated
and initialized will we proceed to creating the reset domain, either
from scratch (first creation of the hive) or by taking the reference
from adev (see [1]).

[1] - https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c#L394

			struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev);   <- [JK] then here again

So here I don't see how an extra reference to the reset_domain is taken
if amdgpu_get_xgmi_hive returns early, since the hive is already
created and exists in the global hive container. Jonathan - can you
please show the exact flow of how the refcount leak on reset_domain is
happening?

Andrey

			if (!hive->reset_domain ||
			    !amdgpu_reset_get_reset_domain(hive->reset_domain)) {
				r = -ENOENT;
				goto init_failed;
			}

			/* Drop the early temporary reset domain we created for device */
			amdgpu_reset_put_reset_domain(adev->reset_domain);
			adev->reset_domain = hive->reset_domain;
		}
	}

One of these never gets dropped so a leak happens. So either the extra
reference has to be dropped on device removal from the hive or, from
what you've mentioned, the reset_domain reference fetch should be fixed
to grab at the hive/reset_domain level.

Thanks,
Jon

Regards,
Felix

 	adev->hive = NULL;
 	if (atomic_dec_return(&hive->number_devices) == 0) {
Re: [PATCH v3 1/6] drm/amdgpu: add mode2 reset for sienna_cichlid
Series is Acked-by: Andrey Grodzovsky Andrey On 2022-08-01 00:07, Victor Zhao wrote: To meet the requirement for multi container usecase which needs a quicker reset and not causing VRAM lost, adding the Mode2 reset handler for sienna_cichlid. v2: move skip mode2 flag part separately v3: remove the use of asic_reset_res Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 7 + drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 296 ++ drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h | 32 ++ .../pm/swsmu/inc/pmfw_if/smu_v11_0_7_ppsmc.h | 4 +- drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h | 3 +- .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 54 7 files changed, 394 insertions(+), 4 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index c7d0cd15b5ef..7030ac2d7d2c 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -75,7 +75,7 @@ amdgpu-y += \ vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o nbio_v7_0.o vega10_reg_init.o \ vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o arct_reg_init.o mxgpu_nv.o \ nbio_v7_2.o hdp_v4_0.o hdp_v5_0.o aldebaran_reg_init.o aldebaran.o soc21.o \ - nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o lsdma_v6_0.o + sienna_cichlid.o nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o lsdma_v6_0.o # add DF block amdgpu-y += \ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c index 32c86a0b145c..f778466bb9db 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c @@ -23,6 +23,7 @@ #include "amdgpu_reset.h" #include "aldebaran.h" +#include "sienna_cichlid.h" int amdgpu_reset_add_handler(struct amdgpu_reset_control *reset_ctl, struct amdgpu_reset_handler *handler) @@ -40,6 +41,9 @@ int 
amdgpu_reset_init(struct amdgpu_device *adev) case IP_VERSION(13, 0, 2): ret = aldebaran_reset_init(adev); break; + case IP_VERSION(11, 0, 7): + ret = sienna_cichlid_reset_init(adev); + break; default: break; } @@ -55,6 +59,9 @@ int amdgpu_reset_fini(struct amdgpu_device *adev) case IP_VERSION(13, 0, 2): ret = aldebaran_reset_fini(adev); break; + case IP_VERSION(11, 0, 7): + ret = sienna_cichlid_reset_fini(adev); + break; default: break; } diff --git a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c new file mode 100644 index ..b61a8ddec7ef --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c @@ -0,0 +1,296 @@ +/* + * Copyright 2021 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ * + */ + +#include "sienna_cichlid.h" +#include "amdgpu_reset.h" +#include "amdgpu_amdkfd.h" +#include "amdgpu_dpm.h" +#include "amdgpu_job.h" +#include "amdgpu_ring.h" +#include "amdgpu_ras.h" +#include "amdgpu_psp.h" +#include "amdgpu_xgmi.h" + +static struct amdgpu_reset_handler * +sienna_cichlid_get_reset_handler(struct amdgpu_reset_control *reset_ctl, + struct amdgpu_reset_context *reset_context) +{ + struct amdgpu_reset_handler *handler; + struct amdgpu_device *adev = (struct amdgpu_device *)reset_ctl->handle; + + if (reset_context->method != AMD_RESET_METHOD_NONE) { +
Re: [PATCH v2 1/6] drm/amdgpu: add mode2 reset for sienna_cichlid
On 2022-07-28 06:30, Victor Zhao wrote: To meet the requirement for multi container usecase which needs a quicker reset and not causing VRAM lost, adding the Mode2 reset handler for sienna_cichlid. v2: move skip mode2 flag part separately Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 7 + drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 297 ++ drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h | 32 ++ .../pm/swsmu/inc/pmfw_if/smu_v11_0_7_ppsmc.h | 4 +- drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h | 3 +- .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 54 7 files changed, 395 insertions(+), 4 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index c7d0cd15b5ef..7030ac2d7d2c 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -75,7 +75,7 @@ amdgpu-y += \ vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o nbio_v7_0.o vega10_reg_init.o \ vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o arct_reg_init.o mxgpu_nv.o \ nbio_v7_2.o hdp_v4_0.o hdp_v5_0.o aldebaran_reg_init.o aldebaran.o soc21.o \ - nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o lsdma_v6_0.o + sienna_cichlid.o nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o lsdma_v6_0.o # add DF block amdgpu-y += \ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c index 32c86a0b145c..f778466bb9db 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c @@ -23,6 +23,7 @@ #include "amdgpu_reset.h" #include "aldebaran.h" +#include "sienna_cichlid.h" int amdgpu_reset_add_handler(struct amdgpu_reset_control *reset_ctl, struct amdgpu_reset_handler *handler) @@ -40,6 +41,9 @@ int amdgpu_reset_init(struct amdgpu_device *adev) case IP_VERSION(13, 0, 2): ret = 
aldebaran_reset_init(adev); break; + case IP_VERSION(11, 0, 7): + ret = sienna_cichlid_reset_init(adev); + break; default: break; } @@ -55,6 +59,9 @@ int amdgpu_reset_fini(struct amdgpu_device *adev) case IP_VERSION(13, 0, 2): ret = aldebaran_reset_fini(adev); break; + case IP_VERSION(11, 0, 7): + ret = sienna_cichlid_reset_fini(adev); + break; default: break; } diff --git a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c new file mode 100644 index ..0512960bed23 --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c @@ -0,0 +1,297 @@ +/* + * Copyright 2021 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ * + */ + +#include "sienna_cichlid.h" +#include "amdgpu_reset.h" +#include "amdgpu_amdkfd.h" +#include "amdgpu_dpm.h" +#include "amdgpu_job.h" +#include "amdgpu_ring.h" +#include "amdgpu_ras.h" +#include "amdgpu_psp.h" +#include "amdgpu_xgmi.h" + +static struct amdgpu_reset_handler * +sienna_cichlid_get_reset_handler(struct amdgpu_reset_control *reset_ctl, + struct amdgpu_reset_context *reset_context) +{ + struct amdgpu_reset_handler *handler; + struct amdgpu_device *adev = (struct amdgpu_device *)reset_ctl->handle; + + if (reset_context->method != AMD_RESET_METHOD_NONE) { + list_for_each_entry(handler, &reset_ctl->reset_handlers, +handler_list) { + if (handler->reset_method == reset_context->method) + return handler;
Re: [PATCH v2 6/6] drm/amdgpu: reduce reset time
On 2022-07-28 06:30, Victor Zhao wrote: In the multi-container use case, reset time is important, so skip the ring tests and the CP halt wait during IP suspend for reset, as they are going to fail and cost more time during reset. v2: add a hang flag to indicate the reset comes from a job timeout, and skip the ring test and CP halt wait in this case Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 11 +-- 5 files changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index 222d3d7ea076..c735a17c6afb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -27,6 +27,7 @@ #include "amdgpu_gfx.h" #include "amdgpu_rlc.h" #include "amdgpu_ras.h" +#include "amdgpu_reset.h" /* delay 0.1 second to enable gfx off feature */ #define GFX_OFF_DELAY_ENABLE msecs_to_jiffies(100) @@ -477,7 +478,7 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev) kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.compute_ring[i], RESET_QUEUES, 0, 0); - if (adev->gfx.kiq.ring.sched.ready) + if (adev->gfx.kiq.ring.sched.ready && !(amdgpu_in_reset(adev) && adev->reset_domain->hang)) I think it's enough to look at adev->reset_domain->hang, and you can drop the amdgpu_in_reset check.
r = amdgpu_ring_test_helper(kiq_ring); spin_unlock(&adev->gfx.kiq.ring_lock); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index 6c3e7290153f..bb40880a557f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -49,6 +49,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) } memset(&ti, 0, sizeof(struct amdgpu_task_info)); + adev->reset_domain->hang = true; if (amdgpu_gpu_recovery && amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) { @@ -83,6 +84,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) } exit: + adev->reset_domain->hang = false; drm_dev_exit(idx); return DRM_GPU_SCHED_STAT_NOMINAL; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c index 9da5ead50c90..b828fe773f50 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c @@ -155,6 +155,7 @@ struct amdgpu_reset_domain *amdgpu_reset_create_reset_domain(enum amdgpu_reset_d atomic_set(&reset_domain->in_gpu_reset, 0); atomic_set(&reset_domain->reset_res, 0); init_rwsem(&reset_domain->sem); + reset_domain->hang = false; return reset_domain; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h index cc4b2eeb24cf..29e324add552 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h @@ -84,6 +84,7 @@ struct amdgpu_reset_domain { struct rw_semaphore sem; atomic_t in_gpu_reset; atomic_t reset_res; + bool hang; }; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index fafbad3cf08d..a384e04d916c 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -29,6 +29,7 @@ #include "amdgpu.h" #include "amdgpu_gfx.h" #include "amdgpu_psp.h" +#include "amdgpu_reset.h" #include "nv.h" #include "nvd.h"
@@ -5971,6 +5972,9 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable) WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp); } + if ((amdgpu_in_reset(adev) && adev->reset_domain->hang) && !enable) + return 0; + Same as above for (i = 0; i < adev->usec_timeout; i++) { if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0) break; @@ -7569,8 +7573,10 @@ static int gfx_v10_0_kiq_disable_kgq(struct amdgpu_device *adev) for (i = 0; i < adev->gfx.num_gfx_rings; i++) kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.gfx_ring[i], PREEMPT_QUEUES, 0, 0); - - return amdgpu_ring_test_helper(kiq_ring); + if (!(amdgpu_in_reset(adev) && adev->reset_domain->hang)) Same as above Andrey + return amdgpu_ring_test_helper(kiq_ring); + else + return 0; } #endif @@ -7610,6 +7616,7 @@ static int gfx_v10_0_hw_fini(void *handle) return 0; } + gfx_v10_0_cp_enable(adev, false); gfx_v10_0_enable_gui_idle_interrupt(adev, false);
Re: [PATCH 5/5] drm/amdgpu: reduce reset time
On 2022-07-27 06:35, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Andrey, The problem with status.hang is that it is set in amdgpu_device_ip_check_soft_reset, which is not implemented for nv or gfx10. Those would have to be properly implemented first. Another option I considered is to mark status.hang, or to add a flag to amdgpu_gfx, when a job timeout is reported on a gfx/compute ring. That would require some logic to map the relationship between rings and IP blocks, so this way does not look good either. I don't think we need this at the ring level; to apply this skip logic it's enough to know that the reset you are going through happened because one of the rings is hung. It's pretty easy if we add a 'bool hang' flag to adev->reset_domain, which you can set at the beginning of amdgpu_job_timedout and clear at the end. No protection is required, as all resets from all origins are serialized with the timeout handler in a single-threaded queue. Andrey Thanks, Victor -Original Message- From: Grodzovsky, Andrey Sent: Wednesday, July 27, 2022 12:57 AM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Deng, Emily ; Koenig, Christian Subject: Re: [PATCH 5/5] drm/amdgpu: reduce reset time On 2022-07-26 05:40, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Andrey, Reply inline. Thanks, Victor -Original Message- From: Grodzovsky, Andrey Sent: Tuesday, July 26, 2022 5:18 AM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Deng, Emily ; Koenig, Christian Subject: Re: [PATCH 5/5] drm/amdgpu: reduce reset time On 2022-07-22 03:34, Victor Zhao wrote: In multi container use case, reset time is important, so skip ring tests and cp halt wait during ip suspending for reset as they are going to fail and cost more time on reset Why are they failing in this case ? Skipping ring tests is not the best idea, as you lose an important indicator of the system's sanity. Is there any way to make them work ?
[Victor]: I've seen the gfx ring test fail every time after a gfx engine hang. I thought that should be expected, as gfx is in a bad state. Do you know the reason we run ring tests before reset? We are going to reset the ASIC anyway. Another approach could be to make the skip mode2-only, or to reduce the wait time here. I dug down in history, and according to commit 'drm/amdgpu:unmap KCQ in gfx hw_fini(v2)' you need to write to a scratch register for completion of the queue unmap operation, so you definitely don't want to just skip it. I agree that in case the ring is hung this has no point, but remember that a GPU reset can happen not only for a hung ring but for other reasons (RAS, manual reset, etc.), in which case you probably want to shut down gracefully here ? I see we have the adev->ip_blocks[i].status.hang flag, which you can maybe use here instead ? Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 26 +++-- 2 files changed, 17 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index 222d3d7ea076..f872495ccc3a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -477,7 +477,7 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev) kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.compute_ring[i], RESET_QUEUES, 0, 0); - if (adev->gfx.kiq.ring.sched.ready) + if (adev->gfx.kiq.ring.sched.ready && !amdgpu_in_reset(adev)) r = amdgpu_ring_test_helper(kiq_ring); spin_unlock(&adev->gfx.kiq.ring_lock); diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index fafbad3cf08d..9ae29023e38f 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -5971,16 +5971,19 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable) WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp); } - for (i = 0; i < adev->usec_timeout; i++) { - if
(RREG32_SOC15(GC, 0, mmCP_STAT) == 0) - break; - udelay(1); - } - - if (i >= adev->usec_timeout) - DRM_ERROR("failed to %s cp gfx\n", enable ? "unhalt" : "halt"); + if (!amdgpu_in_reset(adev)) { + for (i = 0; i < adev->usec_timeout; i++) { + if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0) + break; + udelay(1); + } + if (i >= adev->usec_timeout) + DRM_ERROR("failed to %s cp gfx\n", + enable ? "unhalt" : "halt"); + } return 0; + } This change has an impact beyond the container case, no ? We had no issue with this code during regular reset cases, so why would we give up on this code, which confirms the CP is idle ? What is the side effect of skipping this during all GPU resets ? Andrey
Re: Crash on resume from S3
The stack trace is an expected part of the reset procedure, so that is OK. The issue you are having is a hang in one of the GPU jobs during resume, which triggers a GPU reset attempt. You can open a ticket for this issue at https://gitlab.freedesktop.org/drm/amd/-/issues; please attach the full dmesg log. Andrey On 2022-07-26 05:06, Tom Cook wrote: I have a Ryzen 7 3700U in an HP laptop. lspci describes the GPU in this way: 04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile Series] (rev c1) This laptop has never successfully resumed from suspend (I have tried every 5.x kernel). Currently on 5.18.0, the system appears to be okay after resume apart from the GPU, which usually gives a blank screen and occasionally scrambled output. After rebooting, I see this in syslog: Jul 25 11:02:18 frog kernel: [240782.968674] amdgpu :04:00.0: amdgpu: GPU reset begin! Jul 25 11:02:19 frog kernel: [240783.974891] amdgpu :04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110) Jul 25 11:02:19 frog kernel: [240783.988650] [drm] free PSP TMR buffer Jul 25 11:02:19 frog kernel: [240784.019057] CPU: 4 PID: 305612 Comm: kworker/u32:17 Not tainted 5.18.0 #1 Jul 25 11:02:19 frog kernel: [240784.019063] Hardware name: HP HP ENVY x360 Convertible 15-ds0xxx/85DD, BIOS F.20 05/28/2020 Jul 25 11:02:19 frog kernel: [240784.019067] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched] Jul 25 11:02:19 frog kernel: [240784.019079] Call Trace: Jul 25 11:02:19 frog kernel: [240784.019082] Jul 25 11:02:19 frog kernel: [240784.019085] dump_stack_lvl+0x49/0x5f Jul 25 11:02:19 frog kernel: [240784.019095] dump_stack+0x10/0x12 Jul 25 11:02:19 frog kernel: [240784.019099] amdgpu_do_asic_reset+0x2f/0x4e0 [amdgpu] Jul 25 11:02:19 frog kernel: [240784.019278] amdgpu_device_gpu_recover_imp+0x41e/0xb50 [amdgpu] Jul 25 11:02:19 frog kernel: [240784.019452] amdgpu_job_timedout+0x155/0x1b0 [amdgpu] Jul 25
11:02:19 frog kernel: [240784.019674] drm_sched_job_timedout+0x74/0xf0 [gpu_sched] Jul 25 11:02:19 frog kernel: [240784.019681] ? amdgpu_cgs_destroy_device+0x10/0x10 [amdgpu] Jul 25 11:02:19 frog kernel: [240784.019896] ? drm_sched_job_timedout+0x74/0xf0 [gpu_sched] Jul 25 11:02:19 frog kernel: [240784.019903] process_one_work+0x227/0x440 Jul 25 11:02:19 frog kernel: [240784.019908] worker_thread+0x31/0x3d0 Jul 25 11:02:19 frog kernel: [240784.019912] ? process_one_work+0x440/0x440 Jul 25 11:02:19 frog kernel: [240784.019914] kthread+0xfe/0x130 Jul 25 11:02:19 frog kernel: [240784.019918] ? kthread_complete_and_exit+0x20/0x20 Jul 25 11:02:19 frog kernel: [240784.019923] ret_from_fork+0x22/0x30 Jul 25 11:02:19 frog kernel: [240784.019930] Jul 25 11:02:19 frog kernel: [240784.019934] amdgpu :04:00.0: amdgpu: MODE2 reset Jul 25 11:02:19 frog kernel: [240784.020178] amdgpu :04:00.0: amdgpu: GPU reset succeeded, trying to resume Jul 25 11:02:19 frog kernel: [240784.020552] [drm] PCIE GART of 1024M enabled. Jul 25 11:02:19 frog kernel: [240784.020555] [drm] PTB located at 0x00F40090 Jul 25 11:02:19 frog kernel: [240784.020577] [drm] VRAM is lost due to GPU reset! Jul 25 11:02:19 frog kernel: [240784.020579] [drm] PSP is resuming... Jul 25 11:02:19 frog kernel: [240784.040465] [drm] reserve 0x40 from 0xf47fc0 for PSP TMR I'm running the latest BIOS from HP. Is there anything I can do to work around this? Or anything I can do to help debug it? Regards, Tom Cook
Re: [PATCH 4/5] drm/amdgpu: revert context to stop engine before mode2 reset
Got it. Acked-by: Andrey Grodzovsky Andrey On 2022-07-26 06:01, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Andrey, By slow tests I mean the slow hang tests from the quark tool. An example here: hang_vm_gfx_dispatch_slow.lua - this script runs on a graphics engine using the compute engine and has a hacked CS program which is massive and duplicates the standard CS program move code hundreds of thousands of times. The effect is a very slowly executing CS program. It's not a bad job; it just needs a very long time to finish. I suppose we don't have a way to stop the shader here, and the running apps will be affected when the reset is done. Thanks, Victor -Original Message- From: Grodzovsky, Andrey Sent: Tuesday, July 26, 2022 5:20 AM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Deng, Emily ; Koenig, Christian Subject: Re: [PATCH 4/5] drm/amdgpu: revert context to stop engine before mode2 reset On 2022-07-22 03:34, Victor Zhao wrote: For some hangs caused by slow tests, the engine cannot be stopped, which may cause a resume failure after reset. In this case, force-halt the engine by reverting the context addresses Can you maybe explain a bit more what exactly you mean by a slow test, and why the engine cannot be stopped in this case ?
Andrey Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h | 1 + drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c| 36 + drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 2 ++ 4 files changed, 40 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 5498fda8617f..833dc5e224d3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5037,6 +5037,7 @@ static void amdgpu_device_recheck_guilty_jobs( /* set guilty */ drm_sched_increase_karma(s_job); + amdgpu_reset_prepare_hwcontext(adev, reset_context); retry: /* do hw reset */ if (amdgpu_sriov_vf(adev)) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h index f8036f2b100e..c7b44aeb671b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h @@ -37,6 +37,7 @@ struct amdgpu_gfxhub_funcs { void (*utcl2_harvest)(struct amdgpu_device *adev); void (*mode2_save_regs)(struct amdgpu_device *adev); void (*mode2_restore_regs)(struct amdgpu_device *adev); + void (*halt)(struct amdgpu_device *adev); }; struct amdgpu_gfxhub { diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c index 51cf8acd2d79..8cf53e039c11 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c @@ -646,6 +646,41 @@ static void gfxhub_v2_1_restore_regs(struct amdgpu_device *adev) WREG32_SOC15(GC, 0, mmGCMC_VM_MX_L1_TLB_CNTL, adev->gmc.MC_VM_MX_L1_TLB_CNTL); } +static void gfxhub_v2_1_halt(struct amdgpu_device *adev) { + struct amdgpu_vmhub *hub = &adev->vmhub[AMDGPU_GFXHUB_0]; + int i; + uint32_t tmp; + int time = 1000; + + gfxhub_v2_1_set_fault_enable_default(adev, false); + + for (i = 0; i <= 14; i++) { + WREG32_SOC15_OFFSET(GC, 0, mmGCVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, + i * hub->ctx_addr_distance,
~0); + WREG32_SOC15_OFFSET(GC, 0, mmGCVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, + i * hub->ctx_addr_distance, ~0); + WREG32_SOC15_OFFSET(GC, 0, mmGCVM_CONTEXT1_PAGE_TABLE_END_ADDR_LO32, + i * hub->ctx_addr_distance, + 0); + WREG32_SOC15_OFFSET(GC, 0, mmGCVM_CONTEXT1_PAGE_TABLE_END_ADDR_HI32, + i * hub->ctx_addr_distance, + 0); + } + tmp = RREG32_SOC15(GC, 0, mmGRBM_STATUS2); + while ((tmp & (GRBM_STATUS2__EA_BUSY_MASK | + GRBM_STATUS2__EA_LINK_BUSY_MASK)) != 0 && + time) { + udelay(100); + time--; + tmp = RREG32_SOC15(GC, 0, mmGRBM_STATUS2); + } + + if (!time) { + DRM_WARN("failed to wait for GRBM(EA) idle\n"); + } +} + const struct amdgpu_gfxhub_funcs gfxhub_v2_1_funcs = { .get_fb_location = gfxhub_v2_1_get_fb_location, .get_mc_fb_offset = gfxhub_v2_1_get_mc_fb_offset, @@ -658,4 +693,5 @@ const struct amdgpu_gfxhub_funcs gfxhub_v2_1_funcs = { .utcl2_harvest = gfxhub_v2_1_utcl2_harvest, .mode2_save_regs = gfxhub_v2_1_save_regs, .mode2_restore_regs = gfxhub_v2_1_restore_regs, +
Re: [PATCH 5/5] drm/amdgpu: reduce reset time
On 2022-07-26 05:40, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Andrey, Reply inline. Thanks, Victor -Original Message- From: Grodzovsky, Andrey Sent: Tuesday, July 26, 2022 5:18 AM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Deng, Emily ; Koenig, Christian Subject: Re: [PATCH 5/5] drm/amdgpu: reduce reset time On 2022-07-22 03:34, Victor Zhao wrote: In multi container use case, reset time is important, so skip ring tests and cp halt wait during ip suspending for reset as they are going to fail and cost more time on reset Why are they failing in this case ? Skipping ring tests is not the best idea, as you lose an important indicator of the system's sanity. Is there any way to make them work ? [Victor]: I've seen the gfx ring test fail every time after a gfx engine hang. I thought that should be expected, as gfx is in a bad state. Do you know the reason we run ring tests before reset? We are going to reset the ASIC anyway. Another approach could be to make the skip mode2-only, or to reduce the wait time here. I dug down in history, and according to commit 'drm/amdgpu:unmap KCQ in gfx hw_fini(v2)' you need to write to a scratch register for completion of the queue unmap operation, so you definitely don't want to just skip it. I agree that in case the ring is hung this has no point, but remember that a GPU reset can happen not only for a hung ring but for other reasons (RAS, manual reset, etc.), in which case you probably want to shut down gracefully here ? I see we have the adev->ip_blocks[i].status.hang flag, which you can maybe use here instead ?
Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 26 +++-- 2 files changed, 17 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index 222d3d7ea076..f872495ccc3a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -477,7 +477,7 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev) kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.compute_ring[i], RESET_QUEUES, 0, 0); - if (adev->gfx.kiq.ring.sched.ready) + if (adev->gfx.kiq.ring.sched.ready && !amdgpu_in_reset(adev)) r = amdgpu_ring_test_helper(kiq_ring); spin_unlock(&adev->gfx.kiq.ring_lock); diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index fafbad3cf08d..9ae29023e38f 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -5971,16 +5971,19 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable) WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp); } - for (i = 0; i < adev->usec_timeout; i++) { - if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0) - break; - udelay(1); - } - - if (i >= adev->usec_timeout) - DRM_ERROR("failed to %s cp gfx\n", enable ? "unhalt" : "halt"); + if (!amdgpu_in_reset(adev)) { + for (i = 0; i < adev->usec_timeout; i++) { + if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0) + break; + udelay(1); + } + if (i >= adev->usec_timeout) + DRM_ERROR("failed to %s cp gfx\n", + enable ? "unhalt" : "halt"); + } return 0; + } This change has an impact beyond the container case, no ? We had no issue with this code during regular reset cases, so why would we give up on this code, which confirms the CP is idle ? What is the side effect of skipping this during all GPU resets ? Andrey [Victor]: I see "failed to halt cp gfx" with regular reset cases as well when doing a gfx hang test using quark.
I haven't seen a side effect with Mode1 reset yet, but maybe shortening the wait time would be better? Same as above, I guess: it would indeed time out for a hung ring, but GPU resets happen not only because of hung rings but for other reasons as well. Andrey static int gfx_v10_0_cp_gfx_load_pfp_microcode(struct amdgpu_device *adev) @@ -7569,8 +7572,10 @@ static int gfx_v10_0_kiq_disable_kgq(struct amdgpu_device *adev) for (i = 0; i < adev->gfx.num_gfx_rings; i++) kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.gfx_ring[i], PREEMPT_QUEUES, 0, 0); - - return amdgpu_ring_test_helper(kiq_ring); + if (!amdgpu_in_reset(adev)) + return amdgpu_ring_test_helper(kiq_ring); + else + return 0; } #endif @@ -7610,6 +7615,7 @@ static int gfx_v10_0_hw_fini(void *handle) return 0; } + gfx_v10_0_cp_enable(adev, false);
Re: [PATCH 4/5] drm/amdgpu: revert context to stop engine before mode2 reset
On 2022-07-22 03:34, Victor Zhao wrote: For some hang caused by slow tests, engine cannot be stopped which may cause resume failure after reset. In this case, force halt engine by reverting context addresses Can you maybe explain a bit more what exactly you mean by slow test and why engine cannot be stopped in this case ? Andrey Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h | 1 + drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c| 36 + drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 2 ++ 4 files changed, 40 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 5498fda8617f..833dc5e224d3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5037,6 +5037,7 @@ static void amdgpu_device_recheck_guilty_jobs( /* set guilty */ drm_sched_increase_karma(s_job); + amdgpu_reset_prepare_hwcontext(adev, reset_context); retry: /* do hw reset */ if (amdgpu_sriov_vf(adev)) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h index f8036f2b100e..c7b44aeb671b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h @@ -37,6 +37,7 @@ struct amdgpu_gfxhub_funcs { void (*utcl2_harvest)(struct amdgpu_device *adev); void (*mode2_save_regs)(struct amdgpu_device *adev); void (*mode2_restore_regs)(struct amdgpu_device *adev); + void (*halt)(struct amdgpu_device *adev); }; struct amdgpu_gfxhub { diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c index 51cf8acd2d79..8cf53e039c11 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c @@ -646,6 +646,41 @@ static void gfxhub_v2_1_restore_regs(struct amdgpu_device *adev) WREG32_SOC15(GC, 0, mmGCMC_VM_MX_L1_TLB_CNTL, adev->gmc.MC_VM_MX_L1_TLB_CNTL); } +static void 
gfxhub_v2_1_halt(struct amdgpu_device *adev) +{ + struct amdgpu_vmhub *hub = >vmhub[AMDGPU_GFXHUB_0]; + int i; + uint32_t tmp; + int time = 1000; + + gfxhub_v2_1_set_fault_enable_default(adev, false); + + for (i = 0; i <= 14; i++) { + WREG32_SOC15_OFFSET(GC, 0, mmGCVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, + i * hub->ctx_addr_distance, ~0); + WREG32_SOC15_OFFSET(GC, 0, mmGCVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, + i * hub->ctx_addr_distance, ~0); + WREG32_SOC15_OFFSET(GC, 0, mmGCVM_CONTEXT1_PAGE_TABLE_END_ADDR_LO32, + i * hub->ctx_addr_distance, + 0); + WREG32_SOC15_OFFSET(GC, 0, mmGCVM_CONTEXT1_PAGE_TABLE_END_ADDR_HI32, + i * hub->ctx_addr_distance, + 0); + } + tmp = RREG32_SOC15(GC, 0, mmGRBM_STATUS2); + while ((tmp & (GRBM_STATUS2__EA_BUSY_MASK | + GRBM_STATUS2__EA_LINK_BUSY_MASK)) != 0 && + time) { + udelay(100); + time--; + tmp = RREG32_SOC15(GC, 0, mmGRBM_STATUS2); + } + + if (!time) { + DRM_WARN("failed to wait for GRBM(EA) idle\n"); + } +} + const struct amdgpu_gfxhub_funcs gfxhub_v2_1_funcs = { .get_fb_location = gfxhub_v2_1_get_fb_location, .get_mc_fb_offset = gfxhub_v2_1_get_mc_fb_offset, @@ -658,4 +693,5 @@ const struct amdgpu_gfxhub_funcs gfxhub_v2_1_funcs = { .utcl2_harvest = gfxhub_v2_1_utcl2_harvest, .mode2_save_regs = gfxhub_v2_1_save_regs, .mode2_restore_regs = gfxhub_v2_1_restore_regs, + .halt = gfxhub_v2_1_halt, }; diff --git a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c index 51a5b68f77d3..fead7251292f 100644 --- a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c +++ b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c @@ -97,6 +97,8 @@ sienna_cichlid_mode2_prepare_hwcontext(struct amdgpu_reset_control *reset_ctl, if (!amdgpu_sriov_vf(adev)) { if (adev->gfxhub.funcs->mode2_save_regs) adev->gfxhub.funcs->mode2_save_regs(adev); + if (adev->gfxhub.funcs->halt) + adev->gfxhub.funcs->halt(adev); r = sienna_cichlid_mode2_suspend_ip(adev); }
Re: [PATCH 3/5] drm/amdgpu: save and restore gc hub regs
Acked-by: Andrey Grodzovsky Andrey On 2022-07-22 03:34, Victor Zhao wrote: Save and restore gfxhub regs as they will be reset during mode 2 Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 26 +++ drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 72 +++ drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 7 +- .../include/asic_reg/gc/gc_10_3_0_offset.h| 4 ++ 5 files changed, 110 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h index beabab515836..f8036f2b100e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h @@ -35,6 +35,8 @@ struct amdgpu_gfxhub_funcs { void (*init)(struct amdgpu_device *adev); int (*get_xgmi_info)(struct amdgpu_device *adev); void (*utcl2_harvest)(struct amdgpu_device *adev); + void (*mode2_save_regs)(struct amdgpu_device *adev); + void (*mode2_restore_regs)(struct amdgpu_device *adev); }; struct amdgpu_gfxhub { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index 008eaca27151..0305b660cd17 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h @@ -264,6 +264,32 @@ struct amdgpu_gmc { u64 mall_size; /* number of UMC instances */ int num_umc; + /* mode2 save restore */ + u64 VM_L2_CNTL; + u64 VM_L2_CNTL2; + u64 VM_DUMMY_PAGE_FAULT_CNTL; + u64 VM_DUMMY_PAGE_FAULT_ADDR_LO32; + u64 VM_DUMMY_PAGE_FAULT_ADDR_HI32; + u64 VM_L2_PROTECTION_FAULT_CNTL; + u64 VM_L2_PROTECTION_FAULT_CNTL2; + u64 VM_L2_PROTECTION_FAULT_MM_CNTL3; + u64 VM_L2_PROTECTION_FAULT_MM_CNTL4; + u64 VM_L2_PROTECTION_FAULT_ADDR_LO32; + u64 VM_L2_PROTECTION_FAULT_ADDR_HI32; + u64 VM_DEBUG; + u64 VM_L2_MM_GROUP_RT_CLASSES; + u64 VM_L2_BANK_SELECT_RESERVED_CID; + u64 VM_L2_BANK_SELECT_RESERVED_CID2; + u64 VM_L2_CACHE_PARITY_CNTL; + u64 VM_L2_IH_LOG_CNTL; + u64 VM_CONTEXT_CNTL[16]; + u64 
VM_CONTEXT_PAGE_TABLE_BASE_ADDR_LO32[16]; + u64 VM_CONTEXT_PAGE_TABLE_BASE_ADDR_HI32[16]; + u64 VM_CONTEXT_PAGE_TABLE_START_ADDR_LO32[16]; + u64 VM_CONTEXT_PAGE_TABLE_START_ADDR_HI32[16]; + u64 VM_CONTEXT_PAGE_TABLE_END_ADDR_LO32[16]; + u64 VM_CONTEXT_PAGE_TABLE_END_ADDR_HI32[16]; + u64 MC_VM_MX_L1_TLB_CNTL; }; #define amdgpu_gmc_flush_gpu_tlb(adev, vmid, vmhub, type) ((adev)->gmc.gmc_funcs->flush_gpu_tlb((adev), (vmid), (vmhub), (type))) diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c index d8c531581116..51cf8acd2d79 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c @@ -576,6 +576,76 @@ static void gfxhub_v2_1_utcl2_harvest(struct amdgpu_device *adev) } } +static void gfxhub_v2_1_save_regs(struct amdgpu_device *adev) +{ + int i; + adev->gmc.VM_L2_CNTL = RREG32_SOC15(GC, 0, mmGCVM_L2_CNTL); + adev->gmc.VM_L2_CNTL2 = RREG32_SOC15(GC, 0, mmGCVM_L2_CNTL2); + adev->gmc.VM_DUMMY_PAGE_FAULT_CNTL = RREG32_SOC15(GC, 0, mmGCVM_DUMMY_PAGE_FAULT_CNTL); + adev->gmc.VM_DUMMY_PAGE_FAULT_ADDR_LO32 = RREG32_SOC15(GC, 0, mmGCVM_DUMMY_PAGE_FAULT_ADDR_LO32); + adev->gmc.VM_DUMMY_PAGE_FAULT_ADDR_HI32 = RREG32_SOC15(GC, 0, mmGCVM_DUMMY_PAGE_FAULT_ADDR_HI32); + adev->gmc.VM_L2_PROTECTION_FAULT_CNTL = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_CNTL); + adev->gmc.VM_L2_PROTECTION_FAULT_CNTL2 = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_CNTL2); + adev->gmc.VM_L2_PROTECTION_FAULT_MM_CNTL3 = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_MM_CNTL3); + adev->gmc.VM_L2_PROTECTION_FAULT_MM_CNTL4 = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_MM_CNTL4); + adev->gmc.VM_L2_PROTECTION_FAULT_ADDR_LO32 = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_ADDR_LO32); + adev->gmc.VM_L2_PROTECTION_FAULT_ADDR_HI32 = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_ADDR_HI32); + adev->gmc.VM_DEBUG = RREG32_SOC15(GC, 0, mmGCVM_DEBUG); + adev->gmc.VM_L2_MM_GROUP_RT_CLASSES = RREG32_SOC15(GC, 0, 
mmGCVM_L2_MM_GROUP_RT_CLASSES); + adev->gmc.VM_L2_BANK_SELECT_RESERVED_CID = RREG32_SOC15(GC, 0, mmGCVM_L2_BANK_SELECT_RESERVED_CID); + adev->gmc.VM_L2_BANK_SELECT_RESERVED_CID2 = RREG32_SOC15(GC, 0, mmGCVM_L2_BANK_SELECT_RESERVED_CID2); + adev->gmc.VM_L2_CACHE_PARITY_CNTL = RREG32_SOC15(GC, 0, mmGCVM_L2_CACHE_PARITY_CNTL); + adev->gmc.VM_L2_IH_LOG_CNTL = RREG32_SOC15(GC, 0, mmGCVM_L2_IH_LOG_CNTL); + + for (i = 0; i <= 15; i++) { + adev->gmc.VM_CONTEXT_CNTL[i] = RREG32_SOC1
Re: [PATCH 5/5] drm/amdgpu: reduce reset time
On 2022-07-22 03:34, Victor Zhao wrote:

In multi-container use cases, reset time is important, so skip the ring tests and the CP halt wait during IP suspend for reset, as they are going to fail and cost more time on reset.

Why are they failing in this case? Skipping the ring tests is not the best idea, as you lose an important indicator of the system's sanity. Is there any way to make them work?

Signed-off-by: Victor Zhao
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 26 +++--
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 222d3d7ea076..f872495ccc3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -477,7 +477,7 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev)
 		kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.compute_ring[i],
 					   RESET_QUEUES, 0, 0);
 
-	if (adev->gfx.kiq.ring.sched.ready)
+	if (adev->gfx.kiq.ring.sched.ready && !amdgpu_in_reset(adev))
 		r = amdgpu_ring_test_helper(kiq_ring);
 	spin_unlock(&adev->gfx.kiq.ring_lock);

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index fafbad3cf08d..9ae29023e38f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -5971,16 +5971,19 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable)
 		WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp);
 	}
 
-	for (i = 0; i < adev->usec_timeout; i++) {
-		if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0)
-			break;
-		udelay(1);
-	}
-
-	if (i >= adev->usec_timeout)
-		DRM_ERROR("failed to %s cp gfx\n", enable ? "unhalt" : "halt");
+	if (!amdgpu_in_reset(adev)) {
+		for (i = 0; i < adev->usec_timeout; i++) {
+			if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0)
+				break;
+			udelay(1);
+		}
+		if (i >= adev->usec_timeout)
+			DRM_ERROR("failed to %s cp gfx\n",
+				  enable ? "unhalt" : "halt");
+	}
 
 	return 0;
 }

This change has an impact beyond the container case, no? We had no issue with this code during regular reset cases, so why would we give up on this code, which confirms the CP is idle? What is the side effect of skipping this during all GPU resets?

Andrey

 static int gfx_v10_0_cp_gfx_load_pfp_microcode(struct amdgpu_device *adev)
@@ -7569,8 +7572,10 @@ static int gfx_v10_0_kiq_disable_kgq(struct amdgpu_device *adev)
 	for (i = 0; i < adev->gfx.num_gfx_rings; i++)
 		kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.gfx_ring[i],
 					   PREEMPT_QUEUES, 0, 0);
-
-	return amdgpu_ring_test_helper(kiq_ring);
+	if (!amdgpu_in_reset(adev))
+		return amdgpu_ring_test_helper(kiq_ring);
+	else
+		return 0;
 }
 #endif
@@ -7610,6 +7615,7 @@ static int gfx_v10_0_hw_fini(void *handle)
 		return 0;
 	}
 
+
 	gfx_v10_0_cp_enable(adev, false);
 	gfx_v10_0_enable_gui_idle_interrupt(adev, false);
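The loop being gated in the patch above is the usual "poll a status register until idle or timeout" pattern: read CP_STAT once per microsecond, for at most adev->usec_timeout iterations. A minimal standalone sketch of that pattern follows; the register read is simulated, and none of these names are the real amdgpu API:

```c
#include <stdbool.h>

/* Simulated register backend: reports "busy" (0xff) a few times, then idle. */
static int fake_reads_left = 3;

static unsigned int read_cp_stat(void)
{
	return fake_reads_left > 0 ? (fake_reads_left--, 0xffu) : 0u;
}

/*
 * Poll until the status register reads 0 or the attempt budget runs out.
 * Returns true when idle was observed, false on timeout.
 */
static bool wait_for_idle(unsigned int usec_timeout)
{
	for (unsigned int i = 0; i < usec_timeout; i++) {
		if (read_cp_stat() == 0)
			return true;
		/* udelay(1) would go here in kernel code */
	}
	return false;
}
```

With a budget larger than the number of busy reads, the wait succeeds; with a smaller budget it returns false, which is the case where the driver prints "failed to %s cp gfx".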
Re: [PATCH 2/5] drm/amdgpu: add debugfs amdgpu_reset_level
On 2022-07-25 13:37, Christian König wrote:

Hi Victor,

On 25.07.22 at 12:45, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi @Grodzovsky, Andrey,

Please help review the series, thanks a lot.

Hi @Koenig, Christian,

I thought a module parameter would be exposed to a common user. This control was added to help debug and test, so I made it a debugfs entry. I can make it a module parameter if that is more appropriate.

That's a really good argument. I leave the decision on whether we should use a module parameter or a debugfs file to Andrey.

If you go with debugfs, then using the debugfs_create_u32() or debugfs_create_u64() function would be more appropriate, I think. And then don't make it global, but rather a per-device flag.

Regards,
Christian.

Makes sense to me too.

Andrey

Thanks,
Victor

-----Original Message-----
From: Koenig, Christian
Sent: Friday, July 22, 2022 4:20 PM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Deucher, Alexander
Subject: Re: [PATCH 2/5] drm/amdgpu: add debugfs amdgpu_reset_level

Well, NAK to the debugfs approach; stuff like that is usually a module parameter. Apart from that, this series needs to be reviewed by Andrey.

Regards,
Christian.

On 22.07.22 at 09:34, Victor Zhao wrote:

Introduce an amdgpu_reset_level debugfs entry in order to help debug and test a specific type of reset. It also helps block unwanted types of resets.
By default, mode2 reset will not be enabled Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 20 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 6 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 +++ 5 files changed, 34 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 6cd1e0a6dffc..c661231a6a07 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -238,6 +238,7 @@ extern int amdgpu_si_support; extern int amdgpu_cik_support; #endif extern int amdgpu_num_kcq; +extern uint amdgpu_reset_level_mask; #define AMDGPU_VCNFW_LOG_SIZE (32 * 1024) extern int amdgpu_vcnfw_log; @@ -274,6 +275,9 @@ extern int amdgpu_vcnfw_log; #define AMDGPU_RESET_VCE (1 << 13) #define AMDGPU_RESET_VCE1 (1 << 14) +#define AMDGPU_RESET_LEVEL_SOFT_RECOVERY (1 << 0) #define +AMDGPU_RESET_LEVEL_MODE2 (1 << 1) + /* max cursor sizes (in pixels) */ #define CIK_CURSOR_WIDTH 128 #define CIK_CURSOR_HEIGHT 128 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c index e2eec985adb3..235c48e4ba4d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c @@ -1661,12 +1661,29 @@ static int amdgpu_debugfs_sclk_set(void *data, u64 val) return ret; } +static int amdgpu_debugfs_reset_level_get(void *data, u64 *val) { + struct amdgpu_device *adev = (struct amdgpu_device *)data; + *val = amdgpu_reset_level_mask; + return 0; +} + +static int amdgpu_debugfs_reset_level_set(void *data, u64 val) { + struct amdgpu_device *adev = (struct amdgpu_device *)data; + amdgpu_reset_level_mask = val; + return 0; +} + DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL, amdgpu_debugfs_ib_preempt, "%llu\n"); DEFINE_DEBUGFS_ATTRIBUTE(fops_sclk_set, NULL, amdgpu_debugfs_sclk_set, "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(fops_reset_level, 
amdgpu_debugfs_reset_level_get, + amdgpu_debugfs_reset_level_set, "%llu\n"); + static ssize_t amdgpu_reset_dump_register_list_read(struct file *f, char __user *buf, size_t size, loff_t *pos) { @@ -1785,6 +1802,9 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev) return PTR_ERR(ent); } + debugfs_create_file("amdgpu_reset_level", 0200, root, adev, + _reset_level); + /* Register debugfs entries for amdgpu_ttm */ amdgpu_ttm_debugfs_init(adev); amdgpu_debugfs_pm_init(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index e8c6c3fe9374..fb8f3cb853a7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -198,6 +198,7 @@ struct amdgpu_watchdog_timer amdgpu_watchdog_timer = { .timeout_fatal_disable = false, .period = 0x0, /* default to 0x0 (timeout disable) */ }; +uint amdgpu_reset_level_mask = 0x1; /** * DOC: vramlimit (int) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c index 831fb222139c..f16ab1a54b70 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c @@ -74,6 +74,9 @@ int amdgpu_reset_prepare_hwcontext(struct amdgpu_device *adev, {
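The mask semantics introduced by this patch (bit 0 = soft recovery, bit 1 = mode2, default 0x1) come down to a simple "is this reset type enabled" test against the value written through debugfs. A hypothetical sketch of that check, with the flag values copied from the patch but everything else illustrative:

```c
#define RESET_LEVEL_SOFT_RECOVERY (1u << 0)
#define RESET_LEVEL_MODE2         (1u << 1)

/* Default from the patch: only soft recovery enabled. */
static unsigned int reset_level_mask = 0x1;

/* Returns non-zero when the given reset type is allowed by the mask. */
static int reset_type_enabled(unsigned int type)
{
	return (reset_level_mask & type) != 0;
}
```

With the default mask, soft recovery is permitted and mode2 is blocked; writing 0x3 to the debugfs file would enable both, which is how the series lets testers force or forbid a specific reset path.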
Re: [PATCH 1/5] drm/amdgpu: add mode2 reset for sienna_cichlid
On 2022-07-22 03:33, Victor Zhao wrote: To meet the requirement for multi container usecase which needs a quicker reset and not causing VRAM lost, adding the Mode2 reset handler for sienna_cichlid. Adding a AMDGPU_SKIP_MODE2_RESET flag so driver can fallback to default reset method when mode2 reset failed and retry. - add mode2 reset handler for sienna_cichlid Seems to me ASIC specific changes should be in a seperate patch - introduce AMDGPU_SKIP_MODE2_RESET flag - let mode2 reset fallback to default reset method if failed Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 7 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 13 + drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 1 + drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 1 + drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 1 + drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 1 + drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 297 ++ drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h | 32 ++ .../pm/swsmu/inc/pmfw_if/smu_v11_0_7_ppsmc.h | 4 +- drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h | 3 +- .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 54 15 files changed, 414 insertions(+), 5 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c create mode 100644 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.h diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index c7d0cd15b5ef..7030ac2d7d2c 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -75,7 +75,7 @@ amdgpu-y += \ vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o nbio_v7_0.o vega10_reg_init.o \ vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o arct_reg_init.o mxgpu_nv.o \ nbio_v7_2.o hdp_v4_0.o hdp_v5_0.o aldebaran_reg_init.o aldebaran.o soc21.o \ - nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o 
lsdma_v6_0.o + sienna_cichlid.o nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o lsdma_v6_0.o # add DF block amdgpu-y += \ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 5e53a5293935..091415a4abf0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -135,6 +135,7 @@ static void amdgpu_amdkfd_reset_work(struct work_struct *work) reset_context.method = AMD_RESET_METHOD_NONE; reset_context.reset_req_dev = adev; clear_bit(AMDGPU_NEED_FULL_RESET, _context.flags); + clear_bit(AMDGPU_SKIP_MODE2_RESET, _context.flags); amdgpu_device_gpu_recover(adev, NULL, _context); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index b79ee4ffb879..5498fda8617f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5146,6 +5146,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, reset_context->job = job; reset_context->hive = hive; + /* * Build list of devices to reset. 
* In case we are in XGMI hive mode, resort the device list @@ -5265,8 +5266,11 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, amdgpu_ras_resume(adev); } else { r = amdgpu_do_asic_reset(device_list_handle, reset_context); - if (r && r == -EAGAIN) + if (r && r == -EAGAIN) { + set_bit(AMDGPU_SKIP_MODE2_RESET, _context->flags); + adev->asic_reset_res = 0; See my question bellow related to this set goto retry; + } } skip_hw_reset: @@ -5694,6 +5698,7 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev *pdev) reset_context.reset_req_dev = adev; set_bit(AMDGPU_NEED_FULL_RESET, _context.flags); set_bit(AMDGPU_SKIP_HW_RESET, _context.flags); + set_bit(AMDGPU_SKIP_MODE2_RESET, _context.flags); adev->no_hw_access = true; r = amdgpu_device_pre_asic_reset(adev, _context); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index 10fdd12cf853..9844d99075e9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -71,6 +71,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) reset_context.method = AMD_RESET_METHOD_NONE; reset_context.reset_req_dev = adev; clear_bit(AMDGPU_NEED_FULL_RESET, _context.flags); + clear_bit(AMDGPU_SKIP_MODE2_RESET, _context.flags); r =
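The fallback logic under discussion in this patch is: if the ASIC reset returns -EAGAIN, set the skip-mode2 bit and retry, so the second pass takes the default full-reset path. A hypothetical model of that control flow, with the reset functions simulated (this is not the amdgpu API, just the retry shape):

```c
#include <errno.h>

#define SKIP_MODE2 (1u << 0)

/* Simulated ASIC reset: mode2 "fails" in this model, the fallback succeeds. */
static int do_asic_reset(unsigned long flags)
{
	if (!(flags & SKIP_MODE2))
		return -EAGAIN;	/* mode2 attempted and failed */
	return 0;		/* fell back to the default method */
}

static int gpu_recover(void)
{
	unsigned long flags = 0;
	int r;

retry:
	r = do_asic_reset(flags);
	if (r == -EAGAIN) {
		flags |= SKIP_MODE2;	/* fall back to default reset and retry */
		goto retry;
	}
	return r;
}
```

This mirrors the `if (r && r == -EAGAIN) { set_bit(...SKIP_MODE2...); ... goto retry; }` hunk: the flag is sticky for the retry, so mode2 is attempted at most once per recovery.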
Re: [PATCH] drm/amdgpu: remove useless condition in amdgpu_job_stop_all_jobs_on_sched()
Reviewed-by: Andrey Grodzovsky

Andrey

On 2022-07-19 06:39, Andrey Strachuk wrote:

The local variable 'rq' is initialized with the address of a field of drm_gpu_scheduler, so it does not make sense to compare 'rq' with NULL.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Andrey Strachuk
Fixes: 7c6e68c777f1 ("drm/amdgpu: Avoid HW GPU reset for RAS.")
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 67f66f2f1809..600401f2a98f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -285,10 +285,6 @@ void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
 	/* Signal all jobs not yet scheduled */
 	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
 		struct drm_sched_rq *rq = &sched->sched_rq[i];
-
-		if (!rq)
-			continue;
-
 		spin_lock(&rq->lock);
 		list_for_each_entry(s_entity, &rq->entities, list) {
 			while ((s_job = to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
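The removed check is dead code because `rq` is assigned the address of an element of an array embedded in the scheduler struct: as long as `sched` itself is a valid pointer, `&sched->sched_rq[i]` can never be NULL. A minimal illustration with hypothetical stand-in types:

```c
#include <stddef.h>

struct rq { int dummy; };
struct scheduler { struct rq sched_rq[4]; };

/* Returns 1 if any embedded-array element address is NULL (it never is). */
static int any_element_null(struct scheduler *sched)
{
	for (int i = 0; i < 4; i++) {
		/* Same shape as the removed check in the patch */
		struct rq *rq = &sched->sched_rq[i];

		if (rq == NULL)
			return 1;	/* unreachable: &array[i] is non-NULL here */
	}
	return 0;
}
```

This is exactly the class of always-false comparison that static analyzers like SVACE flag.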
Re: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails
On 2022-07-15 05:28, Zhu, Jiadong wrote: [AMD Official Use Only - General] Updated some comments -Original Message- From: Zhu, Jiadong Sent: Friday, July 15, 2022 5:13 PM To: Christian König ; amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey Cc: Huang, Ray ; Liu, Aaron Subject: RE: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails Hi Christian, The resubmitted job in function amdgpu_ib_preempt_job_recovery returns the same hw fence because of this commit: static void amdgpu_ib_preempt_job_recovery(struct drm_gpu_scheduler *sched) { struct drm_sched_job *s_job; struct dma_fence *fence; spin_lock(>job_list_lock); list_for_each_entry(s_job, >pending_list, list) { fence = sched->ops->run_job(s_job); //fence returned has the same address with swapped fences dma_fence_put(fence); } spin_unlock(>job_list_lock); } commit c530b02f39850a639b72d01ebbf7e5d745c60831 Author: Jack Zhang Date: Wed May 12 15:06:35 2021 +0800 drm/amd/amdgpu embed hw_fence into amdgpu_job Why: Previously hw fence is alloced separately with job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime, and simplify the design of gpu-scheduler. How: We propose to embed hw_fence into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is embedded in a job. v3: remove redundant variable ring in amdgpu_job v4: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. v5 add missing handling in amdgpu_fence_enable_signaling Signed-off-by: Jingwen Chen Signed-off-by: Jack Zhang Reviewed-by: Andrey Grodzovsky Reviewed by: Monk Liu Signed-off-by: Alex Deucher Thus the fence we swapped out is signaled and put twice in the following 2 functions and we get " refcount_t: underflow; use-after-free. 
" errors latter. /* wait for jobs finished */ amdgpu_fence_wait_empty(ring); //wait on the resubmitted fence which is signaled and put somewhere else. The refcount decreased by 1 after amdgpu_fence_wait_empty. /* signal the old fences */ amdgpu_ib_preempt_signal_fences(fences, length); //signal and put the previous swapped fence, signal would return -22. Thanks, Jiadong Did you have 'drm/amdgpu: Follow up change to previous drm scheduler change.' this commit in your branch while you encountered this problem ? I don't see an underflow issue for the preempted job when inspecting the code with this commit in mind - amdgpu_fence_emit dma_fence_init 1 dma_fence_get(fence) 2 rcu_assign_pointer(*ptr, dma_fence_get(fence) 3 drm_sched_main s_fence->parent = dma_fence_get(fence); 4 dma_fence_put(fence); 3 amdgpu_ib_preempt_job_recovery amdgpu_fence_emit if (job && job->job_run_counter) -> dma_fence_get(fence); 4 rcu_assign_pointer(*ptr, dma_fence_get(fence)); 5 dma_fence_put(fence); 4 amdgpu_fence_wait_empty dma_fence_get_rcu(fence) 5 dma_fence_put(fence) 4 amdgpu_process_fence (EOP interrupt for re-submission of preempted job) dma_fence_put 3 amdgpu_ib_preempt_signal_fences dma_fence_put 2 amdgpu_job_free_cb dma_fence_put(>hw_fence) 1 drm_sched_fence_release_scheduled dma_fence_put(fence->parent); 0 Also take a look here for reference - https://drive.google.com/file/d/1yEoeW6OQC9WnwmzFW6NBLhFP_jD0xcHm/view Andrey Andrey -Original Message- From: Christian König Sent: Friday, July 15, 2022 4:48 PM To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey Cc: Huang, Ray ; Liu, Aaron Subject: Re: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails [CAUTION: External Email] Am 15.07.22 um 10:43 schrieb jiadong@amd.com: From: "Jiadong.Zhu" Dma_fence_signal returning non-zero indicates that the fence is signaled and put somewhere else. Skip dma_fence_put to make the fence refcount correct. Well quite a big NAK on this. 
Reference counting should be completely independent where a fence signals. Andrey can you take a look at this as well? Thanks, Christian. Signed-off-by: Jiadong.Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c index f4ed0785d523..93c1a5e83835 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c +++ b/drivers/gpu/drm/amd
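The bug being debugged in this thread is reference-count bookkeeping on the embedded hw fence: when the same reference is put twice, the counter underflows, producing the "refcount_t: underflow; use-after-free" splat quoted above. A toy get/put model with underflow detection (this stands in for the idea only, not for dma_fence itself):

```c
struct ref { int count; };

static void ref_get(struct ref *r)
{
	r->count++;
}

/* Returns 0 on a balanced put, -1 on underflow (the bug being debugged). */
static int ref_put(struct ref *r)
{
	if (r->count <= 0)
		return -1;	/* "refcount_t: underflow" in kernel terms */
	r->count--;
	return 0;
}
```

The discipline the thread converges on is that every get must be matched by exactly one put, regardless of when or where the fence signals; signaling must never be allowed to change who owns which reference.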
Re: [PATCH 10/10] drm/amdgpu: add gang submit frontend v2
Acked-by: Andrey Grodzovsky Andrey On 2022-07-14 06:39, Christian König wrote: Allows submitting jobs as gang which needs to run on multiple engines at the same time. All members of the gang get the same implicit, explicit and VM dependencies. So no gang member will start running until everything else is ready. The last job is considered the gang leader (usually a submission to the GFX ring) and used for signaling output dependencies. Each job is remembered individually as user of a buffer object, so there is no joining of work at the end. v2: rebase and fix review comments from Andrey and Yogesh Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 256 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h| 10 +- drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 12 +- 3 files changed, 183 insertions(+), 95 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 88f491dc7ca2..e1c41db20efb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -69,6 +69,7 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p, unsigned int *num_ibs) { struct drm_sched_entity *entity; + unsigned int i; int r; r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type, @@ -77,17 +78,28 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p, if (r) return r; - /* Abort if there is no run queue associated with this entity. -* Possibly because of disabled HW IP*/ + /* +* Abort if there is no run queue associated with this entity. +* Possibly because of disabled HW IP. 
+*/ if (entity->rq == NULL) return -EINVAL; - /* Currently we don't support submitting to multiple entities */ - if (p->entity && p->entity != entity) + /* Check if we can add this IB to some existing job */ + for (i = 0; i < p->gang_size; ++i) { + if (p->entities[i] == entity) + goto found; + } + + /* If not increase the gang size if possible */ + if (i == AMDGPU_CS_GANG_SIZE) return -EINVAL; - p->entity = entity; - ++(*num_ibs); + p->entities[i] = entity; + p->gang_size = i + 1; + +found: + ++(num_ibs[i]); return 0; } @@ -161,11 +173,12 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, union drm_amdgpu_cs *cs) { struct amdgpu_fpriv *fpriv = p->filp->driver_priv; + unsigned int num_ibs[AMDGPU_CS_GANG_SIZE] = { }; struct amdgpu_vm *vm = >vm; uint64_t *chunk_array_user; uint64_t *chunk_array; - unsigned size, num_ibs = 0; uint32_t uf_offset = 0; + unsigned int size; int ret; int i; @@ -231,7 +244,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, if (size < sizeof(struct drm_amdgpu_cs_chunk_ib)) goto free_partial_kdata; - ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, _ibs); + ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, num_ibs); if (ret) goto free_partial_kdata; break; @@ -268,21 +281,28 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, } } - ret = amdgpu_job_alloc(p->adev, num_ibs, >job, vm); - if (ret) - goto free_all_kdata; + if (!p->gang_size) + return -EINVAL; - ret = drm_sched_job_init(>job->base, p->entity, >vm); - if (ret) - goto free_all_kdata; + for (i = 0; i < p->gang_size; ++i) { + ret = amdgpu_job_alloc(p->adev, num_ibs[i], >jobs[i], vm); + if (ret) + goto free_all_kdata; + + ret = drm_sched_job_init(>jobs[i]->base, p->entities[i], +>vm); + if (ret) + goto free_all_kdata; + } + p->gang_leader = p->jobs[p->gang_size - 1]; - if (p->ctx->vram_lost_counter != p->job->vram_lost_counter) { + if (p->ctx->vram_lost_counter != p->gang_leader->vram_lost_counter) { ret = -ECANCELED; goto free_all_kdata; } if (p->uf_entry.tv.bo) - 
p->job->uf_addr = uf_offset; + p->gang_leader->uf_addr = uf_offset; kvfree(chunk_array); /* Use this opportunity to fill in task info for the vm */ @@ -304,22 +324,18 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, return ret; } -static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p, -
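The reworked amdgpu_cs_p1_ib() above replaces "only one entity allowed" with a small find-or-append over the gang: if the IB's entity is already a gang member, the IB is counted against that slot; otherwise the gang grows, failing once AMDGPU_CS_GANG_SIZE slots are used. The same logic as a standalone sketch with hypothetical types:

```c
#define GANG_SIZE 4

struct parser {
	const void *entities[GANG_SIZE];
	unsigned int gang_size;
	unsigned int num_ibs[GANG_SIZE];
};

/* Mirror of the find-or-append loop; returns 0 on success, -1 when full. */
static int add_ib(struct parser *p, const void *entity)
{
	unsigned int i;

	/* Check if we can add this IB to some existing slot */
	for (i = 0; i < p->gang_size; i++) {
		if (p->entities[i] == entity)
			goto found;
	}
	/* If not, grow the gang if possible */
	if (i == GANG_SIZE)
		return -1;	/* -EINVAL in the real code */
	p->entities[i] = entity;
	p->gang_size = i + 1;
found:
	p->num_ibs[i]++;
	return 0;
}
```

Per-slot IB counts are what the later allocation loop consumes: one amdgpu_job is allocated per gang member, sized by its `num_ibs[i]`.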
Re: [PATCH 07/10] drm/amdgpu: move setting the job resources
Reviewed-by: Andrey Grodzovsky Andrey On 2022-07-14 06:38, Christian König wrote: Move setting the job resources into amdgpu_job.c Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 21 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 17 + drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 2 ++ 3 files changed, 21 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index dfb7b4f46bc3..88f491dc7ca2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -828,9 +828,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, struct amdgpu_vm *vm = >vm; struct amdgpu_bo_list_entry *e; struct list_head duplicates; - struct amdgpu_bo *gds; - struct amdgpu_bo *gws; - struct amdgpu_bo *oa; int r; INIT_LIST_HEAD(>validated); @@ -947,22 +944,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved, p->bytes_moved_vis); - gds = p->bo_list->gds_obj; - gws = p->bo_list->gws_obj; - oa = p->bo_list->oa_obj; - - if (gds) { - p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT; - p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT; - } - if (gws) { - p->job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT; - p->job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT; - } - if (oa) { - p->job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT; - p->job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT; - } + amdgpu_job_set_resources(p->job, p->bo_list->gds_obj, +p->bo_list->gws_obj, p->bo_list->oa_obj); if (p->uf_entry.tv.bo) { struct amdgpu_bo *uf = ttm_to_amdgpu_bo(p->uf_entry.tv.bo); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index 36c1be77bf8f..3255b2fca611 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -129,6 +129,23 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, 
unsigned size, return r; } +void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds, + struct amdgpu_bo *gws, struct amdgpu_bo *oa) +{ + if (gds) { + job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT; + job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT; + } + if (gws) { + job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT; + job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT; + } + if (oa) { + job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT; + job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT; + } +} + void amdgpu_job_free_resources(struct amdgpu_job *job) { struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h index d599c0540b46..0bab8fe0d419 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h @@ -77,6 +77,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, struct amdgpu_job **job, struct amdgpu_vm *vm); int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size, enum amdgpu_ib_pool_type pool, struct amdgpu_job **job); +void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds, + struct amdgpu_bo *gws, struct amdgpu_bo *oa); void amdgpu_job_free_resources(struct amdgpu_job *job); void amdgpu_job_free(struct amdgpu_job *job); int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
Re: [PATCH 01/10] drm/sched: move calling drm_sched_entity_select_rq
Found the new use case from the 5/10 of reordering CS ioctl. Reviewed-by: Andrey Grodzovsky Andrey On 2022-07-14 12:26, Christian König wrote: We need this for limiting codecs like AV1 to the first instance for VCN3. Essentially the idea is that we first initialize the job with entity, id etc... and before we submit it we select a new rq for the entity. In the meantime the VCN3 inline parse will have modified the available rqs for the entity. See the patch "revert "fix limiting AV1 to the first instance on VCN3"" as well. Christian. Am 14.07.22 um 17:43 schrieb Andrey Grodzovsky: Can you please remind me of the use case that requires this ? I browsed through related mails in the past but haven't found when is that needed. For amdgpu drm_sched_job_init and drm_sched_job_arm are called together and amdgpu is the only one who supports modifying entity priority on the fly as far as i see. Andrey On 2022-07-14 06:38, Christian König wrote: We already discussed that the call to drm_sched_entity_select_rq() needs to move to drm_sched_job_arm() to be able to set a new scheduler list between _init() and _arm(). This was just not applied for some reason. Signed-off-by: Christian König CC: Andrey Grodzovsky CC: dri-de...@lists.freedesktop.org --- drivers/gpu/drm/scheduler/sched_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 68317d3a7a27..e0ab14e0fb6b 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -592,7 +592,6 @@ int drm_sched_job_init(struct drm_sched_job *job, struct drm_sched_entity *entity, void *owner) { - drm_sched_entity_select_rq(entity); if (!entity->rq) return -ENOENT; @@ -628,7 +627,7 @@ void drm_sched_job_arm(struct drm_sched_job *job) struct drm_sched_entity *entity = job->entity; BUG_ON(!entity); - + drm_sched_entity_select_rq(entity); sched = entity->rq->sched; job->sched = sched;
Re: [PATCH 01/10] drm/sched: move calling drm_sched_entity_select_rq
Can you please remind me of the use case that requires this ? I browsed through related mails in the past but haven't found when is that needed. For amdgpu drm_sched_job_init and drm_sched_job_arm are called together and amdgpu is the only one who supports modifying entity priority on the fly as far as i see. Andrey On 2022-07-14 06:38, Christian König wrote: We already discussed that the call to drm_sched_entity_select_rq() needs to move to drm_sched_job_arm() to be able to set a new scheduler list between _init() and _arm(). This was just not applied for some reason. Signed-off-by: Christian König CC: Andrey Grodzovsky CC: dri-de...@lists.freedesktop.org --- drivers/gpu/drm/scheduler/sched_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 68317d3a7a27..e0ab14e0fb6b 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -592,7 +592,6 @@ int drm_sched_job_init(struct drm_sched_job *job, struct drm_sched_entity *entity, void *owner) { - drm_sched_entity_select_rq(entity); if (!entity->rq) return -ENOENT; @@ -628,7 +627,7 @@ void drm_sched_job_arm(struct drm_sched_job *job) struct drm_sched_entity *entity = job->entity; BUG_ON(!entity); - + drm_sched_entity_select_rq(entity); sched = entity->rq->sched; job->sched = sched;
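The point of moving drm_sched_entity_select_rq() from job_init() to job_arm(), per Christian's reply, is ordering: anything that narrows the entity's scheduler list between init and arm (the VCN3 AV1 restriction) is honored, because the run queue is chosen at the last moment before submission. A schematic model of that late binding, with hypothetical names:

```c
struct entity {
	int sched_list[2];	/* candidate schedulers */
	int num;		/* how many candidates remain */
	int chosen;		/* selected at arm time */
};

/* Late binding: pick the run queue only when the job is armed. */
static void select_rq(struct entity *e)
{
	e->chosen = e->sched_list[0];
}

static void job_init(struct entity *e)
{
	(void)e;	/* no rq selection here anymore */
}

static void job_arm(struct entity *e)
{
	select_rq(e);
}
```

If selection still happened in job_init(), a list modified between the two calls (as the VCN3 inline parser does) would be ignored and the job could land on a disallowed instance.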
Re: [PATCH 09/10] drm/amdgpu: add gang submit backend
On 2022-07-14 06:39, Christian König wrote: Allows submitting jobs as gang which needs to run on multiple engines at the same time. Basic idea is that we have a global gang submit fence representing when the gang leader is finally pushed to run on the hardware last. Jobs submitted as gang are never re-submitted in case of a GPU reset since this won't work and will just deadlock the hardware immediately again. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 34 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 28 -- drivers/gpu/drm/amd/amdgpu/amdgpu_job.h| 3 ++ 4 files changed, 66 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 2871a3e3801f..19308db52984 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -881,6 +881,7 @@ struct amdgpu_device { u64 fence_context; unsignednum_rings; struct amdgpu_ring *rings[AMDGPU_MAX_RINGS]; + struct dma_fence __rcu *gang_submit; boolib_pool_ready; struct amdgpu_sa_managerib_pools[AMDGPU_IB_POOL_MAX]; struct amdgpu_sched gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX]; @@ -1288,6 +1289,8 @@ u32 amdgpu_device_pcie_port_rreg(struct amdgpu_device *adev, u32 reg); void amdgpu_device_pcie_port_wreg(struct amdgpu_device *adev, u32 reg, u32 v); +struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev, + struct dma_fence *gang); /* atpx handler */ #if defined(CONFIG_VGA_SWITCHEROO) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index e1c9587f659b..f80beb7208c0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3499,6 +3499,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, adev->gmc.gart_size = 512 * 1024 * 1024; adev->accel_working = false; adev->num_rings = 0; + RCU_INIT_POINTER(adev->gang_submit, dma_fence_get_stub()); 
adev->mman.buffer_funcs = NULL; adev->mman.buffer_funcs_ring = NULL; adev->vm_manager.vm_pte_funcs = NULL; @@ -3979,6 +3980,7 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev) release_firmware(adev->firmware.gpu_info_fw); adev->firmware.gpu_info_fw = NULL; adev->accel_working = false; + dma_fence_put(rcu_dereference_protected(adev->gang_submit, true)); amdgpu_reset_fini(adev); @@ -5905,3 +5907,35 @@ void amdgpu_device_pcie_port_wreg(struct amdgpu_device *adev, (void)RREG32(data); spin_unlock_irqrestore(>pcie_idx_lock, flags); } + +/** + * amdgpu_device_switch_gang - switch to a new gang + * @adev: amdgpu_device pointer + * @gang: the gang to switch to + * + * Try to switch to a new gang or return a reference to the current gang if that + * isn't possible. + * Returns: Either NULL if we switched correctly or a reference to the existing + * gang. + */ +struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev, + struct dma_fence *gang) +{ + struct dma_fence *old = NULL; + + do { + dma_fence_put(old); + old = dma_fence_get_rcu_safe(>gang_submit); + + if (old == gang) + break; + + if (!dma_fence_is_signaled(old)) + return old; + + } while (cmpxchg((struct dma_fence __force **)>gang_submit, +old, gang) != old); + + dma_fence_put(old); + return NULL; +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index 3255b2fca611..f3a1fdbd41a3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -180,11 +180,29 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job) kfree(job); } +void amdgpu_job_set_gang_leader(struct amdgpu_job *job, + struct amdgpu_job *leader) +{ + struct dma_fence *fence = >base.s_fence->scheduled; + + WARN_ON(job->gang_submit); + + /* +* Don't add a reference when we are the gang leader to avoid circle +* dependency. 
+ */
+	if (job != leader)
+		dma_fence_get(fence);
+	job->gang_submit = fence;
+}
+
 void amdgpu_job_free(struct amdgpu_job *job)
 {
 	amdgpu_job_free_resources(job);
 	amdgpu_sync_free(&job->sync);
 	amdgpu_sync_free(&job->sched_sync);
+	if (job->gang_submit != &job->base.s_fence->scheduled)
+
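The lockless switch loop in amdgpu_device_switch_gang above can be modeled in user space. The following is a hedged sketch with an invented toy fence type (not the kernel dma_fence API), showing only the structure of the pattern: re-read the shared pointer, refuse the switch while the current gang is still running, and retry the compare-exchange if another thread raced us.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for dma_fence; all names here are invented. */
struct toy_fence {
	atomic_int refcount;
	bool signaled;
};

static struct toy_fence *toy_get(struct toy_fence *f)
{
	if (f)
		atomic_fetch_add(&f->refcount, 1);
	return f;
}

static void toy_put(struct toy_fence *f)
{
	if (f)
		atomic_fetch_sub(&f->refcount, 1);
}

/* Returns NULL if we switched, or a reference to the still-running gang. */
static struct toy_fence *switch_gang(_Atomic(struct toy_fence *) *slot,
				     struct toy_fence *gang)
{
	struct toy_fence *old = NULL;

	for (;;) {
		struct toy_fence *expected;

		toy_put(old);
		old = toy_get(atomic_load(slot));   /* like dma_fence_get_rcu_safe */

		if (old == gang)
			break;

		if (old && !old->signaled)
			return old;   /* current gang still running: refuse */

		expected = old;
		if (atomic_compare_exchange_strong(slot, &expected, gang))
			break;        /* we won the race */
		/* lost the race: loop and re-read the slot */
	}

	toy_put(old);
	return NULL;
}
```

Note the failure-return hands the caller a reference to the running gang, mirroring the kernel function's contract ("return a reference to the current gang if switching isn't possible").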
Re: [PATCH] drm/amdgpu: Get rid of amdgpu_job->external_hw_fence
On 2022-07-13 13:33, Christian König wrote: Am 13.07.22 um 19:13 schrieb Andrey Grodzovsky: This is a follow-up cleanup to [1]. See bellow refcount balancing for calling amdgpu_job_submit_direct after this cleanup as far as I calculated. amdgpu_fence_emit dma_fence_init 1 dma_fence_get(fence) 2 rcu_assign_pointer(*ptr, dma_fence_get(fence) 3 ---> amdgpu_job_submit_direct completes before fence signaled amdgpu_sa_bo_free (*sa_bo)->fence = dma_fence_get(fence) 4 amdgpu_job_free dma_fence_put 3 amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 4 dma_fence_put(f); 3 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 2 amdgpu_fence_process dma_fence_put 1 amdgpu_sa_bo_remove_locked dma_fence_put 0 ---> amdgpu_job_submit_direct completes after fence signaled amdgpu_fence_process dma_fence_put 2 amdgpu_job_free dma_fence_put 1 amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 2 dma_fence_put(f); 1 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 0 [1] - https://patchwork.kernel.org/project/dri-devel/cover/20220624180955.485440-1-andrey.grodzov...@amd.com/ Signed-off-by: Andrey Grodzovsky Suggested-by: Christian König Of hand that looks correct to me, but could be that I'm missing something as well. Anyway I think I can give an Reviewed-by: Christian König for this. Thanks, Christian. Pushed, thanks. 
Andrey --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 27 -- drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 1 - 3 files changed, 6 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 16faea7ed1cd..b79ee4ffb879 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5229,8 +5229,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, * * job->base holds a reference to parent fence */ - if (job && (job->hw_fence.ops != NULL) && - dma_fence_is_signaled(>hw_fence)) { + if (job && dma_fence_is_signaled(>hw_fence)) { job_signaled = true; dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); goto skip_hw_reset; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index 6fa381ee5fa0..10fdd12cf853 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -134,16 +134,10 @@ void amdgpu_job_free_resources(struct amdgpu_job *job) { struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched); struct dma_fence *f; - struct dma_fence *hw_fence; unsigned i; - if (job->hw_fence.ops == NULL) - hw_fence = job->external_hw_fence; - else - hw_fence = >hw_fence; - /* use sched fence if available */ - f = job->base.s_fence ? >base.s_fence->finished : hw_fence; + f = job->base.s_fence ? 
>base.s_fence->finished : >hw_fence; for (i = 0; i < job->num_ibs; ++i) amdgpu_ib_free(ring->adev, >ibs[i], f); } @@ -157,11 +151,7 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job) amdgpu_sync_free(>sync); amdgpu_sync_free(>sched_sync); - /* only put the hw fence if has embedded fence */ - if (job->hw_fence.ops != NULL) - dma_fence_put(>hw_fence); - else - kfree(job); + dma_fence_put(>hw_fence); } void amdgpu_job_free(struct amdgpu_job *job) @@ -170,11 +160,7 @@ void amdgpu_job_free(struct amdgpu_job *job) amdgpu_sync_free(>sync); amdgpu_sync_free(>sched_sync); - /* only put the hw fence if has embedded fence */ - if (job->hw_fence.ops != NULL) - dma_fence_put(>hw_fence); - else - kfree(job); + dma_fence_put(>hw_fence); } int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity, @@ -204,15 +190,12 @@ int amdgpu_job_submit_direct(struct amdgpu_job *job, struct amdgpu_ring *ring, int r; job->base.sched = >sched; - r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, NULL, fence); - /* record external_hw_fence for direct submit */ - job->external_hw_fence = dma_fence_get(*fence); + r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, job, fence); + if (r) return r; amdgpu_job_free(job); - dma_fence_put(*fence); - return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
[PATCH] drm/amdgpu: Get rid of amdgpu_job->external_hw_fence
This is a follow-up cleanup to [1]. See below the refcount balancing for calling amdgpu_job_submit_direct after this cleanup, as far as I calculated.

amdgpu_fence_emit
	dma_fence_init 1
	dma_fence_get(fence) 2
	rcu_assign_pointer(*ptr, dma_fence_get(fence) 3

---> amdgpu_job_submit_direct completes before fence signaled

amdgpu_sa_bo_free (*sa_bo)->fence = dma_fence_get(fence) 4
amdgpu_job_free dma_fence_put 3
amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 4
	dma_fence_put(f); 3
amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 2
amdgpu_fence_process dma_fence_put 1
amdgpu_sa_bo_remove_locked dma_fence_put 0

---> amdgpu_job_submit_direct completes after fence signaled

amdgpu_fence_process dma_fence_put 2
amdgpu_job_free dma_fence_put 1
amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 2
	dma_fence_put(f); 1
amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 0

[1] - https://patchwork.kernel.org/project/dri-devel/cover/20220624180955.485440-1-andrey.grodzov...@amd.com/

Signed-off-by: Andrey Grodzovsky
Suggested-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 27 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h    |  1 -
 3 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 16faea7ed1cd..b79ee4ffb879 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5229,8 +5229,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
	 *
	 * job->base holds a reference to parent fence
	 */
-	if (job && (job->hw_fence.ops != NULL) &&
-	    dma_fence_is_signaled(&job->hw_fence)) {
+	if (job && dma_fence_is_signaled(&job->hw_fence)) {
		job_signaled = true;
		dev_info(adev->dev, "Guilty job already signaled, skipping HW reset");
		goto skip_hw_reset;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 6fa381ee5fa0..10fdd12cf853 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -134,16 +134,10 @@ void amdgpu_job_free_resources(struct amdgpu_job *job)
 {
	struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched);
	struct dma_fence *f;
-	struct dma_fence *hw_fence;
	unsigned i;

-	if (job->hw_fence.ops == NULL)
-		hw_fence = job->external_hw_fence;
-	else
-		hw_fence = &job->hw_fence;
-
	/* use sched fence if available */
-	f = job->base.s_fence ? &job->base.s_fence->finished : hw_fence;
+	f = job->base.s_fence ? &job->base.s_fence->finished : &job->hw_fence;

	for (i = 0; i < job->num_ibs; ++i)
		amdgpu_ib_free(ring->adev, &job->ibs[i], f);
 }
@@ -157,11 +151,7 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
	amdgpu_sync_free(&job->sync);
	amdgpu_sync_free(&job->sched_sync);

-	/* only put the hw fence if has embedded fence */
-	if (job->hw_fence.ops != NULL)
-		dma_fence_put(&job->hw_fence);
-	else
-		kfree(job);
+	dma_fence_put(&job->hw_fence);
 }

 void amdgpu_job_free(struct amdgpu_job *job)
@@ -170,11 +160,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
	amdgpu_sync_free(&job->sync);
	amdgpu_sync_free(&job->sched_sync);

-	/* only put the hw fence if has embedded fence */
-	if (job->hw_fence.ops != NULL)
-		dma_fence_put(&job->hw_fence);
-	else
-		kfree(job);
+	dma_fence_put(&job->hw_fence);
 }

 int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
@@ -204,15 +190,12 @@ int amdgpu_job_submit_direct(struct amdgpu_job *job, struct amdgpu_ring *ring,
	int r;

	job->base.sched = &ring->sched;
-	r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, NULL, fence);
-	/* record external_hw_fence for direct submit */
-	job->external_hw_fence = dma_fence_get(*fence);
+	r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, job, fence);
+
	if (r)
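The refcount traces in the commit message above can be checked mechanically. This is a purely illustrative toy model (not kernel code): each named step adjusts a plain counter the way the corresponding call adjusts the hw fence refcount, and both paths must end balanced at zero.

```c
#include <assert.h>

/* Toy walk-through of the commit message's refcount traces. */
static int refcount;

static void get(void) { refcount++; }
static void put(void) { refcount--; }

static int direct_submit_before_signal(void)
{
	refcount = 0;
	get();	/* amdgpu_fence_emit: dma_fence_init                -> 1 */
	get();	/* amdgpu_fence_emit: dma_fence_get(fence)          -> 2 */
	get();	/* rcu_assign_pointer(*ptr, dma_fence_get(fence))   -> 3 */
	get();	/* amdgpu_sa_bo_free takes a reference              -> 4 */
	put();	/* amdgpu_job_free                                  -> 3 */
	get();	/* amdgpu_vcn_enc_get_destroy_msg: dma_fence_get(f) -> 4 */
	put();	/* ... followed by dma_fence_put(f)                 -> 3 */
	put();	/* amdgpu_vcn_enc_ring_test_ib                      -> 2 */
	put();	/* amdgpu_fence_process                             -> 1 */
	put();	/* amdgpu_sa_bo_remove_locked                       -> 0 */
	return refcount;
}

static int direct_submit_after_signal(void)
{
	refcount = 3;	/* the three references taken in amdgpu_fence_emit */
	put();	/* amdgpu_fence_process                             -> 2 */
	put();	/* amdgpu_job_free                                  -> 1 */
	get();	/* amdgpu_vcn_enc_get_destroy_msg: dma_fence_get(f) -> 2 */
	put();	/* ... followed by dma_fence_put(f)                 -> 1 */
	put();	/* amdgpu_vcn_enc_ring_test_ib                      -> 0 */
	return refcount;
}
```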
[PATCH v2 2/4] drm/amdgpu: Prevent race between late signaled fences and GPU reset.
Problem: After we start handling timed out jobs we assume their fences won't be signaled, but we cannot be sure and sometimes they fire late. We need to prevent concurrent accesses to the fence array by amdgpu_fence_driver_clear_job_fences during GPU reset and amdgpu_fence_process from a late EOP interrupt.

Fix: Before accessing the fence array during GPU reset, disable the EOP interrupt and flush all pending interrupt handlers for the amdgpu device's interrupt line.

v2: Switch from irq_get/put to full enable/disable_irq for amdgpu

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 18 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
 3 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index eacecc672a4d..03519d58e630 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4605,6 +4605,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
		amdgpu_virt_fini_data_exchange(adev);
	}

+	amdgpu_fence_driver_isr_toggle(adev, true);
+
	/* block all schedulers and reset given job's ring */
	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
		struct amdgpu_ring *ring = adev->rings[i];
@@ -4620,6 +4622,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
		amdgpu_fence_driver_force_completion(ring);
	}

+	amdgpu_fence_driver_isr_toggle(adev, false);
+
	if (job && job->vm)
		drm_sched_increase_karma(&job->base);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index a9ae3beaa1d3..c1d04ea3c67f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -532,6 +532,24 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
	}
 }

+/* Will either stop and flush handlers for amdgpu interrupt or re-enable it */
+void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool stop)
+{
+	int i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->fence_drv.initialized || !ring->fence_drv.irq_src)
+			continue;
+
+		if (stop)
+			disable_irq(adev->irq.irq);
+		else
+			enable_irq(adev->irq.irq);
+	}
+}
+
 void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
 {
	unsigned int i, j;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7d89a52091c0..82c178a9033a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -143,6 +143,7 @@ signed long amdgpu_fence_wait_polling(struct amdgpu_ring *ring,
				      uint32_t wait_seq,
				      signed long timeout);
 unsigned amdgpu_fence_count_emitted(struct amdgpu_ring *ring);
+void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool stop);

 /*
  * Rings.
-- 
2.25.1
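The quiesce pattern this patch implements with disable_irq()/enable_irq() can be sketched in user space. This is a hedged analog, not the kernel code: a mutex plays the role of synchronize_irq() (waiting out a handler already in flight), a flag plays the role of the disabled interrupt line, and every name is invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t handler_lock = PTHREAD_MUTEX_INITIALIZER;
static bool handler_enabled = true;
static int fences_cleared;

/* Stands in for the late EOP interrupt handler (amdgpu_fence_process). */
static void fake_eop_handler(void)
{
	pthread_mutex_lock(&handler_lock);
	if (handler_enabled) {
		/* the real handler would walk the fence array here */
	}
	pthread_mutex_unlock(&handler_lock);
}

/* Stands in for amdgpu_fence_driver_isr_toggle(adev, stop). */
static void isr_toggle(bool stop)
{
	/* taking the lock waits out a handler in flight, like synchronize */
	pthread_mutex_lock(&handler_lock);
	handler_enabled = !stop;
	pthread_mutex_unlock(&handler_lock);
}

/* Stands in for the reset path calling amdgpu_fence_driver_clear_job_fences. */
static int clear_job_fences_safely(void)
{
	isr_toggle(true);	/* quiesce: no handler can race us now */
	fences_cleared++;	/* safe to touch the shared fence array */
	isr_toggle(false);	/* re-enable handler invocations */
	return fences_cleared;
}
```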
[PATCH v2 4/4] drm/amdgpu: Follow up change to previous drm scheduler change.
Align refcount behaviour for the amdgpu_job embedded HW fence with classic pointer-style HW fences by increasing the refcount each time emit is called, so amdgpu code doesn't need to make workarounds using amdgpu_job.job_run_counter to keep the HW fence refcount balanced.

Also, since in the previous patch we resumed setting s_fence->parent to NULL in drm_sched_stop, switch to directly checking whether job->hw_fence is signaled to short-circuit the reset if it has already signaled.

Signed-off-by: Andrey Grodzovsky
Tested-by: Yiqing Yao
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  4 
 4 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 44da025502ac..567597469a8a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -684,6 +684,8 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
		goto err_ib_sched;
	}

+	/* Drop the initial kref_init count (see drm_sched_main as example) */
+	dma_fence_put(f);
	ret = dma_fence_wait(f, false);

 err_ib_sched:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 03519d58e630..a2c268d48edd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5009,16 +5009,32 @@ static void amdgpu_device_recheck_guilty_jobs(
		/* clear job's guilty and depend the folowing step to decide the real one */
		drm_sched_reset_karma(s_job);
-		/* for the real bad job, it will be resubmitted twice, adding a dma_fence_get
-		 * to make sure fence is balanced */
-		dma_fence_get(s_job->s_fence->parent);
		drm_sched_resubmit_jobs_ext(&ring->sched, 1);

+		if (!s_job->s_fence->parent) {
+			DRM_WARN("Failed to get a HW fence for job!");
+			continue;
+		}
+
		ret = 
dma_fence_wait_timeout(s_job->s_fence->parent, false, ring->sched.timeout);
		if (ret == 0) { /* timeout */
			DRM_ERROR("Found the real bad job! ring:%s, job_id:%llx\n",
				  ring->sched.name, s_job->id);
+
+			amdgpu_fence_driver_isr_toggle(adev, true);
+
+			/* Clear this failed job from fence array */
+			amdgpu_fence_driver_clear_job_fences(ring);
+
+			amdgpu_fence_driver_isr_toggle(adev, false);
+
+			/* Since the job won't signal and we go for
+			 * another resubmit drop this parent pointer
+			 */
+			dma_fence_put(s_job->s_fence->parent);
+			s_job->s_fence->parent = NULL;
+
			/* set guilty */
			drm_sched_increase_karma(s_job);
 retry:
@@ -5047,7 +5063,6 @@ static void amdgpu_device_recheck_guilty_jobs(
		/* got the hw fence, signal finished fence */
		atomic_dec(ring->sched.score);
-		dma_fence_put(s_job->s_fence->parent);
		dma_fence_get(&s_job->s_fence->finished);
		dma_fence_signal(&s_job->s_fence->finished);
		dma_fence_put(&s_job->s_fence->finished);
@@ -5220,8 +5235,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
	 *
	 * job->base holds a reference to parent fence
	 */
-	if (job && job->base.s_fence->parent &&
-	    dma_fence_is_signaled(job->base.s_fence->parent)) {
+	if (job && (job->hw_fence.ops != NULL) &&
+	    dma_fence_is_signaled(&job->hw_fence)) {
		job_signaled = true;
		dev_info(adev->dev, "Guilty job already signaled, skipping HW reset");
		goto skip_hw_reset;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index c1d04ea3c67f..39597ab807d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,11 +164,16 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd
	if (job && job->job_run_counter) {
		/* reinit seq for resubmitted jobs */
		fence->seqno = seq;
+		/* TO be inline with external fence creation and other drivers */
+		dma_fence_get(fence);
	} else {
-		if (job)
+		if (job) {
[PATCH v2 1/4] drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences
This function should drop the fence refcount when it extracts the fence from the fence array, just as it's done in amdgpu_fence_process.

Signed-off-by: Andrey Grodzovsky
Reviewed-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 957437a5558c..a9ae3beaa1d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -595,8 +595,10 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
	for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
		ptr = &ring->fence_drv.fences[i];
		old = rcu_dereference_protected(*ptr, 1);
-		if (old && old->ops == &amdgpu_job_fence_ops)
+		if (old && old->ops == &amdgpu_job_fence_ops) {
			RCU_INIT_POINTER(*ptr, NULL);
+			dma_fence_put(old);
+		}
	}
 }
-- 
2.25.1
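The bug and its fix can be illustrated with a tiny user-space model. This is a hedged sketch with an invented toy fence type, not the kernel code: a slot in the fence array owns a reference taken at insertion (in amdgpu_fence_emit), so clearing the slot must also drop that reference, just as amdgpu_fence_process already does.

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-in for dma_fence. */
struct toy_fence { int refcount; };

static void insert_fence(struct toy_fence **slot, struct toy_fence *f)
{
	f->refcount++;		/* the array takes its own reference */
	*slot = f;
}

static void clear_slot(struct toy_fence **slot)
{
	struct toy_fence *old = *slot;

	if (old) {
		*slot = NULL;		/* like RCU_INIT_POINTER(*ptr, NULL) */
		old->refcount--;	/* the previously missing put */
	}
}
```

Without the decrement in clear_slot, every fence cleared during reset would leak one reference against the get taken at insertion.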
[PATCH v2 3/4] drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'
Problem: The reverted patch caused a negative refcount, as described in [1]. In that case the parent fence did not signal by the time of drm_sched_stop and was therefore kept in the pending list; on the assumption that it would never signal, the fence was put once to account for the s_fence->parent refcount. But for amdgpu, which has an embedded HW fence (always the same parent fence), drm_sched_fence_release_scheduled was still always called and would drop the count for the parent fence once more. For jobs that never signaled, this imbalance was masked by a refcount bug in amdgpu_fence_driver_clear_job_fences, which did not drop the refcount on fences removed from the fence driver's fences array (against the previous get on insertion into the array in amdgpu_fence_emit).

Fix: Revert that patch, and by setting s_job->s_fence->parent to NULL as before, prevent the extra refcount drop in amdgpu when drm_sched_fence_release_scheduled is called on job release. Also align the behaviour of drm_sched_resubmit_jobs_ext with that of drm_sched_main when submitting jobs: take a refcount for the new parent fence pointer and drop the refcount from the original kref_init of the newly created HW fence (or the fake new HW fence in amdgpu; see the next patch).
[1] - https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4...@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3

Signed-off-by: Andrey Grodzovsky
Tested-by: Yiqing Yao
---
 drivers/gpu/drm/scheduler/sched_main.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index b81fceb0b8a2..c5437ee03e3f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -419,6 +419,8 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
		if (s_job->s_fence->parent &&
		    dma_fence_remove_callback(s_job->s_fence->parent,
					      &s_job->cb)) {
+			dma_fence_put(s_job->s_fence->parent);
+			s_job->s_fence->parent = NULL;
			atomic_dec(&sched->hw_rq_count);
		} else {
			/*
@@ -548,7 +550,6 @@ void drm_sched_resubmit_jobs_ext(struct drm_gpu_scheduler *sched, int max)
		if (found_guilty && s_job->s_fence->scheduled.context == guilty_context)
			dma_fence_set_error(&s_fence->finished, -ECANCELED);

-		dma_fence_put(s_job->s_fence->parent);
		fence = sched->ops->run_job(s_job);
		i++;
@@ -558,7 +559,11 @@ void drm_sched_resubmit_jobs_ext(struct drm_gpu_scheduler *sched, int max)
			s_job->s_fence->parent = NULL;
		} else {
-			s_job->s_fence->parent = fence;
+
+			s_job->s_fence->parent = dma_fence_get(fence);
+
+			/* Drop for orignal kref_init */
+			dma_fence_put(fence);
		}
	}
 }
@@ -952,6 +957,9 @@ static int drm_sched_main(void *param)
		if (!IS_ERR_OR_NULL(fence)) {
			s_fence->parent = dma_fence_get(fence);
+			/* Drop for original kref_init of the fence */
+			dma_fence_put(fence);
+
			r = dma_fence_add_callback(fence, &sched_job->cb,
						   drm_sched_job_done_cb);
			if (r == -ENOENT)
@@ -959,7 +967,6 @@ static int drm_sched_main(void *param)
			else if (r)
				DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n",
					      r);
-			dma_fence_put(fence);
		} else {
			if (IS_ERR(fence))
				dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
-- 
2.25.1
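The ownership convention this patch establishes can be sketched with a toy model. This is a hedged illustration with invented names, not the scheduler code: run_job() hands back a fence still holding its kref_init reference; the scheduler takes its own reference for s_fence->parent and drops the initial one, so the parent pointer owns exactly one count that drm_sched_stop can later put.

```c
#include <assert.h>

/* Invented stand-in for dma_fence. */
struct toy_fence { int refcount; };

/* Models sched->ops->run_job(): returns a fence with its kref_init count. */
static struct toy_fence *run_job(struct toy_fence *storage)
{
	storage->refcount = 1;	/* kref_init when the HW fence is created */
	return storage;
}

/* Models what drm_sched_main / drm_sched_resubmit_jobs_ext now do. */
static struct toy_fence *adopt_as_parent(struct toy_fence *fence)
{
	fence->refcount++;	/* dma_fence_get() for s_fence->parent */
	fence->refcount--;	/* drop the original kref_init reference */
	return fence;		/* parent now owns exactly one count */
}

/* Models drm_sched_stop putting and NULLing the parent pointer. */
static void drop_parent(struct toy_fence *parent)
{
	parent->refcount--;
}
```

With both paths following this convention, the parent pointer always holds exactly one reference, which is what lets the partial revert put it symmetrically in drm_sched_stop.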
[PATCH v2 0/4] Rework amdgpu HW fence refcount and update scheduler parent fence refcount.
Yiqing raised a problem of negative fence refcount for resubmitted jobs in amdgpu and suggested a workaround in [1]. I took a look myself and discovered some deeper problems, both in amdgpu and in the scheduler code. Yiqing helped with testing the new code and also drew a detailed refcount and flow tracing diagram for the parent (HW) fence life cycle and refcount under various cases for the proposed patchset, at [2].

v2:
Update race prevention by switching from amdgpu_irq_get/put to enable/disable_irq (Christian)
Drop the refcount fix for amdgpu_job->external_hw_fence as it was causing underflow in direct submissions

TODO - Follow-up cleanup to totally get rid of amdgpu_job->external_hw_fence

[1] - https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4...@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3
[2] - https://drive.google.com/file/d/1yEoeW6OQC9WnwmzFW6NBLhFP_jD0xcHm/view?usp=sharing

Andrey Grodzovsky (4):
  drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences
  drm/amdgpu: Prevent race between late signaled fences and GPU reset.
  drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'
  drm/amdgpu: Follow up change to previous drm scheduler change.

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 29 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  4 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
 drivers/gpu/drm/scheduler/sched_main.c     | 13 ++---
 6 files changed, 65 insertions(+), 15 deletions(-)

-- 
2.25.1
Re: [PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence
On 2022-06-22 11:04, Christian König wrote: Am 22.06.22 um 17:01 schrieb Andrey Grodzovsky: On 2022-06-22 05:00, Christian König wrote: Am 21.06.22 um 21:34 schrieb Andrey Grodzovsky: On 2022-06-21 03:19, Christian König wrote: Am 21.06.22 um 00:02 schrieb Andrey Grodzovsky: Problem: In amdgpu_job_submit_direct - The refcount should drop by 2 but it drops only by 1. amdgpu_ib_sched->emit -> refcount 1 from first fence init dma_fence_get -> refcount 2 dme_fence_put -> refcount 1 Fix: Add put for external_hw_fence in amdgpu_job_free/free_cb Well what is the external_hw_fence good for in this construct? As far as I understand for direct submissions you don't want to pass a job pointer to ib_schedule and so u can't use the embedded fence for this case. Can you please look a bit deeper into this, we now have a couple of fields in the job structure which have no obvious use. I think we could pass a job structure to ib_schedule even for direct submit now. Are you sure ? I see a lot of activities in amdgpu_ib_schedule depend on presence of vm and fence_ctx which are set if the job pointer argument != NULL, might this have a negative impact on direct submit ? Not 100% sure, but we did tons of workarounds because we didn't had a job pointer for direct submit. But this was before we embedded the IBs at the end of the job. It's quite likely that this should be possible now, it's just that somebody needs to double check. Christian. Looking more i see stuff like amdgpu_vm_flush and amdgpu_ring_emit_cntxcntl, emit_frame_cntl that are conditioned on job argument, doesn't look to me like this is relevant to direct submit ? I also noticed that direct submit passes back the created fence to it's caller while freeing the job immediately, Using embedded job here will increase the time the job object will hang around the memory without any use as long as it's fence is referenced. Job object is much larger then a single fence. Andrey Andrey Regards, Christian. 
Andrey Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index 10aa073600d4..58568fdde2d0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -152,8 +152,10 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job) /* only put the hw fence if has embedded fence */ if (job->hw_fence.ops != NULL) dma_fence_put(>hw_fence); - else + else { When one side of the if uses {} the other side should use {} as well, e.g. use } else { here. Christian. + dma_fence_put(job->external_hw_fence); kfree(job); + } } void amdgpu_job_free(struct amdgpu_job *job) @@ -165,8 +167,10 @@ void amdgpu_job_free(struct amdgpu_job *job) /* only put the hw fence if has embedded fence */ if (job->hw_fence.ops != NULL) dma_fence_put(>hw_fence); - else + else { + dma_fence_put(job->external_hw_fence); kfree(job); + } } int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
Re: [PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.
On 2022-06-23 01:52, Christian König wrote: Am 22.06.22 um 19:19 schrieb Andrey Grodzovsky: On 2022-06-22 03:17, Christian König wrote: Am 21.06.22 um 22:00 schrieb Andrey Grodzovsky: On 2022-06-21 03:28, Christian König wrote: Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky: Align refcount behaviour for amdgpu_job embedded HW fence with classic pointer style HW fences by increasing refcount each time emit is called so amdgpu code doesn't need to make workarounds using amdgpu_job.job_run_counter to keep the HW fence refcount balanced. Could we now also remove job_run_counter? Christian. I am afraid not, job counter is needed since at all times the refcount on the embedded fence cannot drop to zero because this will free the job itself before the end of it's life cycle. We have to be able to differentiate in amdgpu_fence_emit between first ever call where we init the embedded fence's refcount from scratch using kref_init to repeating calls when refcount already > 0 and we just fake increase the refcount to align behavior with pointer style fences in other drivers. Well what we should probably rather do is move the init out of emit instead. The only down side I can see is that the sequence number isn't know on initial init and so needs to be zero or something like that. Regards, Christian. Not sure how this help, the problem is not there but in amdgpu_job_run, for embedded fence and resubmit job in pending list amdgpu_job_run will be called twice or even 3 times with recheck guilty job sequence. I am supposed to do dma_fence_init to embeded HW fence only on first call while on second and third only update sequence_num and increase refcount. How can i differentiate between first and non first calls without job_run_counter ? Yeah, good point. We should really stop re-submitting jobs altogether in the kernel and move that whole functionality into userspace. Christian. So i guess we keep this for now and see how to move resubmit functionality to user space ? 
as a separate task ? Andrey Andrey I guess we could assume that embedded fence is all zeroes before first dma_fence_init if assuming the job itself was allocated using kzalloc and so u can look at dma_fence_ops == NULL or maybe seqno == 0 as a hint if that the fist call or not but it's a risky assumption in my opinion. Andrey Also since in the previous patch we resumed setting s_fence->parent to NULL in drm_sched_stop switch to directly checking if job->hw_fence is signaled to short circuit reset if already signed. Signed-off-by: Andrey Grodzovsky Tested-by: Yiqing Yao --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23 -- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 7 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 4 files changed, 25 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 513c57f839d8..447bd92c4856 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -684,6 +684,8 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev, goto err_ib_sched; } + /* Drop the initial kref_init count (see drm_sched_main as example) */ + dma_fence_put(f); ret = dma_fence_wait(f, false); err_ib_sched: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index c99541685804..f9718119834f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5009,16 +5009,28 @@ static void amdgpu_device_recheck_guilty_jobs( /* clear job's guilty and depend the folowing step to decide the real one */ drm_sched_reset_karma(s_job); - /* for the real bad job, it will be resubmitted twice, adding a dma_fence_get - * to make sure fence is balanced */ - dma_fence_get(s_job->s_fence->parent); drm_sched_resubmit_jobs_ext(>sched, 1); + if (!s_job->s_fence->parent) { + DRM_WARN("Failed to get a HW fence for job!"); 
+ continue; + } + ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, ring->sched.timeout); if (ret == 0) { /* timeout */ DRM_ERROR("Found the real bad job! ring:%s, job_id:%llx\n", ring->sched.name, s_job->id); + + /* Clear this failed job from fence array */ + amdgpu_fence_driver_clear_job_fences(ring); + + /* Since the job won't signal and we go for + * another resubmit drop this parent pointer + */ + dma_fence_put(s_job->s_fence->parent); + s_job->s_fence->parent = NULL; + /* set guilty */ drm_
Re: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.
Just a ping Andrey On 2022-06-21 15:45, Andrey Grodzovsky wrote: On 2022-06-21 03:25, Christian König wrote: Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky: Problem: After we start handling timed out jobs we assume there fences won't be signaled but we cannot be sure and sometimes they fire late. We need to prevent concurrent accesses to fence array from amdgpu_fence_driver_clear_job_fences during GPU reset and amdgpu_fence_process from a late EOP interrupt. Fix: Before accessing fence array in GPU disable EOP interrupt and flush all pending interrupt handlers for amdgpu device's interrupt line. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 26 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + 3 files changed, 31 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2b92281dd0c1..c99541685804 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4605,6 +4605,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, amdgpu_virt_fini_data_exchange(adev); } + amdgpu_fence_driver_isr_toggle(adev, true); + /* block all schedulers and reset given job's ring */ for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { struct amdgpu_ring *ring = adev->rings[i]; @@ -4620,6 +4622,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, amdgpu_fence_driver_force_completion(ring); } + amdgpu_fence_driver_isr_toggle(adev, false); + if (job && job->vm) drm_sched_increase_karma(>base); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index a9ae3beaa1d3..d6d54ba4c185 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -532,6 +532,32 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev) } } +void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool stop) +{ + 
int i; + + for (i = 0; i < AMDGPU_MAX_RINGS; i++) { + struct amdgpu_ring *ring = adev->rings[i]; + + if (!ring || !ring->fence_drv.initialized || !ring->fence_drv.irq_src) + continue; + + if (stop) + amdgpu_irq_put(adev, ring->fence_drv.irq_src, + ring->fence_drv.irq_type); + else + amdgpu_irq_get(adev, ring->fence_drv.irq_src, + ring->fence_drv.irq_type); That won't work like this. This increments/decrements the reference count for the IRQ, but doesn't guarantee in any way that they are stopped/started. I understand that, i just assumed that the fence driver is the only holder of this interrupt source (e.g. regCP_INT_CNTL_RING0) ? I can disable amdgpu interrupt line totally using disable_irq - would this be better ? + } + + /* TODO Only waits for irq handlers on other CPUs, maybe local_irq_save + * local_irq_local_irq_restore are needed here for local interrupts ? + * + */ Well that comment made me smile. Think for a moment what the local CPU would be doing if an interrupt would run :) No, I understand this of course, I am ok to be interrupted by interrupt handler at this point, what i am trying to do is to prevent amdgpu_fence_process to run concurrently with amdgpu_fence_driver_clear_job_fences - that is what this function is trying to prevent - i disable and flush pending EOP ISR handlers before the call to clear fences and re-enable after. I guess we can also introduce a spinlock to serialize them ? Yiqing reported seeing a race between them so we have to do something. Andrey Cheers, Christian. 
+ if (stop) + synchronize_irq(adev->irq.irq); +} + void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev) { unsigned int i, j; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 7d89a52091c0..82c178a9033a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -143,6 +143,7 @@ signed long amdgpu_fence_wait_polling(struct amdgpu_ring *ring, uint32_t wait_seq, signed long timeout); unsigned amdgpu_fence_count_emitted(struct amdgpu_ring *ring); +void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool stop); /* * Rings.
Re: [PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.
On 2022-06-22 03:17, Christian König wrote: Am 21.06.22 um 22:00 schrieb Andrey Grodzovsky: On 2022-06-21 03:28, Christian König wrote: Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky: Align refcount behaviour for the amdgpu_job embedded HW fence with classic pointer style HW fences by increasing the refcount each time emit is called, so amdgpu code doesn't need to make workarounds using amdgpu_job.job_run_counter to keep the HW fence refcount balanced. Could we now also remove job_run_counter? Christian. I am afraid not, the job counter is needed since at all times the refcount on the embedded fence cannot drop to zero, because this would free the job itself before the end of its life cycle. We have to be able to differentiate in amdgpu_fence_emit between the first ever call, where we init the embedded fence's refcount from scratch using kref_init, and repeating calls, when the refcount is already > 0 and we just fake-increase the refcount to align behavior with pointer style fences in other drivers. Well what we should probably rather do is move the init out of emit instead. The only downside I can see is that the sequence number isn't known at initial init and so needs to be zero or something like that. Regards, Christian. Not sure how this helps; the problem is not there but in amdgpu_job_run. For an embedded fence and a resubmitted job in the pending list, amdgpu_job_run will be called twice or even 3 times with the recheck guilty job sequence. I am supposed to do dma_fence_init on the embedded HW fence only on the first call, while on the second and third I only update sequence_num and increase the refcount. How can I differentiate between first and non-first calls without job_run_counter? Andrey I guess we could assume that the embedded fence is all zeroes before the first dma_fence_init, assuming the job itself was allocated using kzalloc, and so you can look at dma_fence_ops == NULL or maybe seqno == 0 as a hint whether that is the first call or not, but it's a risky assumption in my opinion.
Andrey Also since in the previous patch we resumed setting s_fence->parent to NULL in drm_sched_stop, switch to directly checking if job->hw_fence is signaled to short circuit reset if already signaled. Signed-off-by: Andrey Grodzovsky Tested-by: Yiqing Yao --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23 -- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 7 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 4 files changed, 25 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 513c57f839d8..447bd92c4856 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -684,6 +684,8 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev, goto err_ib_sched; } + /* Drop the initial kref_init count (see drm_sched_main as example) */ + dma_fence_put(f); ret = dma_fence_wait(f, false); err_ib_sched: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index c99541685804..f9718119834f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5009,16 +5009,28 @@ static void amdgpu_device_recheck_guilty_jobs( /* clear job's guilty and depend the following step to decide the real one */ drm_sched_reset_karma(s_job); - /* for the real bad job, it will be resubmitted twice, adding a dma_fence_get - * to make sure fence is balanced */ - dma_fence_get(s_job->s_fence->parent); drm_sched_resubmit_jobs_ext(&ring->sched, 1); + if (!s_job->s_fence->parent) { + DRM_WARN("Failed to get a HW fence for job!"); + continue; + } + ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, ring->sched.timeout); if (ret == 0) { /* timeout */ DRM_ERROR("Found the real bad job!
ring:%s, job_id:%llx\n", ring->sched.name, s_job->id); + + /* Clear this failed job from fence array */ + amdgpu_fence_driver_clear_job_fences(ring); + + /* Since the job won't signal and we go for + * another resubmit drop this parent pointer + */ + dma_fence_put(s_job->s_fence->parent); + s_job->s_fence->parent = NULL; + /* set guilty */ drm_sched_increase_karma(s_job); retry: @@ -5047,7 +5059,6 @@ static void amdgpu_device_recheck_guilty_jobs( /* got the hw fence, signal finished fence */ atomic_dec(ring->sched.score); - dma_fence_put(s_job->s_fence->parent); dma_fence_get(&s_job->s_fence->finished); dma_fence_signal(&s_job->s_fence->finished); dma_fence_put(&s_job->s_fence->finished);
Re: [PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence
On 2022-06-22 05:00, Christian König wrote: Am 21.06.22 um 21:34 schrieb Andrey Grodzovsky: On 2022-06-21 03:19, Christian König wrote: Am 21.06.22 um 00:02 schrieb Andrey Grodzovsky: Problem: In amdgpu_job_submit_direct - the refcount should drop by 2 but it drops only by 1. amdgpu_ib_sched->emit -> refcount 1 from first fence init dma_fence_get -> refcount 2 dma_fence_put -> refcount 1 Fix: Add put for external_hw_fence in amdgpu_job_free/free_cb Well what is the external_hw_fence good for in this construct? As far as I understand, for direct submissions you don't want to pass a job pointer to ib_schedule, and so you can't use the embedded fence for this case. Can you please look a bit deeper into this, we now have a couple of fields in the job structure which have no obvious use. I think we could pass a job structure to ib_schedule even for direct submit now. Are you sure? I see a lot of activities in amdgpu_ib_schedule depend on the presence of vm and fence_ctx, which are set if the job pointer argument != NULL; might this have a negative impact on direct submit? Andrey Regards, Christian. Andrey Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index 10aa073600d4..58568fdde2d0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -152,8 +152,10 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job) /* only put the hw fence if has embedded fence */ if (job->hw_fence.ops != NULL) dma_fence_put(&job->hw_fence); - else + else { When one side of the if uses {} the other side should use {} as well, e.g. use } else { here. Christian.
+ dma_fence_put(job->external_hw_fence); kfree(job); + } } void amdgpu_job_free(struct amdgpu_job *job) @@ -165,8 +167,10 @@ void amdgpu_job_free(struct amdgpu_job *job) /* only put the hw fence if has embedded fence */ if (job->hw_fence.ops != NULL) dma_fence_put(&job->hw_fence); - else + else { + dma_fence_put(job->external_hw_fence); kfree(job); + } } int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
Re: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.
You have a job in the pending list which is marked as not finished in drm_sched_stop (https://elixir.bootlin.com/linux/v5.16/source/drivers/gpu/drm/scheduler/sched_main.c#L420), its s_fence signal cb removed and the job kept in the pending list. Later you will try to manually clear the HW fence of this job in here (https://elixir.bootlin.com/linux/v5.16/source/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c#L4492), but the EOP interrupt can fire for that fence exactly at this moment and you have concurrent access to the fence driver's fence array from both amdgpu_fence_process and amdgpu_fence_driver_clear_job_fences, which is not supposed to happen. Yiqing reported to me a race during the debugging we did of the original refcount bug, and it looked to me like this scenario. Seems to me the EOP ISR handler should be prevented from running during this time at least. Andrey On 2022-06-21 21:47, VURDIGERENATARAJ, CHANDAN wrote: Hi, Is this a preventive fix or did you find errors/oops/hangs? If you found errors/oops/hangs, can you please share the details? BR, Chandan V N On 2022-06-21 03:25, Christian König wrote: Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky: Problem: After we start handling timed out jobs we assume their fences won't be signaled, but we cannot be sure and sometimes they fire late. We need to prevent concurrent accesses to the fence array from amdgpu_fence_driver_clear_job_fences during GPU reset and from amdgpu_fence_process from a late EOP interrupt. Fix: Before accessing the fence array during GPU reset, disable the EOP interrupt and flush all pending interrupt handlers for the amdgpu device's interrupt line.
Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 26 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + 3 files changed, 31 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2b92281dd0c1..c99541685804 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4605,6 +4605,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, amdgpu_virt_fini_data_exchange(adev); } + amdgpu_fence_driver_isr_toggle(adev, true); + /* block all schedulers and reset given job's ring */ for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { struct amdgpu_ring *ring = adev->rings[i]; @@ -4620,6 +4622,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, amdgpu_fence_driver_force_completion(ring); } + amdgpu_fence_driver_isr_toggle(adev, false); + if (job && job->vm) drm_sched_increase_karma(&job->base); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index a9ae3beaa1d3..d6d54ba4c185 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -532,6 +532,32 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev) } } +void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool stop) +{ + int i; + + for (i = 0; i < AMDGPU_MAX_RINGS; i++) { + struct amdgpu_ring *ring = adev->rings[i]; + + if (!ring || !ring->fence_drv.initialized || !ring->fence_drv.irq_src) + continue; + + if (stop) + amdgpu_irq_put(adev, ring->fence_drv.irq_src, + ring->fence_drv.irq_type); + else + amdgpu_irq_get(adev, ring->fence_drv.irq_src, + ring->fence_drv.irq_type); That won't work like this. This increments/decrements the reference count for the IRQ, but doesn't guarantee in any way that they are stopped/started.
I understand that, I just assumed that the fence driver is the only holder of this interrupt source (e.g. regCP_INT_CNTL_RING0)? I can disable the amdgpu interrupt line totally using disable_irq - would this be better? + } + + /* TODO Only waits for irq handlers on other CPUs, maybe local_irq_save/ + * local_irq_restore are needed here for local interrupts ? + */ Well that comment made me smile. Think for a moment what the local CPU would be doing if an interrupt would run :) No, I understand this of course, I am ok to be interrupted by an interrupt handler at this point. What I am trying to do is to prevent amdgpu_fence_process from running concurrently with amdgpu_fence_driver_clear_job_fences - that is what this function is trying to prevent - I disable and flush pending EOP ISR handlers before the call to clear fences and re-enable after. I guess we can also introduce a spinlock to serialize them? Yiqing reported seeing a race between them so we have to do something. Andrey Cheers, Christian. + if (stop) + synchronize_irq(adev->irq.irq); +} + void amdgpu_fence_driver_sw_fini
[PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.
Align refcount behaviour for the amdgpu_job embedded HW fence with classic pointer style HW fences by increasing the refcount each time emit is called, so amdgpu code doesn't need to make workarounds using amdgpu_job.job_run_counter to keep the HW fence refcount balanced.

Also, since in the previous patch we resumed setting s_fence->parent to NULL in drm_sched_stop, switch to directly checking if job->hw_fence is signaled to short circuit reset if already signaled.

Signed-off-by: Andrey Grodzovsky
Tested-by: Yiqing Yao
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 7 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4
 4 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 513c57f839d8..447bd92c4856 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -684,6 +684,8 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
 		goto err_ib_sched;
 	}
 
+	/* Drop the initial kref_init count (see drm_sched_main as example) */
+	dma_fence_put(f);
 	ret = dma_fence_wait(f, false);
 
 err_ib_sched:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c99541685804..f9718119834f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5009,16 +5009,28 @@ static void amdgpu_device_recheck_guilty_jobs(
 		/* clear job's guilty and depend the following step to decide the real one */
 		drm_sched_reset_karma(s_job);
-		/* for the real bad job, it will be resubmitted twice, adding a dma_fence_get
-		 * to make sure fence is balanced */
-		dma_fence_get(s_job->s_fence->parent);
 		drm_sched_resubmit_jobs_ext(&ring->sched, 1);
 
+		if (!s_job->s_fence->parent) {
+			DRM_WARN("Failed to get a HW fence for job!");
+			continue;
+		}
+
 		ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, ring->sched.timeout);
 		if (ret == 0) { /* timeout */
 			DRM_ERROR("Found the real bad job! ring:%s, job_id:%llx\n",
 				  ring->sched.name, s_job->id);
 
+			/* Clear this failed job from fence array */
+			amdgpu_fence_driver_clear_job_fences(ring);
+
+			/* Since the job won't signal and we go for
+			 * another resubmit drop this parent pointer
+			 */
+			dma_fence_put(s_job->s_fence->parent);
+			s_job->s_fence->parent = NULL;
+
 			/* set guilty */
 			drm_sched_increase_karma(s_job);
 retry:
@@ -5047,7 +5059,6 @@ static void amdgpu_device_recheck_guilty_jobs(
 
 		/* got the hw fence, signal finished fence */
 		atomic_dec(ring->sched.score);
-		dma_fence_put(s_job->s_fence->parent);
 		dma_fence_get(&s_job->s_fence->finished);
 		dma_fence_signal(&s_job->s_fence->finished);
 		dma_fence_put(&s_job->s_fence->finished);
@@ -5220,8 +5231,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	 *
 	 * job->base holds a reference to parent fence
 	 */
-	if (job && job->base.s_fence->parent &&
-	    dma_fence_is_signaled(job->base.s_fence->parent)) {
+	if (job && (job->hw_fence.ops != NULL) &&
+	    dma_fence_is_signaled(&job->hw_fence)) {
 		job_signaled = true;
 		dev_info(adev->dev, "Guilty job already signaled, skipping HW reset");
 		goto skip_hw_reset;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d6d54ba4c185..9bd4e18212fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,11 +164,16 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd
 	if (job && job->job_run_counter) {
 		/* reinit seq for resubmitted jobs */
 		fence->seqno = seq;
+		/* To be in line with external fence creation and other drivers */
+		dma_fence_get(fence);
 	} else {
-		if (job)
+		if (job) {
 			dma_fence_init(fence, &amdgpu_job_fence_ops,
 				       &ring->fence_drv.lock,
[PATCH 4/5] drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'
Problem: This patch caused a negative refcount as described in [1]. For that case the parent fence did not signal by the time of drm_sched_stop and was hence kept in the pending list. The assumption was that such fences would not signal, and so a fence reference was put to account for the s_fence->parent refcount; but for amdgpu, which has an embedded HW fence (always the same parent fence), drm_sched_fence_release_scheduled was still always called and would drop the count for the parent fence once more. For jobs that never signaled, this imbalance was masked by a refcount bug in amdgpu_fence_driver_clear_job_fences, which would not drop the refcount on the fences that were removed from the fence driver's fences array (against the previous insertion into the array in the get in amdgpu_fence_emit). Fix: Revert this patch, and by setting s_job->s_fence->parent to NULL as before, prevent the extra refcount drop in amdgpu when drm_sched_fence_release_scheduled is called on job release. Also, align the behaviour in drm_sched_resubmit_jobs_ext with that of drm_sched_main when submitting jobs: take a refcount for the new parent fence pointer and drop the refcount for the original kref_init of the new HW fence creation (or the fake new HW fence in amdgpu - see next patch).
[1] - https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4...@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3

Signed-off-by: Andrey Grodzovsky
Tested-by: Yiqing Yao
---
 drivers/gpu/drm/scheduler/sched_main.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index b81fceb0b8a2..b38394f5694f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -419,6 +419,11 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 		if (s_job->s_fence->parent &&
 		    dma_fence_remove_callback(s_job->s_fence->parent,
 					      &s_job->cb)) {
+			/* Revert drm/sched: Keep s_fence->parent pointer, no
+			 * need anymore for amdgpu and creates only troubles
+			 */
+			dma_fence_put(s_job->s_fence->parent);
+			s_job->s_fence->parent = NULL;
 			atomic_dec(&sched->hw_rq_count);
 		} else {
 			/*
@@ -548,7 +553,6 @@ void drm_sched_resubmit_jobs_ext(struct drm_gpu_scheduler *sched, int max)
 		if (found_guilty && s_job->s_fence->scheduled.context == guilty_context)
 			dma_fence_set_error(&s_fence->finished, -ECANCELED);
 
-		dma_fence_put(s_job->s_fence->parent);
 		fence = sched->ops->run_job(s_job);
 		i++;
 
@@ -558,7 +562,11 @@ void drm_sched_resubmit_jobs_ext(struct drm_gpu_scheduler *sched, int max)
 			s_job->s_fence->parent = NULL;
 		} else {
-			s_job->s_fence->parent = fence;
+
+			s_job->s_fence->parent = dma_fence_get(fence);
+
+			/* Drop for original kref_init */
+			dma_fence_put(fence);
 		}
 	}
 }
@@ -952,6 +960,9 @@ static int drm_sched_main(void *param)
 		if (!IS_ERR_OR_NULL(fence)) {
 			s_fence->parent = dma_fence_get(fence);
+			/* Drop for original kref_init of the fence */
+			dma_fence_put(fence);
+
 			r = dma_fence_add_callback(fence, &sched_job->cb,
 						   drm_sched_job_done_cb);
 			if (r == -ENOENT)
@@ -959,7 +970,6 @@ static int drm_sched_main(void *param)
 			else if (r)
 				DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n",
 					      r);
-			dma_fence_put(fence);
 		} else {
 			if (IS_ERR(fence))
 				dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
-- 
2.25.1
[PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.
Problem: After we start handling timed out jobs we assume their fences won't be signaled, but we cannot be sure and sometimes they fire late. We need to prevent concurrent accesses to the fence array from amdgpu_fence_driver_clear_job_fences during GPU reset and from amdgpu_fence_process from a late EOP interrupt.

Fix: Before accessing the fence array during GPU reset, disable the EOP interrupt and flush all pending interrupt handlers for the amdgpu device's interrupt line.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 26 ++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
 3 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2b92281dd0c1..c99541685804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4605,6 +4605,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 		amdgpu_virt_fini_data_exchange(adev);
 	}
 
+	amdgpu_fence_driver_isr_toggle(adev, true);
+
 	/* block all schedulers and reset given job's ring */
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
@@ -4620,6 +4622,8 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 		amdgpu_fence_driver_force_completion(ring);
 	}
 
+	amdgpu_fence_driver_isr_toggle(adev, false);
+
 	if (job && job->vm)
 		drm_sched_increase_karma(&job->base);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index a9ae3beaa1d3..d6d54ba4c185 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -532,6 +532,32 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
 	}
 }
 
+void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool stop)
+{
+	int i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->fence_drv.initialized || !ring->fence_drv.irq_src)
+			continue;
+
+		if (stop)
+			amdgpu_irq_put(adev, ring->fence_drv.irq_src,
+				       ring->fence_drv.irq_type);
+		else
+			amdgpu_irq_get(adev, ring->fence_drv.irq_src,
+				       ring->fence_drv.irq_type);
+	}
+
+	/* TODO Only waits for irq handlers on other CPUs, maybe local_irq_save/
+	 * local_irq_restore are needed here for local interrupts ?
+	 */
+	if (stop)
+		synchronize_irq(adev->irq.irq);
+}
+
 void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
 {
 	unsigned int i, j;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7d89a52091c0..82c178a9033a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -143,6 +143,7 @@ signed long amdgpu_fence_wait_polling(struct amdgpu_ring *ring,
 				      uint32_t wait_seq,
 				      signed long timeout);
 unsigned amdgpu_fence_count_emitted(struct amdgpu_ring *ring);
+void amdgpu_fence_driver_isr_toggle(struct amdgpu_device *adev, bool stop);
 
 /*
  * Rings.
-- 
2.25.1
[PATCH 2/5] drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences
This function should drop the fence refcount when it extracts the fence from the fence array, just as it's done in amdgpu_fence_process.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 957437a5558c..a9ae3beaa1d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -595,8 +595,10 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
 	for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
 		ptr = &ring->fence_drv.fences[i];
 		old = rcu_dereference_protected(*ptr, 1);
-		if (old && old->ops == &amdgpu_job_fence_ops)
+		if (old && old->ops == &amdgpu_job_fence_ops) {
 			RCU_INIT_POINTER(*ptr, NULL);
+			dma_fence_put(old);
+		}
 	}
 }
--
2.25.1
[PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence
Problem: In amdgpu_job_submit_direct the refcount should drop by 2, but it drops only by 1:

amdgpu_ib_sched->emit -> refcount 1 from first fence init
dma_fence_get         -> refcount 2
dma_fence_put         -> refcount 1

Fix: Add a put for external_hw_fence in amdgpu_job_free/free_cb.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 10aa073600d4..58568fdde2d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -152,8 +152,10 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
 	/* only put the hw fence if has embedded fence */
 	if (job->hw_fence.ops != NULL)
 		dma_fence_put(&job->hw_fence);
-	else
+	else {
+		dma_fence_put(job->external_hw_fence);
 		kfree(job);
+	}
 }
 
 void amdgpu_job_free(struct amdgpu_job *job)
@@ -165,8 +167,10 @@ void amdgpu_job_free(struct amdgpu_job *job)
 	/* only put the hw fence if has embedded fence */
 	if (job->hw_fence.ops != NULL)
 		dma_fence_put(&job->hw_fence);
-	else
+	else {
+		dma_fence_put(job->external_hw_fence);
 		kfree(job);
+	}
 }
 
 int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
--
2.25.1
[PATCH 0/5] Rework amdgpu HW fence refcount and update scheduler parent fence refcount.
Yiqing raised a problem of negative fence refcount for resubmitted jobs in amdgpu and suggested a workaround in [1]. I took a look myself and discovered some deeper problems both in amdgpu and scheduler code. Yiqing helped with testing the new code and also drew a detailed refcount and flow tracing diagram for parent (HW) fence life cycle and refcount under various cases for the proposed patchset at [2]. [1] - https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4...@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3 [2] - https://drive.google.com/file/d/1yEoeW6OQC9WnwmzFW6NBLhFP_jD0xcHm/view?usp=sharing Andrey Grodzovsky (5): drm/amdgpu: Fix possible refcount leak for release of external_hw_fence drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences drm/amdgpu: Prevent race between late signaled fences and GPU reset. drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer' drm/amdgpu: Follow up change to previous drm scheduler change. drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 37 -- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 12 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + drivers/gpu/drm/scheduler/sched_main.c | 16 -- 6 files changed, 78 insertions(+), 17 deletions(-) -- 2.25.1
Re: [PATCH] drm/amdgpu: fix refcount underflow in device reset
On 2022-06-06 03:43, Yiqing Yao wrote: [why] A gfx job may be processed but not finished when reset begin from compute job timeout. drm_sched_resubmit_jobs_ext in sched_main assume submitted job unsignaled and always put parent fence. Resubmission for that job cause underflow. This fix is done in device reset to avoid changing drm sched_main. Are we talking about amdgpu_fence_process sneaking in here just before you call drm_sched_resubmit_jobs_ext->dma_fence_put(parent) and doing extra put ? How about first remove the fence in question from >fences like it's done in [how] Check if the job to submit has signaled and avoid submission if signaled in device reset for both advanced TDR and normal job resume. If what i said above is the problem then this is a racy solution no ? The fence can signal right after your check anyway. How about first removing this fence from drv->fences array and only then checking if it's signaled or not. If signaled you can just skip resubmission and otherwise you can go ahead and call resubmit and not worry about double put. Andrey Signed-off-by: Yiqing Yao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 72 -- 1 file changed, 41 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index f16f105a737b..29b307af97eb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4980,39 +4980,43 @@ static void amdgpu_device_recheck_guilty_jobs( /* for the real bad job, it will be resubmitted twice, adding a dma_fence_get * to make sure fence is balanced */ dma_fence_get(s_job->s_fence->parent); - drm_sched_resubmit_jobs_ext(>sched, 1); - ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, ring->sched.timeout); - if (ret == 0) { /* timeout */ - DRM_ERROR("Found the real bad job! 
ring:%s, job_id:%llx\n", - ring->sched.name, s_job->id); + /* avoid submission for signaled hw fence */ + if(!dma_fence_is_signaled(s_job->s_fence->parent)){ - /* set guilty */ - drm_sched_increase_karma(s_job); + drm_sched_resubmit_jobs_ext(>sched, 1); + + ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, ring->sched.timeout); + if (ret == 0) { /* timeout */ + DRM_ERROR("Found the real bad job! ring:%s, job_id:%llx\n", + ring->sched.name, s_job->id); + + /* set guilty */ + drm_sched_increase_karma(s_job); retry: - /* do hw reset */ - if (amdgpu_sriov_vf(adev)) { - amdgpu_virt_fini_data_exchange(adev); - r = amdgpu_device_reset_sriov(adev, false); - if (r) - adev->asic_reset_res = r; - } else { - clear_bit(AMDGPU_SKIP_HW_RESET, - _context->flags); - r = amdgpu_do_asic_reset(device_list_handle, -reset_context); - if (r && r == -EAGAIN) - goto retry; - } + /* do hw reset */ + if (amdgpu_sriov_vf(adev)) { + amdgpu_virt_fini_data_exchange(adev); + r = amdgpu_device_reset_sriov(adev, false); + if (r) + adev->asic_reset_res = r; + } else { + clear_bit(AMDGPU_SKIP_HW_RESET, + _context->flags); + r = amdgpu_do_asic_reset(device_list_handle, + reset_context); + if (r && r == -EAGAIN) + goto retry; + } - /* -* add reset counter so that the following -* resubmitted job could flush vmid -*/ - atomic_inc(>gpu_reset_counter); - continue; + /* + * add reset counter so that the following + * resubmitted job could flush vmid +
Re: [PATCH v3 4/7] drm/amdgpu: Add work_struct for GPU reset from debugfs
+ Monk On 2022-05-30 03:52, Christian König wrote: Am 25.05.22 um 21:04 schrieb Andrey Grodzovsky: We need to have a work_struct to cancel this reset if another already in progress. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 19 +-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 76df583663c7..8165ee5b0457 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1048,6 +1048,8 @@ struct amdgpu_device { bool scpm_enabled; uint32_t scpm_status; + + struct work_struct reset_work; }; static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index d16c8c1f72db..b0498ffcf7c3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -39,6 +39,7 @@ #include #include "amdgpu.h" #include "amdgpu_trace.h" +#include "amdgpu_reset.h" /* * Fences @@ -798,7 +799,10 @@ static int gpu_recover_get(void *data, u64 *val) return 0; } - *val = amdgpu_device_gpu_recover(adev, NULL); + if (amdgpu_reset_domain_schedule(adev->reset_domain, >reset_work)) + flush_work(>reset_work); + + *val = atomic_read(>reset_domain->reset_res); pm_runtime_mark_last_busy(dev->dev); pm_runtime_put_autosuspend(dev->dev); @@ -810,6 +814,14 @@ DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info); DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL, "%lld\n"); +static void amdgpu_debugfs_reset_work(struct work_struct *work) +{ + struct amdgpu_device *adev = container_of(work, struct amdgpu_device, + reset_work); + + amdgpu_device_gpu_recover_imp(adev, NULL); +} + #endif void amdgpu_debugfs_fence_init(struct amdgpu_device *adev) @@ -821,9 +833,12 @@ void amdgpu_debugfs_fence_init(struct amdgpu_device *adev) 
debugfs_create_file("amdgpu_fence_info", 0444, root, adev, _debugfs_fence_info_fops); - if (!amdgpu_sriov_vf(adev)) + if (!amdgpu_sriov_vf(adev)) { I think we should drop the check for amdgpu_sriov_vf() here. It's a valid requirement to be able to trigger a GPU reset for a VF as well. But not topic of this patch, feel free to add an Reviewed-by: Christian König . Regards, Christian. Monk - any idea why we prevent from creation of debugfs GPU reset for VF ? Andrey + + INIT_WORK(>reset_work, amdgpu_debugfs_reset_work); debugfs_create_file("amdgpu_gpu_recover", 0444, root, adev, _debugfs_gpu_recover_fops); + } #endif }
[PATCH v3 6/7] drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover
We removed the wrapper that was queueing the recover function into reset domain queue who was using this name. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 2 +- drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 2 +- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 2 +- drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 2 +- 9 files changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 8165ee5b0457..664ed0a6deab 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1244,7 +1244,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev); bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev); int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job* job); -int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, +int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job); void amdgpu_device_pci_config_reset(struct amdgpu_device *adev); int amdgpu_device_pci_reset(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index a23abc0e86e7..513c57f839d8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -129,7 +129,7 @@ static void amdgpu_amdkfd_reset_work(struct work_struct *work) struct amdgpu_device *adev = container_of(work, struct amdgpu_device, kfd.reset_work); - amdgpu_device_gpu_recover_imp(adev, NULL); + amdgpu_device_gpu_recover(adev, NULL); } void amdgpu_amdkfd_device_init(struct amdgpu_device *adev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 
e3e2a5d17cc2..424571e46cf5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5065,7 +5065,7 @@ static void amdgpu_device_recheck_guilty_jobs( * Returns 0 for success or an error on failure. */ -int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, +int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job) { struct list_head device_list, *device_list_handle = NULL; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index b0498ffcf7c3..957437a5558c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -819,7 +819,7 @@ static void amdgpu_debugfs_reset_work(struct work_struct *work) struct amdgpu_device *adev = container_of(work, struct amdgpu_device, reset_work); - amdgpu_device_gpu_recover_imp(adev, NULL); + amdgpu_device_gpu_recover(adev, NULL); } #endif diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index dfe7f2b8f0aa..10aa073600d4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -64,7 +64,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) ti.process_name, ti.tgid, ti.task_name, ti.pid); if (amdgpu_device_should_recover_gpu(ring->adev)) { - r = amdgpu_device_gpu_recover_imp(ring->adev, job); + r = amdgpu_device_gpu_recover(ring->adev, job); if (r) DRM_ERROR("GPU Recovery Failed: %d\n", r); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index a439c04223b5..bc0049308207 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -1922,7 +1922,7 @@ static void amdgpu_ras_do_recovery(struct work_struct *work) } if (amdgpu_device_should_recover_gpu(ras->adev)) - amdgpu_device_gpu_recover_imp(ras->adev, NULL); + amdgpu_device_gpu_recover(ras->adev, NULL); 
atomic_set(>in_recovery, 0); } diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c index b81acf59870c..7ec5b5cf4bb9 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c @@ -284,7 +284,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work) if (amdgpu_device_should_recover_gpu(adev) && (!amdgpu_device_has_job_running(adev) || ade
[PATCH v3 7/7] drm/amdgpu: Stop any pending reset if another in progress.
We skip reset requests if another one is already in progress.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 ++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 424571e46cf5..e1f7ee604ea4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5054,6 +5054,27 @@ static void amdgpu_device_recheck_guilty_jobs(
 	}
 }
 
+static inline void amdgpu_device_stop_pending_resets(struct amdgpu_device *adev)
+{
+	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
+
+#if defined(CONFIG_DEBUG_FS)
+	if (!amdgpu_sriov_vf(adev))
+		cancel_work(&adev->reset_work);
+#endif
+
+	if (adev->kfd.dev)
+		cancel_work(&adev->kfd.reset_work);
+
+	if (amdgpu_sriov_vf(adev))
+		cancel_work(&adev->virt.flr_work);
+
+	if (con && adev->ras_enabled)
+		cancel_work(&con->recovery_work);
+}
+
 /**
  * amdgpu_device_gpu_recover - reset the asic and recover scheduler
  *
@@ -5209,6 +5230,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 				  r, adev_to_drm(tmp_adev)->unique);
 			tmp_adev->asic_reset_res = r;
 		}
+
+		/*
+		 * Drop all pending non scheduler resets. Scheduler resets
+		 * were already dropped during drm_sched_stop
+		 */
+		amdgpu_device_stop_pending_resets(tmp_adev);
 	}
 
 	tmp_vram_lost_counter = atomic_read(&((adev)->vram_lost_counter));
--
2.25.1
[PATCH v3 5/7] drm/amdgpu: Add work_struct for GPU reset from kfd.
We need to have a work_struct to cancel this reset if another already in progress. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 -- 3 files changed, 15 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 1f8161cd507f..a23abc0e86e7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -33,6 +33,7 @@ #include #include "amdgpu_ras.h" #include "amdgpu_umc.h" +#include "amdgpu_reset.h" /* Total memory size in system memory and all GPU VRAM. Used to * estimate worst case amount of memory to reserve for page tables @@ -122,6 +123,15 @@ static void amdgpu_doorbell_get_kfd_info(struct amdgpu_device *adev, } } + +static void amdgpu_amdkfd_reset_work(struct work_struct *work) +{ + struct amdgpu_device *adev = container_of(work, struct amdgpu_device, + kfd.reset_work); + + amdgpu_device_gpu_recover_imp(adev, NULL); +} + void amdgpu_amdkfd_device_init(struct amdgpu_device *adev) { int i; @@ -180,6 +190,8 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev) adev->kfd.init_complete = kgd2kfd_device_init(adev->kfd.dev, adev_to_drm(adev), _resources); + + INIT_WORK(>kfd.reset_work, amdgpu_amdkfd_reset_work); } } @@ -247,7 +259,8 @@ int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev) void amdgpu_amdkfd_gpu_reset(struct amdgpu_device *adev) { if (amdgpu_device_should_recover_gpu(adev)) - amdgpu_device_gpu_recover(adev, NULL); + amdgpu_reset_domain_schedule(adev->reset_domain, +>kfd.reset_work); } int amdgpu_amdkfd_alloc_gtt_mem(struct amdgpu_device *adev, size_t size, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index f8b9f27adcf5..e0709af5a326 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -96,6 +96,7 @@ struct amdgpu_kfd_dev { struct kfd_dev *dev; uint64_t vram_used; bool init_complete; + struct work_struct reset_work; }; enum kgd_engine_type { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index bfdd8883089a..e3e2a5d17cc2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5312,37 +5312,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, return r; } -struct amdgpu_recover_work_struct { - struct work_struct base; - struct amdgpu_device *adev; - struct amdgpu_job *job; - int ret; -}; - -static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work) -{ - struct amdgpu_recover_work_struct *recover_work = container_of(work, struct amdgpu_recover_work_struct, base); - - amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job); -} -/* - * Serialize gpu recover into reset domain single threaded wq - */ -int amdgpu_device_gpu_recover(struct amdgpu_device *adev, - struct amdgpu_job *job) -{ - struct amdgpu_recover_work_struct work = {.adev = adev, .job = job}; - - INIT_WORK(, amdgpu_device_queue_gpu_recover_work); - - if (!amdgpu_reset_domain_schedule(adev->reset_domain, )) - return -EAGAIN; - - flush_work(); - - return atomic_read(>reset_domain->reset_res); -} - /** * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot * -- 2.25.1
[PATCH v3 4/7] drm/amdgpu: Add work_struct for GPU reset from debugfs
We need to have a work_struct to cancel this reset if another already in progress. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 19 +-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 76df583663c7..8165ee5b0457 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1048,6 +1048,8 @@ struct amdgpu_device { boolscpm_enabled; uint32_tscpm_status; + + struct work_struct reset_work; }; static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index d16c8c1f72db..b0498ffcf7c3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -39,6 +39,7 @@ #include #include "amdgpu.h" #include "amdgpu_trace.h" +#include "amdgpu_reset.h" /* * Fences @@ -798,7 +799,10 @@ static int gpu_recover_get(void *data, u64 *val) return 0; } - *val = amdgpu_device_gpu_recover(adev, NULL); + if (amdgpu_reset_domain_schedule(adev->reset_domain, >reset_work)) + flush_work(>reset_work); + + *val = atomic_read(>reset_domain->reset_res); pm_runtime_mark_last_busy(dev->dev); pm_runtime_put_autosuspend(dev->dev); @@ -810,6 +814,14 @@ DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info); DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL, "%lld\n"); +static void amdgpu_debugfs_reset_work(struct work_struct *work) +{ + struct amdgpu_device *adev = container_of(work, struct amdgpu_device, + reset_work); + + amdgpu_device_gpu_recover_imp(adev, NULL); +} + #endif void amdgpu_debugfs_fence_init(struct amdgpu_device *adev) @@ -821,9 +833,12 @@ void amdgpu_debugfs_fence_init(struct amdgpu_device *adev) debugfs_create_file("amdgpu_fence_info", 0444, root, adev, _debugfs_fence_info_fops); - if (!amdgpu_sriov_vf(adev)) 
+ if (!amdgpu_sriov_vf(adev)) { + + INIT_WORK(>reset_work, amdgpu_debugfs_reset_work); debugfs_create_file("amdgpu_gpu_recover", 0444, root, adev, _debugfs_gpu_recover_fops); + } #endif } -- 2.25.1
[PATCH v3 2/7] drm/amdgpu: Cache result of last reset at reset domain level.
Will be read by executors of async reset like debugfs.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 4daa0e893965..bfdd8883089a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5307,6 +5307,8 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 	if (r)
 		dev_info(adev->dev, "GPU reset end with ret = %d\n", r);
 
+	atomic_set(&adev->reset_domain->reset_res, r);
+
 	return r;
 }
 
@@ -5321,7 +5323,7 @@ static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work)
 {
 	struct amdgpu_recover_work_struct *recover_work = container_of(work, struct amdgpu_recover_work_struct, base);
 
-	recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
+	amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
 }
 /*
  * Serialize gpu recover into reset domain single threaded wq
@@ -5338,7 +5340,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
 	flush_work(&work.base);
 
-	return work.ret;
+	return atomic_read(&adev->reset_domain->reset_res);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index c80af0889773..32c86a0b145c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -132,6 +132,7 @@ struct amdgpu_reset_domain *amdgpu_reset_create_reset_domain(enum amdgpu_reset_d
 	}
 
 	atomic_set(&reset_domain->in_gpu_reset, 0);
+	atomic_set(&reset_domain->reset_res, 0);
 	init_rwsem(&reset_domain->sem);
 
 	return reset_domain;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index 1949dbe28a86..9e55a5d7a825 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -82,6 +82,7 @@ struct amdgpu_reset_domain {
 	enum amdgpu_reset_domain_type type;
 	struct rw_semaphore sem;
 	atomic_t in_gpu_reset;
+	atomic_t reset_res;
 };
--
2.25.1
[PATCH v3 3/7] drm/amdgpu: Serialize RAS recovery work directly into reset domain queue.
Save the extra useless work schedule.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 31207f7eec02..a439c04223b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -35,6 +35,8 @@
 #include "amdgpu_xgmi.h"
 #include "ivsrcid/nbio/irqsrcs_nbif_7_4.h"
 #include "atom.h"
+#include "amdgpu_reset.h"
+
 #ifdef CONFIG_X86_MCE_AMD
 #include
@@ -1920,7 +1922,7 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
 	}
 
 	if (amdgpu_device_should_recover_gpu(ras->adev))
-		amdgpu_device_gpu_recover(ras->adev, NULL);
+		amdgpu_device_gpu_recover_imp(ras->adev, NULL);
 
 	atomic_set(&ras->in_recovery, 0);
 }
@@ -2928,7 +2930,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
 	struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
 
 	if (atomic_cmpxchg(&ras->in_recovery, 0, 1) == 0)
-		schedule_work(&ras->recovery_work);
+		amdgpu_reset_domain_schedule(ras->adev->reset_domain, &ras->recovery_work);
 
 	return 0;
 }
--
2.25.1
[PATCH v3 1/7] Revert "workqueue: remove unused cancel_work()"
This reverts commit 6417250d3f894e66a68ba1cd93676143f2376a6f.

amdgpu needs this function in order to prematurely stop pending reset works when another reset work is already in progress.

Signed-off-by: Andrey Grodzovsky
Reviewed-by: Lai Jiangshan
Reviewed-by: Christian König
---
 include/linux/workqueue.h | 1 +
 kernel/workqueue.c        | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..9e41e1226193 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -453,6 +453,7 @@ extern int schedule_on_each_cpu(work_func_t func);
 int execute_in_process_context(work_func_t fn, struct execute_work *);
 
 extern bool flush_work(struct work_struct *work);
+extern bool cancel_work(struct work_struct *work);
 extern bool cancel_work_sync(struct work_struct *work);
 
 extern bool flush_delayed_work(struct delayed_work *dwork);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 613917bbc4e7..f94b596ebffd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3267,6 +3267,15 @@ static bool __cancel_work(struct work_struct *work, bool is_dwork)
 	return ret;
 }
 
+/*
+ * See cancel_delayed_work()
+ */
+bool cancel_work(struct work_struct *work)
+{
+	return __cancel_work(work, false);
+}
+EXPORT_SYMBOL(cancel_work);
+
 /**
  * cancel_delayed_work - cancel a delayed work
  * @dwork: delayed_work to cancel
--
2.25.1
[PATCH v3 0/7] Fix multiple GPU resets in XGMI hive.
Problem: During a hive reset caused by a command timing out on a ring, extra resets are triggered by KFD, which is unable to access registers on the resetting ASIC.

Fix: Rework GPU reset to actively stop any pending reset works while another is in progress.

v2: Switch from the generic list as was in v1[1] to explicit stopping of each reset request from each reset source per each request submitter.

v3: Switch back to work_struct from delayed_work (Christian)

[1] - https://lore.kernel.org/all/20220504161841.24669-1-andrey.grodzov...@amd.com/

Andrey Grodzovsky (7):
  Revert "workqueue: remove unused cancel_work()"
  drm/amdgpu: Cache result of last reset at reset domain level.
  drm/amdgpu: Serialize RAS recovery work directly into reset domain queue.
  drm/amdgpu: Add work_struct for GPU reset from debugfs
  drm/amdgpu: Add work_struct for GPU reset from kfd.
  drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover
  drm/amdgpu: Stop any pending reset if another in progress.

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 62 +++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 19 ++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c      |  2 +-
 include/linux/workqueue.h                  |  1 +
 kernel/workqueue.c                         |  9 ++++
 14 files changed, 84 insertions(+), 41 deletions(-)

--
2.25.1
Re: [PATCH] Revert "workqueue: remove unused cancel_work()"
On 2022-05-20 03:52, Tejun Heo wrote: On Fri, May 20, 2022 at 08:22:39AM +0200, Christian König wrote: Am 20.05.22 um 02:47 schrieb Lai Jiangshan: On Thu, May 19, 2022 at 11:04 PM Andrey Grodzovsky wrote: See this patch-set https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg78514.htmldata=05%7C01%7Candrey.grodzovsky%40amd.com%7Cb25896b7e8b14e605a8d08da3a35a7c7%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637886299388464620%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=TRabWQQrhy6nwkLfuXI4A%2FOcF9f%2FtFKdxIRfGc8Das4%3Dreserved=0, specifically patch 'drm/amdgpu: Switch to delayed work from work_struct. I will just reiterate here - We need to be able to do non blocking cancel pending reset works from within GPU reset. Currently kernel API allows this only for delayed_work and not for work_struct. I'm OK with the change. With an updated changelog: Reviewed-by: Lai Jiangshan Good morning guys, for the patch itself Reviewed-by: Christian König And just for the record: We plan to push this upstream through the drm branches, if anybody has any objections to that please speak up. Andrey, care to resend with updated description? Thanks Just adding here as attachment since only description changed changed. AndreyFrom 78df30cc97f10c885f5159a293e6afe2348aa60c Mon Sep 17 00:00:00 2001 From: Andrey Grodzovsky Date: Thu, 19 May 2022 09:47:28 -0400 Subject: Revert "workqueue: remove unused cancel_work()" This reverts commit 6417250d3f894e66a68ba1cd93676143f2376a6f. amdpgu need this function in order to prematurly stop pending reset works when another reset work already in progress. 
Signed-off-by: Andrey Grodzovsky --- include/linux/workqueue.h | 1 + kernel/workqueue.c| 9 + 2 files changed, 10 insertions(+) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 7fee9b6cfede..9e41e1226193 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -453,6 +453,7 @@ extern int schedule_on_each_cpu(work_func_t func); int execute_in_process_context(work_func_t fn, struct execute_work *); extern bool flush_work(struct work_struct *work); +extern bool cancel_work(struct work_struct *work); extern bool cancel_work_sync(struct work_struct *work); extern bool flush_delayed_work(struct delayed_work *dwork); diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 613917bbc4e7..f94b596ebffd 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -3267,6 +3267,15 @@ static bool __cancel_work(struct work_struct *work, bool is_dwork) return ret; } +/* + * See cancel_delayed_work() + */ +bool cancel_work(struct work_struct *work) +{ + return __cancel_work(work, false); +} +EXPORT_SYMBOL(cancel_work); + /** * cancel_delayed_work - cancel a delayed work * @dwork: delayed_work to cancel -- 2.25.1
Re: [PATCH] Revert "workqueue: remove unused cancel_work()"
See this patch-set https://www.spinics.net/lists/amd-gfx/msg78514.html, specifically patch 'drm/amdgpu: Switch to delayed work from work_struct. I will just reiterate here - We need to be able to do non blocking cancel pending reset works from within GPU reset. Currently kernel API allows this only for delayed_work and not for work_struct. Andrey On 2022-05-19 10:52, Lai Jiangshan wrote: On Thu, May 19, 2022 at 9:57 PM Andrey Grodzovsky wrote: This reverts commit 6417250d3f894e66a68ba1cd93676143f2376a6f and exports the function. We need this funtion in amdgpu driver to fix a bug. Hello, Could you specify the reason why it is needed in amdgpu driver rather than "fix a bug", please. And there is a typo: "funtion". And please avoid using "we" in the changelog. For example, the sentence can be changed to: The amdgpu driver needs this function to cancel a work item in blabla context/situation or for blabla reason. (I'm not good at Engish, this is just an example of not using "we". No need to use the sentence.) 
Thanks Lai Signed-off-by: Andrey Grodzovsky --- include/linux/workqueue.h | 1 + kernel/workqueue.c| 9 + 2 files changed, 10 insertions(+) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 7fee9b6cfede..9e41e1226193 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -453,6 +453,7 @@ extern int schedule_on_each_cpu(work_func_t func); int execute_in_process_context(work_func_t fn, struct execute_work *); extern bool flush_work(struct work_struct *work); +extern bool cancel_work(struct work_struct *work); extern bool cancel_work_sync(struct work_struct *work); extern bool flush_delayed_work(struct delayed_work *dwork); diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 613917bbc4e7..f94b596ebffd 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -3267,6 +3267,15 @@ static bool __cancel_work(struct work_struct *work, bool is_dwork) return ret; } +/* + * See cancel_delayed_work() + */ +bool cancel_work(struct work_struct *work) +{ + return __cancel_work(work, false); +} +EXPORT_SYMBOL(cancel_work); + /** * cancel_delayed_work - cancel a delayed work * @dwork: delayed_work to cancel -- 2.25.1
[PATCH] Revert "workqueue: remove unused cancel_work()"
This reverts commit 6417250d3f894e66a68ba1cd93676143f2376a6f and exports the function. We need this funtion in amdgpu driver to fix a bug. Signed-off-by: Andrey Grodzovsky --- include/linux/workqueue.h | 1 + kernel/workqueue.c| 9 + 2 files changed, 10 insertions(+) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 7fee9b6cfede..9e41e1226193 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -453,6 +453,7 @@ extern int schedule_on_each_cpu(work_func_t func); int execute_in_process_context(work_func_t fn, struct execute_work *); extern bool flush_work(struct work_struct *work); +extern bool cancel_work(struct work_struct *work); extern bool cancel_work_sync(struct work_struct *work); extern bool flush_delayed_work(struct delayed_work *dwork); diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 613917bbc4e7..f94b596ebffd 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -3267,6 +3267,15 @@ static bool __cancel_work(struct work_struct *work, bool is_dwork) return ret; } +/* + * See cancel_delayed_work() + */ +bool cancel_work(struct work_struct *work) +{ + return __cancel_work(work, false); +} +EXPORT_SYMBOL(cancel_work); + /** * cancel_delayed_work - cancel a delayed work * @dwork: delayed_work to cancel -- 2.25.1
Re: [PATCH v2 0/7] Fix multiple GPU resets in XGMI hive.
On 2022-05-19 03:58, Christian König wrote: Am 18.05.22 um 16:24 schrieb Andrey Grodzovsky: On 2022-05-18 02:07, Christian König wrote: Am 17.05.22 um 21:20 schrieb Andrey Grodzovsky: Problem: During a hive reset caused by a command timing out on a ring, extra resets are triggered by KFD, which is unable to access registers on the resetting ASIC. Fix: Rework GPU reset to actively stop any pending reset works while another is in progress. v2: Switch from the generic list as was in v1 [1] to explicit stopping of each reset request from each reset source per each request submitter. Looks mostly good to me. Apart from the naming nitpick on patch #1, the only thing I couldn't offhand figure out is why you are using a delayed work everywhere instead of just a work item. That needs a bit further explanation of what's happening here. Christian. Check the APIs for cancelling work vs. delayed work - for work_struct the only public API is this - https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3214 - a blocking cancel. For delayed_work we have both blocking and non-blocking public APIs - https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3295 I prefer not to go now into convincing core kernel people to expose another interface for our own sake - from my past experience API changes in core code have slim chances and cost a lot of time in back-and-forth arguments. "If the mountain will not come to Muhammad, then Muhammad must go to the mountain" ;) Ah, good point. The cancel_work() function was removed a few years ago: commit 6417250d3f894e66a68ba1cd93676143f2376a6f Author: Stephen Hemminger Date: Tue Mar 6 19:34:42 2018 -0800 workqueue: remove unused cancel_work() Found this by accident. There are no usages of bare cancel_work() in current kernel source. Signed-off-by: Stephen Hemminger Signed-off-by: Tejun Heo Maybe just revert that patch, export the function and use it.
I think there is plenty of justification for this. Thanks, Christian. Ok - I will send them a patch - let's see what they say. Andrey [1] - https://lore.kernel.org/all/20220504161841.24669-1-andrey.grodzov...@amd.com/ Andrey Grodzovsky (7): drm/amdgpu: Cache result of last reset at reset domain level. drm/amdgpu: Switch to delayed work from work_struct. drm/admgpu: Serialize RAS recovery work directly into reset domain queue. drm/amdgpu: Add delayed work for GPU reset from debugfs drm/amdgpu: Add delayed work for GPU reset from kfd. drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover drm/amdgpu: Stop any pending reset if another in progress. drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 62 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 19 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 2 +- drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 6 +-- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 6 +-- drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 6 +-- 14 files changed, 87 insertions(+), 54 deletions(-)
Re: [PATCH v2 0/7] Fix multiple GPU resets in XGMI hive.
On 2022-05-18 02:07, Christian König wrote: Am 17.05.22 um 21:20 schrieb Andrey Grodzovsky: Problem: During a hive reset caused by a command timing out on a ring, extra resets are triggered by KFD, which is unable to access registers on the resetting ASIC. Fix: Rework GPU reset to actively stop any pending reset works while another is in progress. v2: Switch from the generic list as was in v1 [1] to explicit stopping of each reset request from each reset source per each request submitter. Looks mostly good to me. Apart from the naming nitpick on patch #1, the only thing I couldn't offhand figure out is why you are using a delayed work everywhere instead of just a work item. That needs a bit further explanation of what's happening here. Christian. Check the APIs for cancelling work vs. delayed work - for work_struct the only public API is this - https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3214 - a blocking cancel. For delayed_work we have both blocking and non-blocking public APIs - https://elixir.bootlin.com/linux/latest/source/kernel/workqueue.c#L3295 I prefer not to go now into convincing core kernel people to expose another interface for our own sake - from my past experience API changes in core code have slim chances and cost a lot of time in back-and-forth arguments. "If the mountain will not come to Muhammad, then Muhammad must go to the mountain" ;) Andrey [1] - https://lore.kernel.org/all/20220504161841.24669-1-andrey.grodzov...@amd.com/ Andrey Grodzovsky (7): drm/amdgpu: Cache result of last reset at reset domain level. drm/amdgpu: Switch to delayed work from work_struct. drm/admgpu: Serialize RAS recovery work directly into reset domain queue. drm/amdgpu: Add delayed work for GPU reset from debugfs drm/amdgpu: Add delayed work for GPU reset from kfd.
drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover drm/amdgpu: Stop any pending reset if another in progress. drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 62 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 19 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 2 +- drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 6 +-- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 6 +-- drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 6 +-- 14 files changed, 87 insertions(+), 54 deletions(-)
[PATCH v2 6/7] drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover
We removed the wrapper that queued the recover function into the reset domain queue and that carried this name. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 2 +- drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 2 +- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 2 +- drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 2 +- 9 files changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 4ef17c6d1a50..ee668f253c7a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1244,7 +1244,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev); bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev); int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job* job); -int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, +int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job); void amdgpu_device_pci_config_reset(struct amdgpu_device *adev); int amdgpu_device_pci_reset(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 4cc846341394..434053a9e027 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -129,7 +129,7 @@ static void amdgpu_amdkfd_reset_work(struct work_struct *work) struct amdgpu_device *adev = container_of(work, struct amdgpu_device, kfd.reset_work.work); - amdgpu_device_gpu_recover_imp(adev, NULL); + amdgpu_device_gpu_recover(adev, NULL); } void amdgpu_amdkfd_device_init(struct amdgpu_device *adev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index
ae4c37c89ac7..65f738fd4761 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5065,7 +5065,7 @@ static void amdgpu_device_recheck_guilty_jobs( * Returns 0 for success or an error on failure. */ -int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, +int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job) { struct list_head device_list, *device_list_handle = NULL; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index f980f1501c48..7954ebf16885 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -819,7 +819,7 @@ static void amdgpu_debugfs_reset_work(struct work_struct *work) struct amdgpu_device *adev = container_of(work, struct amdgpu_device, reset_work.work); - amdgpu_device_gpu_recover_imp(adev, NULL); + amdgpu_device_gpu_recover(adev, NULL); } #endif diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index dfe7f2b8f0aa..10aa073600d4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -64,7 +64,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) ti.process_name, ti.tgid, ti.task_name, ti.pid); if (amdgpu_device_should_recover_gpu(ring->adev)) { - r = amdgpu_device_gpu_recover_imp(ring->adev, job); + r = amdgpu_device_gpu_recover(ring->adev, job); if (r) DRM_ERROR("GPU Recovery Failed: %d\n", r); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 7e8c7bcc7303..221d24feb8c9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -1918,7 +1918,7 @@ static void amdgpu_ras_do_recovery(struct work_struct *work) } if (amdgpu_device_should_recover_gpu(ras->adev)) - amdgpu_device_gpu_recover_imp(ras->adev, NULL); + amdgpu_device_gpu_recover(ras->adev, NULL); 
atomic_set(&ras->in_recovery, 0); } diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c index aa5f6d6ea1e3..3b7d9f171793 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c @@ -284,7 +284,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work) if (amdgpu_device_should_recover_gpu(adev) && (!amdgpu_device_has_job_running(adev) || ade
[PATCH v2 7/7] drm/amdgpu: Stop any pending reset if another in progress.
We skip reset requests if another one is already in progress. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 ++ 1 file changed, 27 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 65f738fd4761..43af5ea3eee5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5054,6 +5054,27 @@ static void amdgpu_device_recheck_guilty_jobs( } } +static inline void amdggpu_device_stop_pedning_resets(struct amdgpu_device* adev) +{ + struct amdgpu_ras *con = amdgpu_ras_get_context(adev); + +#if defined(CONFIG_DEBUG_FS) + if (!amdgpu_sriov_vf(adev)) + cancel_delayed_work(&adev->reset_work); +#endif + + if (adev->kfd.dev) + cancel_delayed_work(&adev->kfd.reset_work); + + if (amdgpu_sriov_vf(adev)) + cancel_delayed_work(&adev->virt.flr_work); + + if (con && adev->ras_enabled) + cancel_delayed_work(&con->recovery_work); + +} + + /** * amdgpu_device_gpu_recover - reset the asic and recover scheduler * @@ -5209,6 +5230,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, r, adev_to_drm(tmp_adev)->unique); tmp_adev->asic_reset_res = r; } + + /* +* Drop all pending non scheduler resets. Scheduler resets +* were already dropped during drm_sched_stop +*/ + amdggpu_device_stop_pedning_resets(tmp_adev); } tmp_vram_lost_counter = atomic_read(&((adev)->vram_lost_counter)); -- 2.25.1
[PATCH v2 5/7] drm/amdgpu: Add delayed work for GPU reset from kfd.
We need to have a delayed work to cancel this reset if another is already in progress. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 -- 3 files changed, 15 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 1f8161cd507f..4cc846341394 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -33,6 +33,7 @@ #include #include "amdgpu_ras.h" #include "amdgpu_umc.h" +#include "amdgpu_reset.h" /* Total memory size in system memory and all GPU VRAM. Used to * estimate worst case amount of memory to reserve for page tables @@ -122,6 +123,15 @@ static void amdgpu_doorbell_get_kfd_info(struct amdgpu_device *adev, } } + +static void amdgpu_amdkfd_reset_work(struct work_struct *work) +{ + struct amdgpu_device *adev = container_of(work, struct amdgpu_device, + kfd.reset_work.work); + + amdgpu_device_gpu_recover_imp(adev, NULL); +} + void amdgpu_amdkfd_device_init(struct amdgpu_device *adev) { int i; @@ -180,6 +190,8 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev) adev->kfd.init_complete = kgd2kfd_device_init(adev->kfd.dev, adev_to_drm(adev), &gpu_resources); + + INIT_DELAYED_WORK(&adev->kfd.reset_work, amdgpu_amdkfd_reset_work); } } @@ -247,7 +259,8 @@ int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev) void amdgpu_amdkfd_gpu_reset(struct amdgpu_device *adev) { if (amdgpu_device_should_recover_gpu(adev)) - amdgpu_device_gpu_recover(adev, NULL); + amdgpu_reset_domain_schedule(adev->reset_domain, + &adev->kfd.reset_work); } int amdgpu_amdkfd_alloc_gtt_mem(struct amdgpu_device *adev, size_t size, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index f8b9f27adcf5..5e04dba8c7f9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -96,6 +96,7 @@ struct amdgpu_kfd_dev { struct kfd_dev *dev; uint64_t vram_used; bool init_complete; + struct delayed_work reset_work; }; enum kgd_engine_type { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index ea41edf52a6f..ae4c37c89ac7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5308,37 +5308,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, return r; } -struct amdgpu_recover_work_struct { - struct delayed_work base; - struct amdgpu_device *adev; - struct amdgpu_job *job; - int ret; -}; - -static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work) - { - struct amdgpu_recover_work_struct *recover_work = container_of(work, struct amdgpu_recover_work_struct, base.work); - - amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job); -} -/* - * Serialize gpu recover into reset domain single threaded wq - */ -int amdgpu_device_gpu_recover(struct amdgpu_device *adev, - struct amdgpu_job *job) -{ - struct amdgpu_recover_work_struct work = {.adev = adev, .job = job}; - - INIT_DELAYED_WORK(, amdgpu_device_queue_gpu_recover_work); - - if (!amdgpu_reset_domain_schedule(adev->reset_domain, )) - return -EAGAIN; - - flush_delayed_work(); - - return atomic_read(&adev->reset_domain->reset_res); -} - /** * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot * -- 2.25.1
[PATCH v2 4/7] drm/amdgpu: Add delayed work for GPU reset from debugfs
We need to have a delayed work to cancel this reset if another is already in progress. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 19 +-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 3c20c2eadf4e..4ef17c6d1a50 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1048,6 +1048,8 @@ struct amdgpu_device { bool scpm_enabled; uint32_t scpm_status; + + struct delayed_work reset_work; }; static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index d16c8c1f72db..f980f1501c48 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -39,6 +39,7 @@ #include #include "amdgpu.h" #include "amdgpu_trace.h" +#include "amdgpu_reset.h" /* * Fences @@ -798,7 +799,10 @@ static int gpu_recover_get(void *data, u64 *val) return 0; } - *val = amdgpu_device_gpu_recover(adev, NULL); + if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->reset_work)) + flush_delayed_work(&adev->reset_work); + + *val = atomic_read(&adev->reset_domain->reset_res); pm_runtime_mark_last_busy(dev->dev); pm_runtime_put_autosuspend(dev->dev); @@ -810,6 +814,14 @@ DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info); DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL, "%lld\n"); +static void amdgpu_debugfs_reset_work(struct work_struct *work) +{ + struct amdgpu_device *adev = container_of(work, struct amdgpu_device, + reset_work.work); + + amdgpu_device_gpu_recover_imp(adev, NULL); +} + #endif void amdgpu_debugfs_fence_init(struct amdgpu_device *adev) @@ -821,9 +833,12 @@ void amdgpu_debugfs_fence_init(struct amdgpu_device *adev) debugfs_create_file("amdgpu_fence_info", 0444, root, adev, &amdgpu_debugfs_fence_info_fops); - if
(!amdgpu_sriov_vf(adev)) + if (!amdgpu_sriov_vf(adev)) { + + INIT_DELAYED_WORK(&adev->reset_work, amdgpu_debugfs_reset_work); debugfs_create_file("amdgpu_gpu_recover", 0444, root, adev, &amdgpu_debugfs_gpu_recover_fops); + } #endif } -- 2.25.1