Re: [PATCH 1/3] drm/panthor: Fix tiler OOM handling to allow incremental rendering
On Thu, 25 Apr 2024 10:28:49 +0100 Steven Price wrote: > On 25/04/2024 08:18, Boris Brezillon wrote: > > From: Antonino Maniscalco > > > > If the kernel couldn't allocate memory because we reached the maximum > > number of chunks but no render passes are in flight > > (panthor_heap_grow() returning -ENOMEM), we should defer the OOM > > handling to the FW by returning a NULL chunk. The FW will then call > > the tiler OOM exception handler, which is supposed to implement > > incremental rendering (execute an intermediate fragment job to flush > > the pending primitives, release the tiler memory that was used to > > store those primitives, and start over from where it stopped). > > > > Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") > > Signed-off-by: Antonino Maniscalco > > Signed-off-by: Boris Brezillon > > Reviewed-by: Steven Price > > Although I think the real issue here is that we haven't clearly defined > the return values from panthor_heap_grow - it's a bit weird to have two > different error codes for the same "try again later after incremental > rendering" result. But as a fix this seems most clear. Yeah, I actually considered returning -EBUSY for the 'max_chunks reached' situation, but then realized we would also want to trigger incremental rendering for actual mem allocation failures (when chunk_count < max_chunks) once the fail-able/non-blocking allocation logic is implemented, and for this kind of failure it makes more sense to return -ENOMEM, even though this implies checking against two values instead of one. I guess returning -ENOMEM instead of -EBUSY for the case where we have render passes in-flight wouldn't be too awkward, as this can be seen as the kernel refusing to allocate more memory. > > Steve > > > --- > > drivers/gpu/drm/panthor/panthor_sched.c | 8 +++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c > > b/drivers/gpu/drm/panthor/panthor_sched.c > > index b3a51a6de523..6de8c0c702cb 100644 > > --- a/drivers/gpu/drm/panthor/panthor_sched.c > > +++ b/drivers/gpu/drm/panthor/panthor_sched.c > > @@ -1354,7 +1354,13 @@ static int group_process_tiler_oom(struct > > panthor_group *group, u32 cs_id) > > pending_frag_count, _chunk_va); > > } > > > > - if (ret && ret != -EBUSY) { > > + /* If the kernel couldn't allocate memory because we reached the maximum > > +* number of chunks (EBUSY if we have render passes in flight, ENOMEM > > +* otherwise), we want to let the FW try to reclaim memory by waiting > > +* for fragment jobs to land or by executing the tiler OOM exception > > +* handler, which is supposed to implement incremental rendering. > > +*/ > > + if (ret && ret != -EBUSY && ret != -ENOMEM) { > > drm_warn(>base, "Failed to extend the tiler heap\n"); > > group->fatal_queues |= BIT(cs_id); > > sched_queue_delayed_work(sched, tick, 0); >
Re: [PATCH 1/3] drm/panthor: Fix tiler OOM handling to allow incremental rendering
On 25/04/2024 08:18, Boris Brezillon wrote: > From: Antonino Maniscalco > > If the kernel couldn't allocate memory because we reached the maximum > number of chunks but no render passes are in flight > (panthor_heap_grow() returning -ENOMEM), we should defer the OOM > handling to the FW by returning a NULL chunk. The FW will then call > the tiler OOM exception handler, which is supposed to implement > incremental rendering (execute an intermediate fragment job to flush > the pending primitives, release the tiler memory that was used to > store those primitives, and start over from where it stopped). > > Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") > Signed-off-by: Antonino Maniscalco > Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Although I think the real issue here is that we haven't clearly defined the return values from panthor_heap_grow - it's a bit weird to have two different error codes for the same "try again later after incremental rendering" result. But as a fix this seems most clear. Steve > --- > drivers/gpu/drm/panthor/panthor_sched.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c > b/drivers/gpu/drm/panthor/panthor_sched.c > index b3a51a6de523..6de8c0c702cb 100644 > --- a/drivers/gpu/drm/panthor/panthor_sched.c > +++ b/drivers/gpu/drm/panthor/panthor_sched.c > @@ -1354,7 +1354,13 @@ static int group_process_tiler_oom(struct > panthor_group *group, u32 cs_id) > pending_frag_count, _chunk_va); > } > > - if (ret && ret != -EBUSY) { > + /* If the kernel couldn't allocate memory because we reached the maximum > + * number of chunks (EBUSY if we have render passes in flight, ENOMEM > + * otherwise), we want to let the FW try to reclaim memory by waiting > + * for fragment jobs to land or by executing the tiler OOM exception > + * handler, which is supposed to implement incremental rendering. > + */ > + if (ret && ret != -EBUSY && ret != -ENOMEM) { > drm_warn(>base, "Failed to extend the tiler heap\n"); > group->fatal_queues |= BIT(cs_id); > sched_queue_delayed_work(sched, tick, 0);
[PATCH 1/3] drm/panthor: Fix tiler OOM handling to allow incremental rendering
From: Antonino Maniscalco If the kernel couldn't allocate memory because we reached the maximum number of chunks but no render passes are in flight (panthor_heap_grow() returning -ENOMEM), we should defer the OOM handling to the FW by returning a NULL chunk. The FW will then call the tiler OOM exception handler, which is supposed to implement incremental rendering (execute an intermediate fragment job to flush the pending primitives, release the tiler memory that was used to store those primitives, and start over from where it stopped). Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Antonino Maniscalco Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panthor/panthor_sched.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index b3a51a6de523..6de8c0c702cb 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -1354,7 +1354,13 @@ static int group_process_tiler_oom(struct panthor_group *group, u32 cs_id) pending_frag_count, _chunk_va); } - if (ret && ret != -EBUSY) { + /* If the kernel couldn't allocate memory because we reached the maximum +* number of chunks (EBUSY if we have render passes in flight, ENOMEM +* otherwise), we want to let the FW try to reclaim memory by waiting +* for fragment jobs to land or by executing the tiler OOM exception +* handler, which is supposed to implement incremental rendering. +*/ + if (ret && ret != -EBUSY && ret != -ENOMEM) { drm_warn(>base, "Failed to extend the tiler heap\n"); group->fatal_queues |= BIT(cs_id); sched_queue_delayed_work(sched, tick, 0); -- 2.44.0