Re: [Intel-gfx] [PATCH 1/3] drm/i915: Fix negative remaining time after retire requests
On 11/16/2022 12:25 PM, Janusz Krzysztofik wrote:
> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> extra argument 'remaining_timeout', intended for passing back unconsumed
> portion of requested timeout when 0 (success) is returned.  However, when
> request retirement happens to succeed despite an error returned by
> dma_fence_wait_timeout(), the error code (a negative value) is passed back
> instead of remaining time.  If a user then passes that negative value
> forward as requested timeout to another wait, an explicit WARN or BUG can
> be triggered.
>
> Instead of copying the value of timeout variable to *remaining_timeout
> before return, update the *remaining_timeout after each DMA fence wait.

Thanks for the detailed comment, indeed we were not accounting for the
return value of dma_fence_wait_timeout().

Acked-by: Nirmoy Das

Thanks,
Nirmoy

> Set it to 0 on -ETIME, -EINTR or -ERESTARTSYS, and assume no time has been
> consumed on other errors returned from the wait.
>
> Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
> Signed-off-by: Janusz Krzysztofik
> Cc: sta...@vger.kernel.org # v5.15+
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 23 ++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index edb881d756309..ccaf2fd80625b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -138,6 +138,9 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>  	unsigned long active_count = 0;
>  	LIST_HEAD(free);
>  
> +	if (remaining_timeout)
> +		*remaining_timeout = timeout;
> +
>  	flush_submission(gt, timeout); /* kick the ksoftirqd tasklets */
>  	spin_lock(&timelines->lock);
>  	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
> @@ -163,6 +166,23 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>  								 timeout);
>  				dma_fence_put(fence);
>  
> +				if (remaining_timeout) {
> +					/*
> +					 * If we get an error here but request
> +					 * retirement succeeds anyway
> +					 * (!active_count) and we return 0, the
> +					 * caller may want to spend remaining
> +					 * time on waiting for other events.
> +					 */
> +					if (timeout == -ETIME ||
> +					    timeout == -EINTR ||
> +					    timeout == -ERESTARTSYS)
> +						*remaining_timeout = 0;
> +					else if (timeout >= 0)
> +						*remaining_timeout = timeout;
> +					/* else assume no time consumed */
> +				}
> +
>  				/* Retirement is best effort */
>  				if (!mutex_trylock(&tl->mutex)) {
>  					active_count++;
> @@ -196,9 +216,6 @@ out_active:	spin_lock(&timelines->lock);
>  
>  	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>  		active_count++;
>  
> -	if (remaining_timeout)
> -		*remaining_timeout = timeout;
> -
>  	return active_count ? timeout : 0;
>  }
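
For readers following the thread without the kernel tree at hand, the update rule described in the commit message can be tried in isolation. What follows is only a userspace sketch, not code from the patch: the helper name update_remaining_timeout() is made up for illustration, and ERESTARTSYS is defined locally because it is a kernel-internal errno that userspace <errno.h> does not provide.

```c
#include <errno.h>

/*
 * Assumption: ERESTARTSYS is kernel-internal, so it is defined here
 * (512, matching the kernel's value) only to let the sketch build in
 * userspace.
 */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/*
 * Hypothetical helper, not part of the patch: fold the result of one
 * fence wait into *remaining_timeout using the rules from the commit
 * message.
 */
static void update_remaining_timeout(long wait_ret, long *remaining_timeout)
{
	if (!remaining_timeout)
		return;

	if (wait_ret == -ETIME ||
	    wait_ret == -EINTR ||
	    wait_ret == -ERESTARTSYS)
		*remaining_timeout = 0;		/* report no time left */
	else if (wait_ret >= 0)
		*remaining_timeout = wait_ret;	/* unconsumed part of the budget */
	/* any other error: assume no time consumed, keep the old value */
}
```

With a starting budget of 100, a wait result of 40 leaves 40, a later -EIO leaves the value untouched, and -ETIME, -EINTR or -ERESTARTSYS drops it to 0. The stored value never goes negative as long as the initial budget is non-negative.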
Re: [Intel-gfx] [PATCH 1/3] drm/i915: Fix negative remaining time after retire requests
On 16.11.2022 12:25, Janusz Krzysztofik wrote:
> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> extra argument 'remaining_timeout', intended for passing back unconsumed
> portion of requested timeout when 0 (success) is returned.  However, when
> request retirement happens to succeed despite an error returned by
> dma_fence_wait_timeout(), the error code (a negative value) is passed back
> instead of remaining time.  If a user then passes that negative value
> forward as requested timeout to another wait, an explicit WARN or BUG can
> be triggered.
>
> Instead of copying the value of timeout variable to *remaining_timeout
> before return, update the *remaining_timeout after each DMA fence wait.
> Set it to 0 on -ETIME, -EINTR or -ERESTARTSYS, and assume no time has been
> consumed on other errors returned from the wait.
>
> Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
> Signed-off-by: Janusz Krzysztofik
> Cc: sta...@vger.kernel.org # v5.15+
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 23 ++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index edb881d756309..ccaf2fd80625b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -138,6 +138,9 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>  	unsigned long active_count = 0;
>  	LIST_HEAD(free);
>  
> +	if (remaining_timeout)
> +		*remaining_timeout = timeout;
> +
>  	flush_submission(gt, timeout); /* kick the ksoftirqd tasklets */
>  	spin_lock(&timelines->lock);
>  	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
> @@ -163,6 +166,23 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>  								 timeout);
>  				dma_fence_put(fence);
>  
> +				if (remaining_timeout) {
> +					/*
> +					 * If we get an error here but request
> +					 * retirement succeeds anyway
> +					 * (!active_count) and we return 0, the
> +					 * caller may want to spend remaining
> +					 * time on waiting for other events.
> +					 */
> +					if (timeout == -ETIME ||
> +					    timeout == -EINTR ||
> +					    timeout == -ERESTARTSYS)
> +						*remaining_timeout = 0;
> +					else if (timeout >= 0)
> +						*remaining_timeout = timeout;
> +					/* else assume no time consumed */

Looks correct, but the crazy semantic of dma_fence_wait_timeout does not
make it easy to understand.

Reviewed-by: Andrzej Hajda

Regards
Andrzej

> +				}
> +
>  				/* Retirement is best effort */
>  				if (!mutex_trylock(&tl->mutex)) {
>  					active_count++;
> @@ -196,9 +216,6 @@ out_active:	spin_lock(&timelines->lock);
>  
>  	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>  		active_count++;
>  
> -	if (remaining_timeout)
> -		*remaining_timeout = timeout;
> -
>  	return active_count ? timeout : 0;
>  }
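
To make the failure mode from the commit message concrete, the sketch below compares what a caller would receive as "remaining" before and after the change when it forwards that value to a second wait. It is a simplified userspace model with a single wait result, not the driver code: remaining_before_patch(), remaining_after_patch() and follow_up_wait() are hypothetical names, and follow_up_wait() returns -EINVAL as a stand-in for a kernel wait that would WARN or BUG on a negative timeout.

```c
#include <errno.h>

/* Assumption: kernel-internal errno, defined only for this userspace sketch. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Pre-patch model: the raw wait result is handed back as "remaining". */
static long remaining_before_patch(long requested, long wait_ret)
{
	(void)requested;
	return wait_ret;
}

/* Post-patch model: errors never leak into the remaining time. */
static long remaining_after_patch(long requested, long wait_ret)
{
	long remaining = requested;

	if (wait_ret == -ETIME ||
	    wait_ret == -EINTR ||
	    wait_ret == -ERESTARTSYS)
		remaining = 0;
	else if (wait_ret >= 0)
		remaining = wait_ret;
	/* other errors: assume no time consumed */

	return remaining;
}

/*
 * Hypothetical second wait: rejects a negative timeout with -EINVAL,
 * standing in for the WARN or BUG mentioned in the commit message.
 */
static int follow_up_wait(long timeout)
{
	return timeout < 0 ? -EINVAL : 0;
}
```

In this model, an interrupted first wait (-ERESTARTSYS) makes follow_up_wait(remaining_before_patch(100, -ERESTARTSYS)) fail, while the post-patch variant passes 0 forward and the second wait accepts it.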