Re: [Intel-gfx] [PATCH 1/3] drm/i915: Fix negative remaining time after retire requests

2022-11-17 Thread Das, Nirmoy

On 11/16/2022 12:25 PM, Janusz Krzysztofik wrote:


Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
with GuC") extended the API of intel_gt_retire_requests_timeout() with an
extra argument 'remaining_timeout', intended for passing back unconsumed
portion of requested timeout when 0 (success) is returned.  However, when
request retirement happens to succeed despite an error returned by
dma_fence_wait_timeout(), the error code (a negative value) is passed back
instead of remaining time.  If a user then passes that negative value
forward as requested timeout to another wait, an explicit WARN or BUG can
be triggered.

Instead of copying the value of timeout variable to *remaining_timeout
before return, update the *remaining_timeout after each DMA fence wait.



Thanks for the detailed comment. Indeed, we were not accounting for the
return value of dma_fence_wait_timeout().


Acked-by: Nirmoy Das 


Thanks,

Nirmoy



Set it to 0 on -ETIME, -EINTR or -ERESTARTSYS, and assume no time has been
consumed on other errors returned from the wait.

Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
Signed-off-by: Janusz Krzysztofik 
Cc: sta...@vger.kernel.org # v5.15+
---
  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 23 ++---
  1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index edb881d756309..ccaf2fd80625b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -138,6 +138,9 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
 	unsigned long active_count = 0;
 	LIST_HEAD(free);
 
+	if (remaining_timeout)
+		*remaining_timeout = timeout;
+
 	flush_submission(gt, timeout); /* kick the ksoftirqd tasklets */
 	spin_lock(&timelines->lock);
 	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
@@ -163,6 +166,23 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
 								 timeout);
 				dma_fence_put(fence);
 
+				if (remaining_timeout) {
+					/*
+					 * If we get an error here but request
+					 * retirement succeeds anyway
+					 * (!active_count) and we return 0, the
+					 * caller may want to spend remaining
+					 * time on waiting for other events.
+					 */
+					if (timeout == -ETIME ||
+					    timeout == -EINTR ||
+					    timeout == -ERESTARTSYS)
+						*remaining_timeout = 0;
+					else if (timeout >= 0)
+						*remaining_timeout = timeout;
+					/* else assume no time consumed */
+				}
+
 				/* Retirement is best effort */
 				if (!mutex_trylock(&tl->mutex)) {
 					active_count++;
@@ -196,9 +216,6 @@ out_active:	spin_lock(&timelines->lock);
 	if (flush_submission(gt, timeout)) /* Wait, there's more! */
 		active_count++;
 
-	if (remaining_timeout)
-		*remaining_timeout = timeout;
-
 	return active_count ? timeout : 0;
 }
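
The bookkeeping described in the commit message can be modelled outside the
kernel. What follows is only a minimal userspace sketch, not the driver code:
update_remaining() is a hypothetical helper name, and ERESTARTSYS is defined
locally on the assumption that this kernel-internal errno is not provided by
the userspace <errno.h>.

```c
#include <errno.h>

/* Kernel-internal errno; value taken from include/linux/errno.h. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/*
 * Hypothetical helper mirroring the patch logic: fold the result of one
 * fence wait into the caller-visible remaining time.
 */
static void update_remaining(long wait_ret, long *remaining)
{
	if (!remaining)
		return;

	if (wait_ret == -ETIME ||
	    wait_ret == -EINTR ||
	    wait_ret == -ERESTARTSYS)
		*remaining = 0;		/* timed out or interrupted */
	else if (wait_ret >= 0)
		*remaining = wait_ret;	/* time left after the wait */
	/* any other error: keep *remaining, assume no time consumed */
}
```

With this shape the value stored through the pointer is never negative as long
as the initial timeout is not, so it can be handed to a later wait without
tripping a negative-timeout check.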
  


Re: [Intel-gfx] [PATCH 1/3] drm/i915: Fix negative remaining time after retire requests

2022-11-16 Thread Andrzej Hajda

On 16.11.2022 12:25, Janusz Krzysztofik wrote:

Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
with GuC") extended the API of intel_gt_retire_requests_timeout() with an
extra argument 'remaining_timeout', intended for passing back unconsumed
portion of requested timeout when 0 (success) is returned.  However, when
request retirement happens to succeed despite an error returned by
dma_fence_wait_timeout(), the error code (a negative value) is passed back
instead of remaining time.  If a user then passes that negative value
forward as requested timeout to another wait, an explicit WARN or BUG can
be triggered.

Instead of copying the value of timeout variable to *remaining_timeout
before return, update the *remaining_timeout after each DMA fence wait.
Set it to 0 on -ETIME, -EINTR or -ERESTARTSYS, and assume no time has been
consumed on other errors returned from the wait.

Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
Signed-off-by: Janusz Krzysztofik 
Cc: sta...@vger.kernel.org # v5.15+
---
  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 23 ++---
  1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index edb881d756309..ccaf2fd80625b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -138,6 +138,9 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
 	unsigned long active_count = 0;
 	LIST_HEAD(free);
 
+	if (remaining_timeout)
+		*remaining_timeout = timeout;
+
 	flush_submission(gt, timeout); /* kick the ksoftirqd tasklets */
 	spin_lock(&timelines->lock);
 	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
@@ -163,6 +166,23 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
 								 timeout);
 				dma_fence_put(fence);
 
+				if (remaining_timeout) {
+					/*
+					 * If we get an error here but request
+					 * retirement succeeds anyway
+					 * (!active_count) and we return 0, the
+					 * caller may want to spend remaining
+					 * time on waiting for other events.
+					 */
+					if (timeout == -ETIME ||
+					    timeout == -EINTR ||
+					    timeout == -ERESTARTSYS)
+						*remaining_timeout = 0;
+					else if (timeout >= 0)
+						*remaining_timeout = timeout;
+					/* else assume no time consumed */


Looks correct, but the crazy semantics of dma_fence_wait_timeout() do not
make it easy to understand.


Reviewed-by: Andrzej Hajda 

Regards
Andrzej



+				}
+
 				/* Retirement is best effort */
 				if (!mutex_trylock(&tl->mutex)) {
 					active_count++;
@@ -196,9 +216,6 @@ out_active:	spin_lock(&timelines->lock);
 	if (flush_submission(gt, timeout)) /* Wait, there's more! */
 		active_count++;
 
-	if (remaining_timeout)
-		*remaining_timeout = timeout;
-
 	return active_count ? timeout : 0;
 }
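
For readers tripped up by the return convention of dma_fence_wait_timeout()
mentioned in the review, it can be summarised as: a positive value is the time
left after the fence signaled, 0 means the wait timed out, and a negative
value is an errno rather than a time. A small userspace sketch of that
classification follows; the enum and classify_wait() are hypothetical names
used only for illustration.

```c
/* Possible outcomes of a timed fence wait, as seen by the caller. */
enum wait_outcome {
	WAIT_SIGNALED,	/* ret > 0: fence signaled, ret is the time left */
	WAIT_TIMED_OUT,	/* ret == 0: the whole budget was consumed */
	WAIT_FAILED,	/* ret < 0: negative errno, not a time value */
};

/* Hypothetical helper: classify a dma_fence_wait_timeout()-style result. */
static enum wait_outcome classify_wait(long ret)
{
	if (ret > 0)
		return WAIT_SIGNALED;
	if (ret == 0)
		return WAIT_TIMED_OUT;
	return WAIT_FAILED;
}
```

Only the first two outcomes yield a value that is meaningful as a timeout for
a subsequent wait, which is why the patch stops copying negative results into
*remaining_timeout.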