Re: [Intel-gfx] [PATCH v3 05/16] drm/i915: track ring progression using seqnos

2013-04-22 Thread Mika Kuoppala
Ben Widawsky b...@bwidawsk.net writes:

 On Sat, Apr 20, 2013 at 11:43:51AM -0700, Ben Widawsky wrote:
 On Thu, Apr 04, 2013 at 06:32:37PM +0300, Mika Kuoppala wrote:
  Instead of relying in acthd, track ring seqno progression
  to detect if ring has hung.
  
  v2: put hangcheck stuff inside struct (Chris Wilson)
  
  Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
 
 This patch really scares me. I think we don't want to ever stop using
 ACTHD. AFAICT, ACTHD is aways something we want to track in regards to
 the GPU not making progress. (unless we're worried about cycles in the
 CS, but there, seqno would equally not help us track progress).
 
 The two danger cases are:
 an infinite loop in a shader
 never called MI_BATCH_BUFFER_END
 
 And for both, *I think* ACTHD won't change.

 I'm wrong. I was confusing HEAD with ACTHD. ACTHD is supposed to show
 the address of either the batch, or the ring. HEAD is something we could
 use instead though. Just a thought.

Yes, i confirmed this with my test case. ACTHD will change inside the 
ring and also inside the batchbuffer.

The reason I chose seqnos is because that is the least hw (gen)
dependant way to measure the if progress is being made.
And such most close of what userspace sees as progress.
If batch doesn't get retired for whatever reason, reset will happen.

This way we can detect hangs, looped batchbuffers and infinite,
or more precisely too long loops, inside shaders.

I will need to update the commit message to reflect
the fact that with this patch, the batch needs to complete
in 3 second time window.

-Mika


 
 I believe what the patch really wants is to stop using instdone, but
 continue to use ACTHD as a fallback.
 
 I think if you can keep ACTHD around, I'd be willing to r-b this.
 
 Discussion moved to IRC.
 
 
 [snip]
 -- 
 Ben Widawsky, Intel Open Source Technology Center
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

 -- 
 Ben Widawsky, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v3 05/16] drm/i915: track ring progression using seqnos

2013-04-21 Thread Ben Widawsky
On Sat, Apr 20, 2013 at 11:43:51AM -0700, Ben Widawsky wrote:
 On Thu, Apr 04, 2013 at 06:32:37PM +0300, Mika Kuoppala wrote:
  Instead of relying in acthd, track ring seqno progression
  to detect if ring has hung.
  
  v2: put hangcheck stuff inside struct (Chris Wilson)
  
  Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
 
 This patch really scares me. I think we don't want to ever stop using
 ACTHD. AFAICT, ACTHD is aways something we want to track in regards to
 the GPU not making progress. (unless we're worried about cycles in the
 CS, but there, seqno would equally not help us track progress).
 
 The two danger cases are:
 an infinite loop in a shader
 never called MI_BATCH_BUFFER_END
 
 And for both, *I think* ACTHD won't change.

I'm wrong. I was confusing HEAD with ACTHD. ACTHD is supposed to show
the address of either the batch, or the ring. HEAD is something we could
use instead though. Just a thought.

 
 I believe what the patch really wants is to stop using instdone, but
 continue to use ACTHD as a fallback.
 
 I think if you can keep ACTHD around, I'd be willing to r-b this.
 
 Discussion moved to IRC.
 
 
 [snip]
 -- 
 Ben Widawsky, Intel Open Source Technology Center
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ben Widawsky, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v3 05/16] drm/i915: track ring progression using seqnos

2013-04-20 Thread Ben Widawsky
On Thu, Apr 04, 2013 at 06:32:37PM +0300, Mika Kuoppala wrote:
 Instead of relying in acthd, track ring seqno progression
 to detect if ring has hung.
 
 v2: put hangcheck stuff inside struct (Chris Wilson)
 
 Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com

This patch really scares me. I think we don't want to ever stop using
ACTHD. AFAICT, ACTHD is aways something we want to track in regards to
the GPU not making progress. (unless we're worried about cycles in the
CS, but there, seqno would equally not help us track progress).

The two danger cases are:
an infinite loop in a shader
never called MI_BATCH_BUFFER_END

And for both, *I think* ACTHD won't change.

I believe what the patch really wants is to stop using instdone, but
continue to use ACTHD as a fallback.

I think if you can keep ACTHD around, I'd be willing to r-b this.

Discussion moved to IRC.


[snip]
-- 
Ben Widawsky, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v3 05/16] drm/i915: track ring progression using seqnos

2013-04-04 Thread Mika Kuoppala
Instead of relying in acthd, track ring seqno progression
to detect if ring has hung.

v2: put hangcheck stuff inside struct (Chris Wilson)

Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
---
 drivers/gpu/drm/i915/i915_drv.h |2 --
 drivers/gpu/drm/i915/i915_irq.c |   30 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |6 ++
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3c85c1b..0404116 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -816,8 +816,6 @@ struct i915_gpu_error {
 #define DRM_I915_HANGCHECK_JIFFIES msecs_to_jiffies(DRM_I915_HANGCHECK_PERIOD)
struct timer_list hangcheck_timer;
int hangcheck_count;
-   uint32_t last_acthd[I915_NUM_RINGS];
-   uint32_t prev_instdone[I915_NUM_INSTDONE_REG];
 
/* For reset and error_state handling. */
spinlock_t lock;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 45c83fb..0363871 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1956,22 +1956,19 @@ void i915_hangcheck_elapsed(unsigned long data)
 {
struct drm_device *dev = (struct drm_device *)data;
drm_i915_private_t *dev_priv = dev-dev_private;
-   uint32_t acthd[I915_NUM_RINGS], instdone[I915_NUM_INSTDONE_REG];
struct intel_ring_buffer *ring;
bool err = false, idle;
int i;
+   u32 seqno[I915_NUM_RINGS];
+   bool work_done;
 
if (!i915_enable_hangcheck)
return;
 
-   memset(acthd, 0, sizeof(acthd));
idle = true;
for_each_ring(ring, dev_priv, i) {
-   u32 seqno;
-
-   seqno = ring-get_seqno(ring, false);
-   idle = i915_hangcheck_ring_idle(ring, seqno, err);
-   acthd[i] = intel_ring_get_active_head(ring);
+   seqno[i] = ring-get_seqno(ring, false);
+   idle = i915_hangcheck_ring_idle(ring, seqno[i], err);
}
 
/* If all work is done then ACTHD clearly hasn't advanced. */
@@ -1987,20 +1984,19 @@ void i915_hangcheck_elapsed(unsigned long data)
return;
}
 
-   i915_get_extra_instdone(dev, instdone);
-   if (memcmp(dev_priv-gpu_error.last_acthd, acthd,
-  sizeof(acthd)) == 0 
-   memcmp(dev_priv-gpu_error.prev_instdone, instdone,
-  sizeof(instdone)) == 0) {
+   work_done = false;
+   for_each_ring(ring, dev_priv, i) {
+   if (ring-hangcheck.seqno != seqno[i]) {
+   work_done = true;
+   ring-hangcheck.seqno = seqno[i];
+   }
+   }
+
+   if (!work_done) {
if (i915_hangcheck_hung(dev))
return;
} else {
dev_priv-gpu_error.hangcheck_count = 0;
-
-   memcpy(dev_priv-gpu_error.last_acthd, acthd,
-  sizeof(acthd));
-   memcpy(dev_priv-gpu_error.prev_instdone, instdone,
-  sizeof(instdone));
}
 
 repeat:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d66208c..844381e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -37,6 +37,10 @@ struct  intel_hw_status_page {
 #define I915_READ_SYNC_0(ring) I915_READ(RING_SYNC_0((ring)-mmio_base))
 #define I915_READ_SYNC_1(ring) I915_READ(RING_SYNC_1((ring)-mmio_base))
 
+struct intel_ring_hangcheck {
+   u32 seqno;
+};
+
 struct  intel_ring_buffer {
const char  *name;
enum intel_ring_id {
@@ -137,6 +141,8 @@ struct  intel_ring_buffer {
struct i915_hw_context *default_context;
struct drm_i915_gem_object *last_context_obj;
 
+   struct intel_ring_hangcheck hangcheck;
+
void *private;
 };
 
-- 
1.7.9.5

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx