Re: [Intel-gfx] [PATCH 1/2] drm/i915: fix up semaphore_waits_for

2014-03-25 Thread Daniel Vetter
On Mon, Mar 24, 2014 at 03:37:12PM -0700, Ben Widawsky wrote:
 On Sat, Mar 22, 2014 at 06:52:25PM +0100, Daniel Vetter wrote:
  On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote:
   Hi,
   
   Daniel Vetter daniel.vet...@ffwll.ch writes:
   
There's an entire pile of issues in here:
   
- Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
  offset of the batch buffer when a batch is executed. Semaphores are
  always emitted to the main ring, so we always want to look at that.
   
   The ipehr check should make sure that we are at the ring and
   acthd should not be at batch.
   
   Regardless I agree that RING_HEAD is much more safer. When I have
   tried to get bottom of the snb reset hang, I have noticed that
   after reset the acthd is sometimes 0x0c even tho head is 0x00,
   on snb.
  
  Hm, should we maybe mask out the lower bits, too? If you can confirm this,
  can you please add a follow-up patch?
  
- Mask the obtained HEAD pointer with the actual ring size, which is
  much smaller. Together with the above issue this resulted us in
  trying to dereference a pointer way outside of the ring mmio
  mapping. The resulting invalid access in interrupt context
  (hangcheck is executed from timers) lead to a full blown kernel
  panic. The fbcon panic handler then tried to frob our driver harder,
  resulting in a full machine hang at least on my snb here where I've
  stumbled over this.
   
- Handle ring wrapping correctly and be a bit more explicit about how
  many dwords we're scanning. We probably should also scan more than
  just 4 ...
   
   ipehr dont update on MI_NOOPS and the head should point to
   the extra MI_NOOP after semaphore sequence. So 4 should be
   enough. Perhaps revisit this when bdw gains semaphores.
  
  Yeah, Chris also mentioned that the HEAD should point at one dword past
  the end of the ring, so should even work when there are no MI_NOOPs. In
  any case this code is more robust in case hw engineers suddenly change the
  behaviour.
  
- Space out some of teh computations for readability.
   
This prevents hard-hangs on my snb here. Verdict from QA is still
pending, but the symptoms match.
   
Cc: Mika Kuoppala mika.kuopp...@intel.com
Cc: Ben Widawsky b...@bwidawsk.net
Cc: Chris Wilson ch...@chris-wilson.co.uk
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100
   
   The hard hangs are not so frequent with this patch but
   they are still there. This patch should take care of
   the oops we have been seeing, but there is still
   something else to be found until #74100 can be marked as
   fixed.
   
   Very often after reset, when igt has pushed the quietence
   batches into rings, blitter and video ring heads gets
   moved properly but all updates to hardware status page and to
   the sync registers are missing. And result is hang by hangcheck.
   Repeat enough times and it is a hard hang.
   
Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch
   
   Please remove the blowup comment and also update the
   bugzilla tag. 
  
  Yeah, QA also says that it doesn't fix the hard hangs, only seems to
  reduce them a bit on certain setups. I've updated the commit message.
  
  btw are you testing with FBDEV=n? The lack of a fbcon panic handler should
  greatly increase the chances that the last few message get correctly
  transmitted through other means like netconsole.
  
   Patches 1-2 and the followup one are
   
   Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com
  
  Thanks for the review, patches merged.
  -Daniel
 
 Patch 2 was trivial to implement for gen8. This functionality is a lot
 less trivial. I volunteered to do patch 2, are you going to port this to
 gen8?

If you fill out the bdw pieces for patch 23 the only thing you need to
change here is to adjsut the number of dwords we walk backwards to make
sure that the bdw semaphore cmd still fits. Or at least that's been my
idea behind all this, maybe I've overlooked something.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] drm/i915: fix up semaphore_waits_for

2014-03-25 Thread Ben Widawsky
On Tue, Mar 25, 2014 at 10:42:09AM +0100, Daniel Vetter wrote:
 On Mon, Mar 24, 2014 at 03:37:12PM -0700, Ben Widawsky wrote:
  On Sat, Mar 22, 2014 at 06:52:25PM +0100, Daniel Vetter wrote:
   On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote:
Hi,

Daniel Vetter daniel.vet...@ffwll.ch writes:

 There's an entire pile of issues in here:

 - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
   offset of the batch buffer when a batch is executed. Semaphores are
   always emitted to the main ring, so we always want to look at that.

The ipehr check should make sure that we are at the ring and
acthd should not be at batch.

Regardless I agree that RING_HEAD is much more safer. When I have
tried to get bottom of the snb reset hang, I have noticed that
after reset the acthd is sometimes 0x0c even tho head is 0x00,
on snb.
   
   Hm, should we maybe mask out the lower bits, too? If you can confirm this,
   can you please add a follow-up patch?
   
 - Mask the obtained HEAD pointer with the actual ring size, which is
   much smaller. Together with the above issue this resulted us in
   trying to dereference a pointer way outside of the ring mmio
   mapping. The resulting invalid access in interrupt context
   (hangcheck is executed from timers) lead to a full blown kernel
   panic. The fbcon panic handler then tried to frob our driver harder,
   resulting in a full machine hang at least on my snb here where I've
   stumbled over this.

 - Handle ring wrapping correctly and be a bit more explicit about how
   many dwords we're scanning. We probably should also scan more than
   just 4 ...

ipehr dont update on MI_NOOPS and the head should point to
the extra MI_NOOP after semaphore sequence. So 4 should be
enough. Perhaps revisit this when bdw gains semaphores.
   
   Yeah, Chris also mentioned that the HEAD should point at one dword past
   the end of the ring, so should even work when there are no MI_NOOPs. In
   any case this code is more robust in case hw engineers suddenly change the
   behaviour.
   
 - Space out some of teh computations for readability.

 This prevents hard-hangs on my snb here. Verdict from QA is still
 pending, but the symptoms match.

 Cc: Mika Kuoppala mika.kuopp...@intel.com
 Cc: Ben Widawsky b...@bwidawsk.net
 Cc: Chris Wilson ch...@chris-wilson.co.uk
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100

The hard hangs are not so frequent with this patch but
they are still there. This patch should take care of
the oops we have been seeing, but there is still
something else to be found until #74100 can be marked as
fixed.

Very often after reset, when igt has pushed the quietence
batches into rings, blitter and video ring heads gets
moved properly but all updates to hardware status page and to
the sync registers are missing. And result is hang by hangcheck.
Repeat enough times and it is a hard hang.

 Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch

Please remove the blowup comment and also update the
bugzilla tag. 
   
   Yeah, QA also says that it doesn't fix the hard hangs, only seems to
   reduce them a bit on certain setups. I've updated the commit message.
   
   btw are you testing with FBDEV=n? The lack of a fbcon panic handler should
   greatly increase the chances that the last few message get correctly
   transmitted through other means like netconsole.
   
Patches 1-2 and the followup one are

Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com
   
   Thanks for the review, patches merged.
   -Daniel
  
  Patch 2 was trivial to implement for gen8. This functionality is a lot
  less trivial. I volunteered to do patch 2, are you going to port this to
  gen8?
 
 If you fill out the bdw pieces for patch 23 the only thing you need to
 change here is to adjsut the number of dwords we walk backwards to make
 sure that the bdw semaphore cmd still fits. Or at least that's been my
 idea behind all this, maybe I've overlooked something.
 -Daniel

I don't think it's quite that easy, but I took it as a, yes. Which one
is patch 3?

-- 
Ben Widawsky, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] drm/i915: fix up semaphore_waits_for

2014-03-25 Thread Daniel Vetter
On Tue, Mar 25, 2014 at 3:54 PM, Ben Widawsky b...@bwidawsk.net wrote:
 I don't think it's quite that easy, but I took it as a, yes. Which one
 is patch 3?

drm/i915: make semaphore signaller detection more robust - it was a
follow-up patch in this thread, hence no 3/3. If the relevant bits on
bdw have moved out of the the cmd dword I guess we need to keep a
cache of the relevant dword and supply it to the function. Was too
lazy to check that though.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] drm/i915: fix up semaphore_waits_for

2014-03-24 Thread Ben Widawsky
On Sat, Mar 22, 2014 at 06:52:25PM +0100, Daniel Vetter wrote:
 On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote:
  Hi,
  
  Daniel Vetter daniel.vet...@ffwll.ch writes:
  
   There's an entire pile of issues in here:
  
   - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
 offset of the batch buffer when a batch is executed. Semaphores are
 always emitted to the main ring, so we always want to look at that.
  
  The ipehr check should make sure that we are at the ring and
  acthd should not be at batch.
  
  Regardless I agree that RING_HEAD is much more safer. When I have
  tried to get bottom of the snb reset hang, I have noticed that
  after reset the acthd is sometimes 0x0c even tho head is 0x00,
  on snb.
 
 Hm, should we maybe mask out the lower bits, too? If you can confirm this,
 can you please add a follow-up patch?
 
   - Mask the obtained HEAD pointer with the actual ring size, which is
 much smaller. Together with the above issue this resulted us in
 trying to dereference a pointer way outside of the ring mmio
 mapping. The resulting invalid access in interrupt context
 (hangcheck is executed from timers) lead to a full blown kernel
 panic. The fbcon panic handler then tried to frob our driver harder,
 resulting in a full machine hang at least on my snb here where I've
 stumbled over this.
  
   - Handle ring wrapping correctly and be a bit more explicit about how
 many dwords we're scanning. We probably should also scan more than
 just 4 ...
  
  ipehr dont update on MI_NOOPS and the head should point to
  the extra MI_NOOP after semaphore sequence. So 4 should be
  enough. Perhaps revisit this when bdw gains semaphores.
 
 Yeah, Chris also mentioned that the HEAD should point at one dword past
 the end of the ring, so should even work when there are no MI_NOOPs. In
 any case this code is more robust in case hw engineers suddenly change the
 behaviour.
 
   - Space out some of teh computations for readability.
  
   This prevents hard-hangs on my snb here. Verdict from QA is still
   pending, but the symptoms match.
  
   Cc: Mika Kuoppala mika.kuopp...@intel.com
   Cc: Ben Widawsky b...@bwidawsk.net
   Cc: Chris Wilson ch...@chris-wilson.co.uk
   Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100
  
  The hard hangs are not so frequent with this patch but
  they are still there. This patch should take care of
  the oops we have been seeing, but there is still
  something else to be found until #74100 can be marked as
  fixed.
  
  Very often after reset, when igt has pushed the quietence
  batches into rings, blitter and video ring heads gets
  moved properly but all updates to hardware status page and to
  the sync registers are missing. And result is hang by hangcheck.
  Repeat enough times and it is a hard hang.
  
   Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch
  
  Please remove the blowup comment and also update the
  bugzilla tag. 
 
 Yeah, QA also says that it doesn't fix the hard hangs, only seems to
 reduce them a bit on certain setups. I've updated the commit message.
 
 btw are you testing with FBDEV=n? The lack of a fbcon panic handler should
 greatly increase the chances that the last few message get correctly
 transmitted through other means like netconsole.
 
  Patches 1-2 and the followup one are
  
  Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com
 
 Thanks for the review, patches merged.
 -Daniel

Patch 2 was trivial to implement for gen8. This functionality is a lot
less trivial. I volunteered to do patch 2, are you going to port this to
gen8?

-- 
Ben Widawsky, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] drm/i915: fix up semaphore_waits_for

2014-03-22 Thread Daniel Vetter
On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote:
 Hi,
 
 Daniel Vetter daniel.vet...@ffwll.ch writes:
 
  There's an entire pile of issues in here:
 
  - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
offset of the batch buffer when a batch is executed. Semaphores are
always emitted to the main ring, so we always want to look at that.
 
 The ipehr check should make sure that we are at the ring and
 acthd should not be at batch.
 
 Regardless I agree that RING_HEAD is much more safer. When I have
 tried to get bottom of the snb reset hang, I have noticed that
 after reset the acthd is sometimes 0x0c even tho head is 0x00,
 on snb.

Hm, should we maybe mask out the lower bits, too? If you can confirm this,
can you please add a follow-up patch?

  - Mask the obtained HEAD pointer with the actual ring size, which is
much smaller. Together with the above issue this resulted us in
trying to dereference a pointer way outside of the ring mmio
mapping. The resulting invalid access in interrupt context
(hangcheck is executed from timers) lead to a full blown kernel
panic. The fbcon panic handler then tried to frob our driver harder,
resulting in a full machine hang at least on my snb here where I've
stumbled over this.
 
  - Handle ring wrapping correctly and be a bit more explicit about how
many dwords we're scanning. We probably should also scan more than
just 4 ...
 
 ipehr dont update on MI_NOOPS and the head should point to
 the extra MI_NOOP after semaphore sequence. So 4 should be
 enough. Perhaps revisit this when bdw gains semaphores.

Yeah, Chris also mentioned that the HEAD should point at one dword past
the end of the ring, so should even work when there are no MI_NOOPs. In
any case this code is more robust in case hw engineers suddenly change the
behaviour.

  - Space out some of teh computations for readability.
 
  This prevents hard-hangs on my snb here. Verdict from QA is still
  pending, but the symptoms match.
 
  Cc: Mika Kuoppala mika.kuopp...@intel.com
  Cc: Ben Widawsky b...@bwidawsk.net
  Cc: Chris Wilson ch...@chris-wilson.co.uk
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100
 
 The hard hangs are not so frequent with this patch but
 they are still there. This patch should take care of
 the oops we have been seeing, but there is still
 something else to be found until #74100 can be marked as
 fixed.
 
 Very often after reset, when igt has pushed the quietence
 batches into rings, blitter and video ring heads gets
 moved properly but all updates to hardware status page and to
 the sync registers are missing. And result is hang by hangcheck.
 Repeat enough times and it is a hard hang.
 
  Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch
 
 Please remove the blowup comment and also update the
 bugzilla tag. 

Yeah, QA also says that it doesn't fix the hard hangs, only seems to
reduce them a bit on certain setups. I've updated the commit message.

btw are you testing with FBDEV=n? The lack of a fbcon panic handler should
greatly increase the chances that the last few message get correctly
transmitted through other means like netconsole.

 Patches 1-2 and the followup one are
 
 Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com

Thanks for the review, patches merged.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] drm/i915: fix up semaphore_waits_for

2014-03-21 Thread Mika Kuoppala
Hi,

Daniel Vetter daniel.vet...@ffwll.ch writes:

 There's an entire pile of issues in here:

 - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
   offset of the batch buffer when a batch is executed. Semaphores are
   always emitted to the main ring, so we always want to look at that.

The ipehr check should make sure that we are at the ring and
acthd should not be at batch.

Regardless I agree that RING_HEAD is much more safer. When I have
tried to get bottom of the snb reset hang, I have noticed that
after reset the acthd is sometimes 0x0c even tho head is 0x00,
on snb.

 - Mask the obtained HEAD pointer with the actual ring size, which is
   much smaller. Together with the above issue this resulted us in
   trying to dereference a pointer way outside of the ring mmio
   mapping. The resulting invalid access in interrupt context
   (hangcheck is executed from timers) lead to a full blown kernel
   panic. The fbcon panic handler then tried to frob our driver harder,
   resulting in a full machine hang at least on my snb here where I've
   stumbled over this.

 - Handle ring wrapping correctly and be a bit more explicit about how
   many dwords we're scanning. We probably should also scan more than
   just 4 ...

ipehr dont update on MI_NOOPS and the head should point to
the extra MI_NOOP after semaphore sequence. So 4 should be
enough. Perhaps revisit this when bdw gains semaphores.

 - Space out some of teh computations for readability.

 This prevents hard-hangs on my snb here. Verdict from QA is still
 pending, but the symptoms match.

 Cc: Mika Kuoppala mika.kuopp...@intel.com
 Cc: Ben Widawsky b...@bwidawsk.net
 Cc: Chris Wilson ch...@chris-wilson.co.uk
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100

The hard hangs are not so frequent with this patch but
they are still there. This patch should take care of
the oops we have been seeing, but there is still
something else to be found until #74100 can be marked as
fixed.

Very often after reset, when igt has pushed the quietence
batches into rings, blitter and video ring heads gets
moved properly but all updates to hardware status page and to
the sync registers are missing. And result is hang by hangcheck.
Repeat enough times and it is a hard hang.

 Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch

Please remove the blowup comment and also update the
bugzilla tag. 

Patches 1-2 and the followup one are

Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com

 ---
  drivers/gpu/drm/i915/i915_irq.c | 38 ++
  1 file changed, 26 insertions(+), 12 deletions(-)

 diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
 index be2713f12e08..473372a6c97d 100644
 --- a/drivers/gpu/drm/i915/i915_irq.c
 +++ b/drivers/gpu/drm/i915/i915_irq.c
 @@ -2529,29 +2529,43 @@ static struct intel_ring_buffer *
  semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno)
  {
   struct drm_i915_private *dev_priv = ring-dev-dev_private;
 - u32 cmd, ipehr, acthd, acthd_min;
 + u32 cmd, ipehr, head;
 + int i;
  
   ipehr = I915_READ(RING_IPEHR(ring-mmio_base));
   if ((ipehr  ~(0x3  16)) !=
   (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE | MI_SEMAPHORE_REGISTER))
   return NULL;
  
 - /* ACTHD is likely pointing to the dword after the actual command,
 -  * so scan backwards until we find the MBOX.
 + /*
 +  * HEAD is likely pointing to the dword after the actual command,
 +  * so scan backwards until we find the MBOX. But limit it to just 3
 +  * dwords. Note that we don't care about ACTHD here since that might
 +  * point at at batch, and semaphores are always emitted into the
 +  * ringbuffer itself.
*/
 - acthd = intel_ring_get_active_head(ring)  HEAD_ADDR;
 - acthd_min = max((int)acthd - 3 * 4, 0);
 - do {
 - cmd = ioread32(ring-virtual_start + acthd);
 + head = I915_READ_HEAD(ring)  HEAD_ADDR;
 +
 + for (i = 4; i; --i) {
 + /*
 +  * Be paranoid and presume the hw has gone off into the wild -
 +  * our ring is smaller than what the hardware (and hence
 +  * HEAD_ADDR) allows. Also handles wrap-around.
 +  */
 + head = ring-size - 1;
 +
 + /* This here seems to blow up */
 + cmd = ioread32(ring-virtual_start + head);
   if (cmd == ipehr)
   break;
  
 - acthd -= 4;
 - if (acthd  acthd_min)
 - return NULL;
 - } while (1);
 + head -= 4;
 + }
 +
 + if (!i)
 + return NULL;
  
 - *seqno = ioread32(ring-virtual_start+acthd+4)+1;
 + *seqno = ioread32(ring-virtual_start + head + 4) + 1;
   return dev_priv-ring[(ring-id + (((ipehr  17)  1) + 1)) % 3];
  }
  
 -- 
 1.8.1.4
___
Intel-gfx mailing list

Re: [Intel-gfx] [PATCH 1/2] drm/i915: fix up semaphore_waits_for

2014-03-15 Thread Chris Wilson
On Sat, Mar 15, 2014 at 12:08:55AM +0100, Daniel Vetter wrote:
 There's an entire pile of issues in here:
 
 - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
   offset of the batch buffer when a batch is executed. Semaphores are
   always emitted to the main ring, so we always want to look at that.

True, nice catch that would explain a hard hang quite neatly if we tried
to read from beyond the aperture.
 
 - Mask the obtained HEAD pointer with the actual ring size, which is
   much smaller. Together with the above issue this resulted us in
   trying to dereference a pointer way outside of the ring mmio
   mapping. The resulting invalid access in interrupt context
   (hangcheck is executed from timers) lead to a full blown kernel
   panic. The fbcon panic handler then tried to frob our driver harder,
   resulting in a full machine hang at least on my snb here where I've
   stumbled over this.
 
 - Handle ring wrapping correctly and be a bit more explicit about how
   many dwords we're scanning. We probably should also scan more than
   just 4 ...

You can ignore ring wrapping as valid commands cannot wrap, hence the
formulation of acthd_min.

 /*
  * Be paranoid and presume the hw has gone off into the wild -
  * our ring is smaller than what the hardware (and hence
  * HEAD_ADDR) allows.
  */
 head = I915_READ_HEAD(ring)  (ring-size - 1);
 head_min = max((int)head - 3 * 4, 0);
 while (head = head_min) {
cmd = ioread32(ring-virtual_start + head);
if (cmd == ipehr)
break;
head -= 4;
 }

 + /*
 +  * Be paranoid and presume the hw has gone off into the wild -
 +  * our ring is smaller than what the hardware (and hence
 +  * HEAD_ADDR) allows. Also handles wrap-around.
 +  */
 + head = ring-size - 1;
 +
 + /* This here seems to blow up */
 + cmd = ioread32(ring-virtual_start + head);
So what does this mean?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] drm/i915: fix up semaphore_waits_for

2014-03-15 Thread Daniel Vetter
On Sat, Mar 15, 2014 at 4:46 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 On Sat, Mar 15, 2014 at 12:08:55AM +0100, Daniel Vetter wrote:
 There's an entire pile of issues in here:

 - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
   offset of the batch buffer when a batch is executed. Semaphores are
   always emitted to the main ring, so we always want to look at that.

 True, nice catch that would explain a hard hang quite neatly if we tried
 to read from beyond the aperture.

 - Mask the obtained HEAD pointer with the actual ring size, which is
   much smaller. Together with the above issue this resulted us in
   trying to dereference a pointer way outside of the ring mmio
   mapping. The resulting invalid access in interrupt context
   (hangcheck is executed from timers) lead to a full blown kernel
   panic. The fbcon panic handler then tried to frob our driver harder,
   resulting in a full machine hang at least on my snb here where I've
   stumbled over this.

 - Handle ring wrapping correctly and be a bit more explicit about how
   many dwords we're scanning. We probably should also scan more than
   just 4 ...

 You can ignore ring wrapping as valid commands cannot wrap, hence the
 formulation of acthd_min.

Well I've thought that HEAD usually points at the next command. I'm
not sure about the exact semantics at wrap-around time but since
implementing it was trivially I've figured it's better to be paranoid.

  /*
   * Be paranoid and presume the hw has gone off into the wild -
   * our ring is smaller than what the hardware (and hence
   * HEAD_ADDR) allows.
   */
  head = I915_READ_HEAD(ring)  (ring-size - 1);
  head_min = max((int)head - 3 * 4, 0);
  while (head = head_min) {
 cmd = ioread32(ring-virtual_start + head);
 if (cmd == ipehr)
 break;
 head -= 4;
  }

Yeah, this would also fix the immediate bug at hand. But my general
impression of this code is that it's trusting reconstructed state way
too much. Hence why I've gone for the more wordy version of firsting
oring out the right field from the register and then for restricting
with the size of the ring. I'll do a follow-up patch to give the the
same treatment to the semaphore signaller computation.

 + /*
 +  * Be paranoid and presume the hw has gone off into the wild -
 +  * our ring is smaller than what the hardware (and hence
 +  * HEAD_ADDR) allows. Also handles wrap-around.
 +  */
 + head = ring-size - 1;
 +
 + /* This here seems to blow up */
 + cmd = ioread32(ring-virtual_start + head);
 So what does this mean?

Oops, the comment was just a note from my oops decoding. The ioread32
is where we've blown. The comment should be removed since it's not
useful outside of my debug session.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] drm/i915: fix up semaphore_waits_for

2014-03-15 Thread Chris Wilson
On Sat, Mar 15, 2014 at 05:31:04PM +0100, Daniel Vetter wrote:
 On Sat, Mar 15, 2014 at 4:46 PM, Chris Wilson ch...@chris-wilson.co.uk 
 wrote:
  On Sat, Mar 15, 2014 at 12:08:55AM +0100, Daniel Vetter wrote:
  There's an entire pile of issues in here:
 
  - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
offset of the batch buffer when a batch is executed. Semaphores are
always emitted to the main ring, so we always want to look at that.
 
  True, nice catch that would explain a hard hang quite neatly if we tried
  to read from beyond the aperture.
 
  - Mask the obtained HEAD pointer with the actual ring size, which is
much smaller. Together with the above issue this resulted us in
trying to dereference a pointer way outside of the ring mmio
mapping. The resulting invalid access in interrupt context
(hangcheck is executed from timers) lead to a full blown kernel
panic. The fbcon panic handler then tried to frob our driver harder,
resulting in a full machine hang at least on my snb here where I've
stumbled over this.
 
  - Handle ring wrapping correctly and be a bit more explicit about how
many dwords we're scanning. We probably should also scan more than
just 4 ...
 
  You can ignore ring wrapping as valid commands cannot wrap, hence the
  formulation of acthd_min.
 
 Well I've thought that HEAD usually points at the next command. I'm
 not sure about the exact semantics at wrap-around time but since
 implementing it was trivially I've figured it's better to be paranoid.

Nah, it points to ring-size upon parsing a command that exactly ends
upon the wrap around, But your point stands.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx