Re: [Intel-gfx] [PATCH] drm/i915: Clear stop-engine for a pardoned reset

2018-08-15 Thread Chris Wilson
Quoting Mika Kuoppala (2018-08-15 10:08:09)
> Chris Wilson  writes:
> 
> > Quoting Mika Kuoppala (2018-08-15 09:52:18)
> >> Chris Wilson  writes:
> >> 
> >> > If we pardon a per-engine reset, we may leave the STOP_RING bit asserted
> >> > in RING_MI_MODE resulting in the engine hanging. Unconditionally clear
> >> > it on the per-engine exit path as we know that either we skipped the
> >> > reset and so need the cancellation, or the reset was successful and the
> >> > cancellation is a no-op, or there was an error and we will follow up
> >> > with a full-reset or wedging (both of which will stop the engines again
> >> > as required).
> >> >
> >> > Signed-off-by: Chris Wilson 
> >> > Cc: Mika Kuoppala 
> >> > ---
> >> >  drivers/gpu/drm/i915/i915_drv.c |  1 +
> >> >  drivers/gpu/drm/i915/intel_engine_cs.c  | 10 ++
> >> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
> >> >  3 files changed, 12 insertions(+)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/i915_drv.c 
> >> > b/drivers/gpu/drm/i915/i915_drv.c
> >> > index 9dce55182c3a..4f2a9c39 100644
> >> > --- a/drivers/gpu/drm/i915/i915_drv.c
> >> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> >> > @@ -2079,6 +2079,7 @@ int i915_reset_engine(struct intel_engine_cs 
> >> > *engine, const char *msg)
> >> >   goto out;
> >> >  
> >> >  out:
> >> > + intel_engine_cancel_stop_cs(engine);
> >> >   i915_gem_reset_finish_engine(engine);
> >> 
> >> Should we just lift the whole stop/start dance into
> >> gem_reset_prepare|finish_engine()s?
> >
> > No, because it is also used by wedging where we do want the asymmetry.
> > Been there, done that, have the gem_eio bruises.
> 
> On wedge the submission path is blocked but yeah,
> gpu can be in any state and enabling the engine at that
> point is asking for trouble.
> 
> Reviewed-by: Mika Kuoppala 

Grabbed, pushed, and scarpered.

Although this should fix the silly STOP_RING hangs that have been
plaguing us for the last few months, there still seems to be more fun to
be had here. /o\
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Clear stop-engine for a pardoned reset

2018-08-15 Thread Mika Kuoppala
Chris Wilson  writes:

> Quoting Mika Kuoppala (2018-08-15 09:52:18)
>> Chris Wilson  writes:
>> 
>> > If we pardon a per-engine reset, we may leave the STOP_RING bit asserted
>> > in RING_MI_MODE resulting in the engine hanging. Unconditionally clear
>> > it on the per-engine exit path as we know that either we skipped the
>> > reset and so need the cancellation, or the reset was successful and the
>> > cancellation is a no-op, or there was an error and we will follow up
>> > with a full-reset or wedging (both of which will stop the engines again
>> > as required).
>> >
>> > Signed-off-by: Chris Wilson 
>> > Cc: Mika Kuoppala 
>> > ---
>> >  drivers/gpu/drm/i915/i915_drv.c |  1 +
>> >  drivers/gpu/drm/i915/intel_engine_cs.c  | 10 ++
>> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>> >  3 files changed, 12 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_drv.c 
>> > b/drivers/gpu/drm/i915/i915_drv.c
>> > index 9dce55182c3a..4f2a9c39 100644
>> > --- a/drivers/gpu/drm/i915/i915_drv.c
>> > +++ b/drivers/gpu/drm/i915/i915_drv.c
>> > @@ -2079,6 +2079,7 @@ int i915_reset_engine(struct intel_engine_cs 
>> > *engine, const char *msg)
>> >   goto out;
>> >  
>> >  out:
>> > + intel_engine_cancel_stop_cs(engine);
>> >   i915_gem_reset_finish_engine(engine);
>> 
>> Should we just lift the whole stop/start dance into
>> gem_reset_prepare|finish_engine()s?
>
> No, because it is also used by wedging where we do want the asymmetry.
> Been there, done that, have the gem_eio bruises.

On wedge the submission path is blocked but yeah,
gpu can be in any state and enabling the engine at that
point is asking for trouble.

Reviewed-by: Mika Kuoppala 

> -Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Clear stop-engine for a pardoned reset

2018-08-15 Thread Chris Wilson
Quoting Mika Kuoppala (2018-08-15 09:52:18)
> Chris Wilson  writes:
> 
> > If we pardon a per-engine reset, we may leave the STOP_RING bit asserted
> > in RING_MI_MODE resulting in the engine hanging. Unconditionally clear
> > it on the per-engine exit path as we know that either we skipped the
> > reset and so need the cancellation, or the reset was successful and the
> > cancellation is a no-op, or there was an error and we will follow up
> > with a full-reset or wedging (both of which will stop the engines again
> > as required).
> >
> > Signed-off-by: Chris Wilson 
> > Cc: Mika Kuoppala 
> > ---
> >  drivers/gpu/drm/i915/i915_drv.c |  1 +
> >  drivers/gpu/drm/i915/intel_engine_cs.c  | 10 ++
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
> >  3 files changed, 12 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c 
> > b/drivers/gpu/drm/i915/i915_drv.c
> > index 9dce55182c3a..4f2a9c39 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -2079,6 +2079,7 @@ int i915_reset_engine(struct intel_engine_cs *engine, 
> > const char *msg)
> >   goto out;
> >  
> >  out:
> > + intel_engine_cancel_stop_cs(engine);
> >   i915_gem_reset_finish_engine(engine);
> 
> Should we just lift the whole stop/start dance into
> gem_reset_prepare|finish_engine()s?

No, because it is also used by wedging where we do want the asymmetry.
Been there, done that, have the gem_eio bruises.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Clear stop-engine for a pardoned reset

2018-08-15 Thread Mika Kuoppala
Chris Wilson  writes:

> If we pardon a per-engine reset, we may leave the STOP_RING bit asserted
> in RING_MI_MODE resulting in the engine hanging. Unconditionally clear
> it on the per-engine exit path as we know that either we skipped the
> reset and so need the cancellation, or the reset was successful and the
> cancellation is a no-op, or there was an error and we will follow up
> with a full-reset or wedging (both of which will stop the engines again
> as required).
>
> Signed-off-by: Chris Wilson 
> Cc: Mika Kuoppala 
> ---
>  drivers/gpu/drm/i915/i915_drv.c |  1 +
>  drivers/gpu/drm/i915/intel_engine_cs.c  | 10 ++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>  3 files changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 9dce55182c3a..4f2a9c39 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -2079,6 +2079,7 @@ int i915_reset_engine(struct intel_engine_cs *engine, 
> const char *msg)
>   goto out;
>  
>  out:
> + intel_engine_cancel_stop_cs(engine);
>   i915_gem_reset_finish_engine(engine);

Should we just lift the whole stop/start dance into
gem_reset_prepare|finish_engine()s?

-Mika

>   return ret;
>  }
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c 
> b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 99d5a24219c1..8628567d8f6e 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -788,6 +788,16 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine)
>   return err;
>  }
>  
> +void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine)
> +{
> + struct drm_i915_private *dev_priv = engine->i915;
> +
> + GEM_TRACE("%s\n", engine->name);
> +
> + I915_WRITE_FW(RING_MI_MODE(engine->mmio_base),
> +   _MASKED_BIT_DISABLE(STOP_RING));
> +}
> +
>  const char *i915_cache_level_str(struct drm_i915_private *i915, int type)
>  {
>   switch (type) {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
> b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 9090885d57de..3f6920dd7880 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -906,6 +906,7 @@ int intel_init_blt_ring_buffer(struct intel_engine_cs 
> *engine);
>  int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine);
>  
>  int intel_engine_stop_cs(struct intel_engine_cs *engine);
> +void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine);
>  
>  u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
>  u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine);
> -- 
> 2.18.0
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Clear stop-engine for a pardoned reset

2018-08-14 Thread Chris Wilson
If we pardon a per-engine reset, we may leave the STOP_RING bit asserted
in RING_MI_MODE resulting in the engine hanging. Unconditionally clear
it on the per-engine exit path as we know that either we skipped the
reset and so need the cancellation, or the reset was successful and the
cancellation is a no-op, or there was an error and we will follow up
with a full-reset or wedging (both of which will stop the engines again
as required).

Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
---
 drivers/gpu/drm/i915/i915_drv.c |  1 +
 drivers/gpu/drm/i915/intel_engine_cs.c  | 10 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 3 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 9dce55182c3a..4f2a9c39 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -2079,6 +2079,7 @@ int i915_reset_engine(struct intel_engine_cs *engine, 
const char *msg)
goto out;
 
 out:
+   intel_engine_cancel_stop_cs(engine);
i915_gem_reset_finish_engine(engine);
return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c 
b/drivers/gpu/drm/i915/intel_engine_cs.c
index 99d5a24219c1..8628567d8f6e 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -788,6 +788,16 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine)
return err;
 }
 
+void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine)
+{
+   struct drm_i915_private *dev_priv = engine->i915;
+
+   GEM_TRACE("%s\n", engine->name);
+
+   I915_WRITE_FW(RING_MI_MODE(engine->mmio_base),
+ _MASKED_BIT_DISABLE(STOP_RING));
+}
+
 const char *i915_cache_level_str(struct drm_i915_private *i915, int type)
 {
switch (type) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 9090885d57de..3f6920dd7880 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -906,6 +906,7 @@ int intel_init_blt_ring_buffer(struct intel_engine_cs 
*engine);
 int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine);
 
 int intel_engine_stop_cs(struct intel_engine_cs *engine);
+void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine);
 
 u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
 u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine);
-- 
2.18.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx