Re: [Intel-gfx] [PATCH] drm/i915/execlists: Clear STOP_RING bit on reset

2019-09-10 Thread Chris Wilson
Quoting Mika Kuoppala (2019-09-10 10:54:43)
> Chris Wilson  writes:
> 
> > Quoting Mika Kuoppala (2019-09-10 10:31:05)
> >> Chris Wilson  writes:
> >> 
> >> > During reset, we try to ensure no forward progress of the CS prior to
> >> > the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this
> >> > register is context saved and do we end up in the odd situation where we
> >> > save the STOP_RING bit and so try to stop the engine again immediately
> >> > upon resume. This is quite unexpected and causes us to complain about an
> >> > early CS completion event!
> >> 
> >> The completion event is a product of resume with a stop set?
> >
> > A completion event is the product of STOP_RING. Whether it is the
> > completion event that we keep failing on...
> >  
> >> If my memory serves me well, we have fought this before.
> >
> > Exactly. We've reduced the frequency of when we apply the STOP_RING, but
> > have not eliminated it.
> >  
> >> But I have still missing pieces. Why would we not want to
> >> set this for all contexts primed for execution? In gen8 too.
> >
> > It's not in the gen8 context, afaict. I searched the context image for an
> > LRI with the RING_MI_MODE register:
> > https://patchwork.freedesktop.org/patch/329919/?series=66468=1
> >  
> >> I mean, queuing context with a ring stopped just doesn't
> >> sound right on any gen.
> >
> > We clear the STOP_RING in the register on resume just in case, and that
> > is being flagged on Icelake (which I put down to the reset not being very
> > thorough!). The remaining question was whether we were restoring it from
> > the context image.
> 
> Hmm yeah, I was confused of the sequence of setup. With that cleared
> and with the context state being worked on, perhaps we can add
> sanity checkers to the queuing path.

Yeah, I think there's definitely some fun we can have here. At the very
least a check that CTX_RING_START == ring->start would be a good
sanitycheck.
 
> Reviewed-by: Mika Kuoppala 

As always, the only way to be sure if this changes the mtbf is to let is
soak. One day I may be able to run my own extended testing on icl!
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915/execlists: Clear STOP_RING bit on reset

2019-09-10 Thread Mika Kuoppala
Chris Wilson  writes:

> Quoting Mika Kuoppala (2019-09-10 10:31:05)
>> Chris Wilson  writes:
>> 
>> > During reset, we try to ensure no forward progress of the CS prior to
>> > the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this
>> > register is context saved and do we end up in the odd situation where we
>> > save the STOP_RING bit and so try to stop the engine again immediately
>> > upon resume. This is quite unexpected and causes us to complain about an
>> > early CS completion event!
>> 
>> The completion event is a product of resume with a stop set?
>
> A completion event is the product of STOP_RING. Whether it is the
> completion event that we keep failing on...
>  
>> If my memory serves me well, we have fought this before.
>
> Exactly. We've reduced the frequency of when we apply the STOP_RING, but
> have not eliminated it.
>  
>> But I have still missing pieces. Why would we not want to
>> set this for all contexts primed for execution? In gen8 too.
>
> It's not in the gen8 context, afaict. I searched the context image for an
> LRI with the RING_MI_MODE register:
> https://patchwork.freedesktop.org/patch/329919/?series=66468=1
>  
>> I mean, queuing context with a ring stopped just doesn't
>> sound right on any gen.
>
> We clear the STOP_RING in the register on resume just in case, and that
> is being flagged on Icelake (which I put down to the reset not being very
> thorough!). The remaining question was whether we were restoring it from
> the context image.

Hmm yeah, I was confused of the sequence of setup. With that cleared
and with the context state being worked on, perhaps we can add
sanity checkers to the queuing path.

Reviewed-by: Mika Kuoppala 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915/execlists: Clear STOP_RING bit on reset

2019-09-10 Thread Chris Wilson
Quoting Mika Kuoppala (2019-09-10 10:31:05)
> Chris Wilson  writes:
> 
> > During reset, we try to ensure no forward progress of the CS prior to
> > the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this
> > register is context saved and do we end up in the odd situation where we
> > save the STOP_RING bit and so try to stop the engine again immediately
> > upon resume. This is quite unexpected and causes us to complain about an
> > early CS completion event!
> 
> The completion event is a product of resume with a stop set?

A completion event is the product of STOP_RING. Whether it is the
completion event that we keep failing on...
 
> If my memory serves me well, we have fought this before.

Exactly. We've reduced the frequency of when we apply the STOP_RING, but
have not eliminated it.
 
> But I have still missing pieces. Why would we not want to
> set this for all contexts primed for execution? In gen8 too.

It's not in the gen8 context, afaict. I searched the context image for an
LRI with the RING_MI_MODE register:
https://patchwork.freedesktop.org/patch/329919/?series=66468=1
 
> I mean, queuing context with a ring stopped just doesn't
> sound right on any gen.

We clear the STOP_RING in the register on resume just in case, and that
is being flagged on Icelake (which I put down to the reset not being very
thorough!). The remaining question was whether we were restoring it from
the context image.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915/execlists: Clear STOP_RING bit on reset

2019-09-10 Thread Mika Kuoppala
Chris Wilson  writes:

> During reset, we try to ensure no forward progress of the CS prior to
> the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this
> register is context saved and do we end up in the odd situation where we
> save the STOP_RING bit and so try to stop the engine again immediately
> upon resume. This is quite unexpected and causes us to complain about an
> early CS completion event!

The completion event is a product of resume with a stop set?

If my memory serves me well, we have fought this before.

But I have still missing pieces. Why would we not want to
set this for all contexts primed for execution? In gen8 too.

I mean, queuing context with a ring stopped just doesn't
sound right on any gen.

-Mika

>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111514
> Signed-off-by: Chris Wilson 
> Cc: Mika Kuoppala 
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 10 ++
>  drivers/gpu/drm/i915/gt/intel_lrc_reg.h |  2 ++
>  2 files changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
> b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index f073aea6a1fe..761f5cbd90d3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2368,6 +2368,15 @@ static struct i915_request *active_request(struct 
> i915_request *rq)
>   return active;
>  }
>  
> +static void __execlists_reset_reg_state(const struct intel_context *ce,
> + const struct intel_engine_cs *engine)
> +{
> + u32 *regs = ce->lrc_reg_state;
> +
> + if (INTEL_GEN(engine->i915) >= 9)
> + regs[GEN9_CTX_RING_MI_MODE + 1] |= 
> _MASKED_BIT_DISABLE(STOP_RING);
> +}
> +
>  static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
>  {
>   struct intel_engine_execlists * const execlists = >execlists;
> @@ -2455,6 +2464,7 @@ static void __execlists_reset(struct intel_engine_cs 
> *engine, bool stalled)
>   GEM_TRACE("%s replay {head:%04x, tail:%04x\n",
> engine->name, ce->ring->head, ce->ring->tail);
>   intel_ring_update_space(ce->ring);
> + __execlists_reset_reg_state(ce, engine);
>   __execlists_update_reg_state(ce, engine);
>   mutex_release(>pin_mutex.dep_map, 0, _THIS_IP_);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h 
> b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> index 68caf8541866..7e773e74a3fe 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> @@ -39,6 +39,8 @@
>  #define CTX_R_PWR_CLK_STATE  0x42
>  #define CTX_END  0x44
>  
> +#define GEN9_CTX_RING_MI_MODE0x54
> +
>  /* GEN12+ Reg State Context */
>  #define GEN12_CTX_BB_PER_CTX_PTR 0x12
>  #define GEN12_CTX_LRI_HEADER_3   0x41
> -- 
> 2.23.0
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx