Re: [Intel-gfx] [PATCH] drm/i915/execlists: Clear STOP_RING bit on reset
Quoting Mika Kuoppala (2019-09-10 10:54:43) > Chris Wilson writes: > > > Quoting Mika Kuoppala (2019-09-10 10:31:05) > >> Chris Wilson writes: > >> > >> > During reset, we try to ensure no forward progress of the CS prior to > >> > the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this > >> > register is context saved and do we end up in the odd situation where we > >> > save the STOP_RING bit and so try to stop the engine again immediately > >> > upon resume. This is quite unexpected and causes us to complain about an > >> > early CS completion event! > >> > >> The completion event is a product of resume with a stop set? > > > > A completion event is the product of STOP_RING. Whether it is the > > completion event that we keep failing on... > > > >> If my memory serves me well, we have fought this before. > > > > Exactly. We've reduced the frequency of when we apply the STOP_RING, but > > have not eliminated it. > > > >> But I have still missing pieces. Why would we not want to > >> set this for all contexts primed for execution? In gen8 too. > > > > It's not in the gen8 context, afaict. I searched the context image for an > > LRI with the RING_MI_MODE register: > > https://patchwork.freedesktop.org/patch/329919/?series=66468=1 > > > >> I mean, queuing context with a ring stopped just doesn't > >> sound right on any gen. > > > > We clear the STOP_RING in the register on resume just in case, and that > > is being flagged on Icelake (which I put down to the reset not being very > > thorough!). The remaining question was whether we were restoring it from > > the context image. > > Hmm yeah, I was confused of the sequence of setup. With that cleared > and with the context state being worked on, perhaps we can add > sanity checkers to the queuing path. Yeah, I think there's definitely some fun we can have here. At the very least a check that CTX_RING_START == ring->start would be a good sanitycheck. > Reviewed-by: Mika Kuoppala As always, the only way to be sure if this changes the mtbf is to let is soak. One day I may be able to run my own extended testing on icl! -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/execlists: Clear STOP_RING bit on reset
Chris Wilson writes: > Quoting Mika Kuoppala (2019-09-10 10:31:05) >> Chris Wilson writes: >> >> > During reset, we try to ensure no forward progress of the CS prior to >> > the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this >> > register is context saved and do we end up in the odd situation where we >> > save the STOP_RING bit and so try to stop the engine again immediately >> > upon resume. This is quite unexpected and causes us to complain about an >> > early CS completion event! >> >> The completion event is a product of resume with a stop set? > > A completion event is the product of STOP_RING. Whether it is the > completion event that we keep failing on... > >> If my memory serves me well, we have fought this before. > > Exactly. We've reduced the frequency of when we apply the STOP_RING, but > have not eliminated it. > >> But I have still missing pieces. Why would we not want to >> set this for all contexts primed for execution? In gen8 too. > > It's not in the gen8 context, afaict. I searched the context image for an > LRI with the RING_MI_MODE register: > https://patchwork.freedesktop.org/patch/329919/?series=66468=1 > >> I mean, queuing context with a ring stopped just doesn't >> sound right on any gen. > > We clear the STOP_RING in the register on resume just in case, and that > is being flagged on Icelake (which I put down to the reset not being very > thorough!). The remaining question was whether we were restoring it from > the context image. Hmm yeah, I was confused of the sequence of setup. With that cleared and with the context state being worked on, perhaps we can add sanity checkers to the queuing path. Reviewed-by: Mika Kuoppala ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/execlists: Clear STOP_RING bit on reset
Quoting Mika Kuoppala (2019-09-10 10:31:05) > Chris Wilson writes: > > > During reset, we try to ensure no forward progress of the CS prior to > > the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this > > register is context saved and do we end up in the odd situation where we > > save the STOP_RING bit and so try to stop the engine again immediately > > upon resume. This is quite unexpected and causes us to complain about an > > early CS completion event! > > The completion event is a product of resume with a stop set? A completion event is the product of STOP_RING. Whether it is the completion event that we keep failing on... > If my memory serves me well, we have fought this before. Exactly. We've reduced the frequency of when we apply the STOP_RING, but have not eliminated it. > But I have still missing pieces. Why would we not want to > set this for all contexts primed for execution? In gen8 too. It's not in the gen8 context, afaict. I searched the context image for an LRI with the RING_MI_MODE register: https://patchwork.freedesktop.org/patch/329919/?series=66468=1 > I mean, queuing context with a ring stopped just doesn't > sound right on any gen. We clear the STOP_RING in the register on resume just in case, and that is being flagged on Icelake (which I put down to the reset not being very thorough!). The remaining question was whether we were restoring it from the context image. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/execlists: Clear STOP_RING bit on reset
Chris Wilson writes: > During reset, we try to ensure no forward progress of the CS prior to > the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this > register is context saved and do we end up in the odd situation where we > save the STOP_RING bit and so try to stop the engine again immediately > upon resume. This is quite unexpected and causes us to complain about an > early CS completion event! The completion event is a product of resume with a stop set? If my memory serves me well, we have fought this before. But I have still missing pieces. Why would we not want to set this for all contexts primed for execution? In gen8 too. I mean, queuing context with a ring stopped just doesn't sound right on any gen. -Mika > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111514 > Signed-off-by: Chris Wilson > Cc: Mika Kuoppala > --- > drivers/gpu/drm/i915/gt/intel_lrc.c | 10 ++ > drivers/gpu/drm/i915/gt/intel_lrc_reg.h | 2 ++ > 2 files changed, 12 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c > b/drivers/gpu/drm/i915/gt/intel_lrc.c > index f073aea6a1fe..761f5cbd90d3 100644 > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > @@ -2368,6 +2368,15 @@ static struct i915_request *active_request(struct > i915_request *rq) > return active; > } > > +static void __execlists_reset_reg_state(const struct intel_context *ce, > + const struct intel_engine_cs *engine) > +{ > + u32 *regs = ce->lrc_reg_state; > + > + if (INTEL_GEN(engine->i915) >= 9) > + regs[GEN9_CTX_RING_MI_MODE + 1] |= > _MASKED_BIT_DISABLE(STOP_RING); > +} > + > static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) > { > struct intel_engine_execlists * const execlists = >execlists; > @@ -2455,6 +2464,7 @@ static void __execlists_reset(struct intel_engine_cs > *engine, bool stalled) > GEM_TRACE("%s replay {head:%04x, tail:%04x\n", > engine->name, ce->ring->head, ce->ring->tail); > intel_ring_update_space(ce->ring); > + __execlists_reset_reg_state(ce, engine); > __execlists_update_reg_state(ce, engine); > mutex_release(>pin_mutex.dep_map, 0, _THIS_IP_); > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h > b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h > index 68caf8541866..7e773e74a3fe 100644 > --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h > +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h > @@ -39,6 +39,8 @@ > #define CTX_R_PWR_CLK_STATE 0x42 > #define CTX_END 0x44 > > +#define GEN9_CTX_RING_MI_MODE0x54 > + > /* GEN12+ Reg State Context */ > #define GEN12_CTX_BB_PER_CTX_PTR 0x12 > #define GEN12_CTX_LRI_HEADER_3 0x41 > -- > 2.23.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx