Re: [Intel-gfx] [PATCH] drm/i915: Clear lost context-switch interrupts across reset
Quoting Dong, Chuanxiao (2017-08-09 01:26:00) > > -Original Message- > > From: Thierry, Michel > > Sent: Tuesday, August 8, 2017 1:40 AM > > To: Chris Wilson; intel-gfx@lists.freedesktop.org > > Cc: Dong, Chuanxiao ; Ursulin, Tvrtko > > ; Winiarski, Michal > > Subject: Re: [PATCH] drm/i915: Clear lost context-switch interrupts across > > reset > > > > On 8/7/2017 5:19 AM, Chris Wilson wrote: > > > During a global reset, we disable the irq. As we disable the irq, the > > > hardware may be raising a GT interrupt that we then ignore, leaving it > > > pending in the GTIIR. After the reset, we then re-enable the irq, > > > triggering the pending interrupt. However, that interrupt was for the > > > stale state from before the reset, and the contents of the CSB buffer > > > are now invalid. > > > > > > Reported-by: "Dong, Chuanxiao" > > > Signed-off-by: Chris Wilson > > > Cc: "Dong, Chuanxiao" > > > Cc: Tvrtko Ursulin > > > Cc: Michal Winiarski > > > Cc: Michel Thierry > > > --- > > > drivers/gpu/drm/i915/intel_lrc.c | 17 - > > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c > > > b/drivers/gpu/drm/i915/intel_lrc.c > > > index b0738d2b2a7f..bc61948e2601 100644 > > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > > @@ -1221,6 +1221,14 @@ static int intel_init_workaround_bb(struct > > intel_engine_cs *engine) > > > return ret; > > > } > > > > > > +static u8 gtiir[] = { > > > + [RCS] = 0, > > > + [BCS] = 0, > > > + [VCS] = 1, > > > + [VCS2] = 1, > > > + [VECS] = 3, > > > +}; > > > + > > > static int gen8_init_common_ring(struct intel_engine_cs *engine) > > > { > > > struct drm_i915_private *dev_priv = engine->i915; @@ -1245,9 > > > +1253,16 @@ static int gen8_init_common_ring(struct intel_engine_cs > > > *engine) > > > > > > DRM_DEBUG_DRIVER("Execlists enabled for %s\n", engine->name); > > > > > > - /* After a GPU reset, we may have requests to replay */ > > > + GEM_BUG_ON(engine->id >= ARRAY_SIZE(gtiir)); > > > + > > > + /* Clear any pending interrupt state */ > > > + I915_WRITE(GEN8_GT_IIR(gtiir[engine->id]), > > > + GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift); > > > + I915_WRITE(GEN8_GT_IIR(gtiir[engine->id]), > > > + GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift); > > > > Clear twice? Or the second was supposed to be user_interrupt? I never know which of the IIR are double buffered, so in case they decide to double buffer GTIIR, I cleared it twice. We do want to service the user_interrupt even after the hang in case we don't notify it during reset processing. Might be worth kicking the engine just in case, but we have contingency plans. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Clear lost context-switch interrupts across reset
> -Original Message- > From: Thierry, Michel > Sent: Tuesday, August 8, 2017 1:40 AM > To: Chris Wilson; intel-gfx@lists.freedesktop.org > Cc: Dong, Chuanxiao ; Ursulin, Tvrtko > ; Winiarski, Michal > Subject: Re: [PATCH] drm/i915: Clear lost context-switch interrupts across > reset > > On 8/7/2017 5:19 AM, Chris Wilson wrote: > > During a global reset, we disable the irq. As we disable the irq, the > > hardware may be raising a GT interrupt that we then ignore, leaving it > > pending in the GTIIR. After the reset, we then re-enable the irq, > > triggering the pending interrupt. However, that interrupt was for the > > stale state from before the reset, and the contents of the CSB buffer > > are now invalid. > > > > Reported-by: "Dong, Chuanxiao" > > Signed-off-by: Chris Wilson > > Cc: "Dong, Chuanxiao" > > Cc: Tvrtko Ursulin > > Cc: Michal Winiarski > > Cc: Michel Thierry > > --- > > drivers/gpu/drm/i915/intel_lrc.c | 17 - > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c > > b/drivers/gpu/drm/i915/intel_lrc.c > > index b0738d2b2a7f..bc61948e2601 100644 > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > @@ -1221,6 +1221,14 @@ static int intel_init_workaround_bb(struct > intel_engine_cs *engine) > > return ret; > > } > > > > +static u8 gtiir[] = { > > + [RCS] = 0, > > + [BCS] = 0, > > + [VCS] = 1, > > + [VCS2] = 1, > > + [VECS] = 3, > > +}; > > + > > static int gen8_init_common_ring(struct intel_engine_cs *engine) > > { > > struct drm_i915_private *dev_priv = engine->i915; @@ -1245,9 > > +1253,16 @@ static int gen8_init_common_ring(struct intel_engine_cs > > *engine) > > > > DRM_DEBUG_DRIVER("Execlists enabled for %s\n", engine->name); > > > > - /* After a GPU reset, we may have requests to replay */ > > + GEM_BUG_ON(engine->id >= ARRAY_SIZE(gtiir)); > > + > > + /* Clear any pending interrupt state */ > > + I915_WRITE(GEN8_GT_IIR(gtiir[engine->id]), > > + GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift); > > + I915_WRITE(GEN8_GT_IIR(gtiir[engine->id]), > > + GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift); > > Clear twice? Or the second was supposed to be user_interrupt? To resolve the CSB tail/head trouble, clearing context switching pending interrupt only should be enough, but seems it is not necessary to keep the pending user interrupt as well so we can clear user interrupt as well. Chris, what do you think? And thank you for bringing the fix for this issue. :) Thanks Chuanxiao > > > clear_bit(ENGINE_IRQ_EXECLIST, >irq_posted); > > > > + /* After a GPU reset, we may have requests to replay */ > > submit = false; > > for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) { > > if (!port_isset([n])) > > ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Clear lost context-switch interrupts across reset
On 8/7/2017 5:19 AM, Chris Wilson wrote: During a global reset, we disable the irq. As we disable the irq, the hardware may be raising a GT interrupt that we then ignore, leaving it pending in the GTIIR. After the reset, we then re-enable the irq, triggering the pending interrupt. However, that interrupt was for the stale state from before the reset, and the contents of the CSB buffer are now invalid. Reported-by: "Dong, Chuanxiao"Signed-off-by: Chris Wilson Cc: "Dong, Chuanxiao" Cc: Tvrtko Ursulin Cc: Michal Winiarski Cc: Michel Thierry --- drivers/gpu/drm/i915/intel_lrc.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index b0738d2b2a7f..bc61948e2601 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1221,6 +1221,14 @@ static int intel_init_workaround_bb(struct intel_engine_cs *engine) return ret; } +static u8 gtiir[] = { + [RCS] = 0, + [BCS] = 0, + [VCS] = 1, + [VCS2] = 1, + [VECS] = 3, +}; + static int gen8_init_common_ring(struct intel_engine_cs *engine) { struct drm_i915_private *dev_priv = engine->i915; @@ -1245,9 +1253,16 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine) DRM_DEBUG_DRIVER("Execlists enabled for %s\n", engine->name); - /* After a GPU reset, we may have requests to replay */ + GEM_BUG_ON(engine->id >= ARRAY_SIZE(gtiir)); + + /* Clear any pending interrupt state */ + I915_WRITE(GEN8_GT_IIR(gtiir[engine->id]), + GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift); + I915_WRITE(GEN8_GT_IIR(gtiir[engine->id]), + GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift); Clear twice? Or the second was supposed to be user_interrupt? clear_bit(ENGINE_IRQ_EXECLIST, >irq_posted); + /* After a GPU reset, we may have requests to replay */ submit = false; for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) { if (!port_isset([n])) ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx