Re: [Intel-gfx] [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged
Quoting Mika Kuoppala (2018-02-08 11:35:09)
> Chris Wilson writes:
>
> > Reduce the window of opportunity for set-wedged being called
> > concurrently with reset (after i915_reset() has performed the
> > i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> > complete the inflight requests. When i915_reset() is being blocked on a
> > request, such completion may allow it to start and begin resetting
> > the GPU before i915_gem_set_wedged() has finished (and so before
> > set-wedged will have marked the device as wedged). As such,
> > i915_gem_init_hw() may see a wedged device even from inside
> > i915_reset().
> >
> > References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> > Signed-off-by: Chris Wilson
> > Cc: Mika Kuoppala
> > Cc: Joonas Lahtinen
> > Cc: Tvrtko Ursulin
>
> Reviewed-by: Mika Kuoppala

Thank you both kindly for the review,
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged
Chris Wilson writes:

> Reduce the window of opportunity for set-wedged being called
> concurrently with reset (after i915_reset() has performed the
> i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> complete the inflight requests. When i915_reset() is being blocked on a
> request, such completion may allow it to start and begin resetting
> the GPU before i915_gem_set_wedged() has finished (and so before
> set-wedged will have marked the device as wedged). As such,
> i915_gem_init_hw() may see a wedged device even from inside
> i915_reset().
>
> References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> Signed-off-by: Chris Wilson
> Cc: Mika Kuoppala
> Cc: Joonas Lahtinen
> Cc: Tvrtko Ursulin

Reviewed-by: Mika Kuoppala

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c1b80cd52f9e..06f0456699af 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3205,6 +3205,9 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  			intel_engine_dump(engine, &p, "%s\n", engine->name);
>  	}
>  
> +	set_bit(I915_WEDGED, &i915->gpu_error.flags);
> +	smp_mb__after_atomic();
> +
>  	/*
>  	 * First, stop submission to hw, but do not yet complete requests by
>  	 * rolling the global seqno forward (since this would complete requests
> @@ -3241,7 +3244,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  	for_each_engine(engine, i915, id) {
>  		unsigned long flags;
>  
> -		/* Mark all pending requests as complete so that any concurrent
> +		/*
> +		 * Mark all pending requests as complete so that any concurrent
>  		 * (lockless) lookup doesn't try and wait upon the request as we
>  		 * reset it.
>  		 */
> @@ -3251,7 +3255,6 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  		spin_unlock_irqrestore(&engine->timeline->lock, flags);
>  	}
>  
> -	set_bit(I915_WEDGED, &i915->gpu_error.flags);
>  	wake_up_all(&i915->gpu_error.reset_queue);
>  }
>  
> -- 
> 2.16.1
Re: [Intel-gfx] [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged
Quoting Chris Wilson (2018-02-08 10:39:05)
> Quoting Chris Wilson (2018-02-07 15:13:50)
> > Reduce the window of opportunity for set-wedged being called
> > concurrently with reset (after i915_reset() has performed the
> > i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> > complete the inflight requests. When i915_reset() is being blocked on a
> > request, such completion may allow it to start and begin resetting
> > the GPU before i915_gem_set_wedged() has finished (and so before
> > set-wedged will have marked the device as wedged). As such,
> > i915_gem_init_hw() may see a wedged device even from inside
> > i915_reset().
>
> So I'm 99% certain this is the problem on blb/pnv. As we break the
> modeset deadlock using set-wedged from a timer, the reset springs into
> action on the other cpu and races with set_bit(I915_WEDGED). Flagging
> I915_WEDGED first will force i915_reset() to serialise via
> i915_gem_unset_wedged(). (Well that's the plan at least.)
>
> > References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> > Signed-off-by: Chris Wilson
> > Cc: Mika Kuoppala
> > Cc: Joonas Lahtinen
> > Cc: Tvrtko Ursulin

Reviewed-by: Joonas Lahtinen

Regards, Joonas
Re: [Intel-gfx] [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged
Quoting Chris Wilson (2018-02-07 15:13:50)
> Reduce the window of opportunity for set-wedged being called
> concurrently with reset (after i915_reset() has performed the
> i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> complete the inflight requests. When i915_reset() is being blocked on a
> request, such completion may allow it to start and begin resetting
> the GPU before i915_gem_set_wedged() has finished (and so before
> set-wedged will have marked the device as wedged). As such,
> i915_gem_init_hw() may see a wedged device even from inside
> i915_reset().

So I'm 99% certain this is the problem on blb/pnv. As we break the
modeset deadlock using set-wedged from a timer, the reset springs into
action on the other cpu and races with set_bit(I915_WEDGED). Flagging
I915_WEDGED first will force i915_reset() to serialise via
i915_gem_unset_wedged(). (Well that's the plan at least.)

> References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> Signed-off-by: Chris Wilson
> Cc: Mika Kuoppala
> Cc: Joonas Lahtinen
> Cc: Tvrtko Ursulin
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c1b80cd52f9e..06f0456699af 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3205,6 +3205,9 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  			intel_engine_dump(engine, &p, "%s\n", engine->name);
>  	}
>  
> +	set_bit(I915_WEDGED, &i915->gpu_error.flags);
> +	smp_mb__after_atomic();
> +
>  	/*
>  	 * First, stop submission to hw, but do not yet complete requests by
>  	 * rolling the global seqno forward (since this would complete requests
> @@ -3241,7 +3244,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  	for_each_engine(engine, i915, id) {
>  		unsigned long flags;
>  
> -		/* Mark all pending requests as complete so that any concurrent
> +		/*
> +		 * Mark all pending requests as complete so that any concurrent
>  		 * (lockless) lookup doesn't try and wait upon the request as we
>  		 * reset it.
>  		 */
> @@ -3251,7 +3255,6 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  		spin_unlock_irqrestore(&engine->timeline->lock, flags);
>  	}
>  
> -	set_bit(I915_WEDGED, &i915->gpu_error.flags);
>  	wake_up_all(&i915->gpu_error.reset_queue);
>  }
>  
> -- 
> 2.16.1
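
For readers following the ordering argument in this thread, here is a rough user-space sketch of why the position of the flag write matters. It is a toy model only and not the i915 code: struct toy_gpu, the toy_* functions and TOY_NUM_REQUESTS are hypothetical names invented for illustration, C11 atomics stand in for set_bit() and smp_mb__after_atomic(), and the "waiter" is called synchronously at the moment requests complete to mimic a reset path that starts running as soon as its blocking request is completed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NUM_REQUESTS 4	/* hypothetical number of inflight requests */

/* Hypothetical stand-in for driver state; not the real i915 structures. */
struct toy_gpu {
	atomic_bool wedged;				/* plays the role of I915_WEDGED */
	atomic_bool request_done[TOY_NUM_REQUESTS];	/* inflight requests */
	bool waiter_saw_wedged;				/* what the woken waiter observed */
};

void toy_gpu_init(struct toy_gpu *gpu)
{
	atomic_init(&gpu->wedged, false);
	for (int i = 0; i < TOY_NUM_REQUESTS; i++)
		atomic_init(&gpu->request_done[i], false);
	gpu->waiter_saw_wedged = false;
}

/* Models a waiter that starts running the moment its request completes. */
void toy_waiter_wakes(struct toy_gpu *gpu)
{
	gpu->waiter_saw_wedged = atomic_load(&gpu->wedged);
}

/* Marks every request complete, then lets the waiter run immediately. */
void toy_complete_requests(struct toy_gpu *gpu)
{
	for (int i = 0; i < TOY_NUM_REQUESTS; i++)
		atomic_store(&gpu->request_done[i], true);
	toy_waiter_wakes(gpu);
}

/* Ordering before the patch: the flag is set only after completion. */
void toy_set_wedged_late(struct toy_gpu *gpu)
{
	toy_complete_requests(gpu);
	atomic_store(&gpu->wedged, true);
}

/* Ordering after the patch: the flag is set first, then requests complete. */
void toy_set_wedged_early(struct toy_gpu *gpu)
{
	atomic_store(&gpu->wedged, true);
	/* Full fence, used here as a loose analogue of smp_mb__after_atomic(). */
	atomic_thread_fence(memory_order_seq_cst);
	toy_complete_requests(gpu);
}

/* Runs both orderings and checks what the waiter observed in each. */
void toy_demo(void)
{
	struct toy_gpu late, early;

	toy_gpu_init(&late);
	toy_set_wedged_late(&late);
	assert(!late.waiter_saw_wedged);	/* waiter ran before the flag was set */

	toy_gpu_init(&early);
	toy_set_wedged_early(&early);
	assert(early.waiter_saw_wedged);	/* waiter sees the device as wedged */
}
```

Calling toy_demo() from a test program exercises both orderings. The model is deliberately single-threaded so the outcome is deterministic; it only shows the window that the patch narrows and makes no claim about how i915_reset() itself serialises against i915_gem_unset_wedged().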