Re: [Intel-gfx] [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged

2018-02-08 Thread Chris Wilson
Quoting Mika Kuoppala (2018-02-08 11:35:09)
> Chris Wilson  writes:
> 
> > Reduce the window of opportunity for set-wedged being called
> > concurrently with reset (after i915_reset() has performed the
> > i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> > complete the inflight requests. When i915_reset() is being blocked on a
> > request, such completion may allow it to start and begin resetting
> > the GPU before i915_gem_set_wedged() has finished (and so before
> > set-wedge will have marked the device as wedged). As such,
> > i915_gem_init_hw() may see a wedged device even from inside
> > i915_reset().
> >
> > References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> > Signed-off-by: Chris Wilson 
> > Cc: Mika Kuoppala 
> > Cc: Joonas Lahtinen 
> > Cc: Tvrtko Ursulin 
> 
> Reviewed-by: Mika Kuoppala 

Thank you both kindly for the review,
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged

2018-02-08 Thread Mika Kuoppala
Chris Wilson  writes:

> Reduce the window of opportunity for set-wedged being called
> concurrently with reset (after i915_reset() has performed the
> i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> complete the inflight requests. When i915_reset() is being blocked on a
> request, such completion may allow it to start and begin resetting
> the GPU before i915_gem_set_wedged() has finished (and so before
> set-wedge will have marked the device as wedged). As such,
> i915_gem_init_hw() may see a wedged device even from inside
> i915_reset().
>
> References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> Signed-off-by: Chris Wilson 
> Cc: Mika Kuoppala 
> Cc: Joonas Lahtinen 
> Cc: Tvrtko Ursulin 

Reviewed-by: Mika Kuoppala 

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c1b80cd52f9e..06f0456699af 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3205,6 +3205,9 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>   intel_engine_dump(engine, &p, "%s\n", engine->name);
>   }
>  
> + set_bit(I915_WEDGED, &i915->gpu_error.flags);
> + smp_mb__after_atomic();
> +
>   /*
>    * First, stop submission to hw, but do not yet complete requests by
>    * rolling the global seqno forward (since this would complete requests
> @@ -3241,7 +3244,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>   for_each_engine(engine, i915, id) {
>   unsigned long flags;
>  
> - /* Mark all pending requests as complete so that any concurrent
> + /*
> +  * Mark all pending requests as complete so that any concurrent
>    * (lockless) lookup doesn't try and wait upon the request as we
>    * reset it.
>    */
> @@ -3251,7 +3255,6 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>   spin_unlock_irqrestore(&engine->timeline->lock, flags);
>   }
>  
> - set_bit(I915_WEDGED, &i915->gpu_error.flags);
>   wake_up_all(&i915->gpu_error.reset_queue);
>  }
>  
> -- 
> 2.16.1


Re: [Intel-gfx] [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged

2018-02-08 Thread Joonas Lahtinen
Quoting Chris Wilson (2018-02-08 10:39:05)
> Quoting Chris Wilson (2018-02-07 15:13:50)
> > Reduce the window of opportunity for set-wedged being called
> > concurrently with reset (after i915_reset() has performed the
> > i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> > complete the inflight requests. When i915_reset() is being blocked on a
> > request, such completion may allow it to start and begin resetting
> > the GPU before i915_gem_set_wedged() has finished (and so before
> > set-wedge will have marked the device as wedged). As such,
> > i915_gem_init_hw() may see a wedged device even from inside
> > i915_reset().
> 
> So I'm 99% certain this is the problem on blb/pnv. As we break the
> modeset deadlock using set-wedged from a timer, the reset springs into
> action on the other cpu and races with set_bit(I915_WEDGED). Flagging
> I915_WEDGED first will force i915_reset() to serialise via
> i915_gem_unset_wedged(). (Well that's the plan at least.)
>  
> > References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> > Signed-off-by: Chris Wilson 
> > Cc: Mika Kuoppala 
> > Cc: Joonas Lahtinen 
> > Cc: Tvrtko Ursulin 

Reviewed-by: Joonas Lahtinen 

Regards, Joonas


Re: [Intel-gfx] [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged

2018-02-08 Thread Chris Wilson
Quoting Chris Wilson (2018-02-07 15:13:50)
> Reduce the window of opportunity for set-wedged being called
> concurrently with reset (after i915_reset() has performed the
> i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> complete the inflight requests. When i915_reset() is being blocked on a
> request, such completion may allow it to start and begin resetting
> the GPU before i915_gem_set_wedged() has finished (and so before
> set-wedge will have marked the device as wedged). As such,
> i915_gem_init_hw() may see a wedged device even from inside
> i915_reset().

So I'm 99% certain this is the problem on blb/pnv. As we break the
modeset deadlock using set-wedged from a timer, the reset springs into
action on the other cpu and races with set_bit(I915_WEDGED). Flagging
I915_WEDGED first will force i915_reset() to serialise via
i915_gem_unset_wedged(). (Well that's the plan at least.)
 
> References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> Signed-off-by: Chris Wilson 
> Cc: Mika Kuoppala 
> Cc: Joonas Lahtinen 
> Cc: Tvrtko Ursulin 
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c1b80cd52f9e..06f0456699af 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3205,6 +3205,9 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> intel_engine_dump(engine, &p, "%s\n", engine->name);
> }
>  
> +   set_bit(I915_WEDGED, &i915->gpu_error.flags);
> +   smp_mb__after_atomic();
> +
> /*
>  * First, stop submission to hw, but do not yet complete requests by
>  * rolling the global seqno forward (since this would complete requests
> @@ -3241,7 +3244,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> for_each_engine(engine, i915, id) {
> unsigned long flags;
>  
> -   /* Mark all pending requests as complete so that any concurrent
> +   /*
> +    * Mark all pending requests as complete so that any concurrent
>  * (lockless) lookup doesn't try and wait upon the request as we
>  * reset it.
>  */
> @@ -3251,7 +3255,6 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> spin_unlock_irqrestore(&engine->timeline->lock, flags);
> }
>  
> -   set_bit(I915_WEDGED, &i915->gpu_error.flags);
> wake_up_all(&i915->gpu_error.reset_queue);
>  }
>  
> -- 
> 2.16.1
> 