Re: [Intel-gfx] [PATCH 1/7] drm/i915: Avoid the gpu reset vs. modeset deadlock

2017-07-20 Thread Daniel Vetter
On Thu, Jul 20, 2017 at 10:16 PM, Chris Wilson  wrote:
> Quoting Daniel Vetter (2017-07-20 21:04:50)
>> On Thu, Jul 20, 2017 at 9:47 PM, Chris Wilson  
>> wrote:
>> >> diff --git a/drivers/gpu/drm/i915/intel_display.c 
>> >> b/drivers/gpu/drm/i915/intel_display.c
>> >> index 02b1f4966049..995522e40ec1 100644
>> >> --- a/drivers/gpu/drm/i915/intel_display.c
>> >> +++ b/drivers/gpu/drm/i915/intel_display.c
>> >> @@ -3471,6 +3471,12 @@ void intel_prepare_reset(struct drm_i915_private 
>> >> *dev_priv)
>> >> !gpu_reset_clobbers_display(dev_priv))
>> >> return;
>> >>
>> >> +   /* We have a modeset vs reset deadlock, defensively unbreak it.
>> >> +*
>> >> +* FIXME: We can do a _lot_ better, this is just a first 
>> >> iteration.*/
>> >> +   i915_gem_set_wedged(dev_priv);
>> >> +   DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending 
>> >> modeset updates\n");
>> >
>> > I meant this a dev_err(). It has user visible impact and user data loss.
>>
>> stop_rings is gone so I can't differentiate, and DRM_ERROR breaks CI.
>> Which after your timeout is the only point of this patch really.
>
> So drop this patch. If the only reason is to sweep the bug under the
> carpet, don't. I thought this patch was to move the wedge to where you
> planned to replace it with the refined approach to abort the modeset.

It fixes a regression we have in CI, because the dmesg-warn can hide
other stuff. If you want, resurrect some better way to handle that,
meanwhile this here gets the job done. Since the latter patches don't
work I'm proposing just this one here for merging.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/7] drm/i915: Avoid the gpu reset vs. modeset deadlock

2017-07-20 Thread Chris Wilson
Quoting Daniel Vetter (2017-07-20 21:04:50)
> On Thu, Jul 20, 2017 at 9:47 PM, Chris Wilson  
> wrote:
> >> diff --git a/drivers/gpu/drm/i915/intel_display.c 
> >> b/drivers/gpu/drm/i915/intel_display.c
> >> index 02b1f4966049..995522e40ec1 100644
> >> --- a/drivers/gpu/drm/i915/intel_display.c
> >> +++ b/drivers/gpu/drm/i915/intel_display.c
> >> @@ -3471,6 +3471,12 @@ void intel_prepare_reset(struct drm_i915_private 
> >> *dev_priv)
> >> !gpu_reset_clobbers_display(dev_priv))
> >> return;
> >>
> >> +   /* We have a modeset vs reset deadlock, defensively unbreak it.
> >> +*
> >> +* FIXME: We can do a _lot_ better, this is just a first 
> >> iteration.*/
> >> +   i915_gem_set_wedged(dev_priv);
> >> +   DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending 
> >> modeset updates\n");
> >
> > I meant this a dev_err(). It has user visible impact and user data loss.
> 
> stop_rings is gone so I can't differentiate, and DRM_ERROR breaks CI.
> Which after your timeout is the only point of this patch really.

So drop this patch. If the only reason is to sweep the bug under the
carpet, don't. I thought this patch was to move the wedge to where you
planned to replace it with the refined approach to abort the modeset.   

> guess I could throw a pr_notice or so in there, but we already do
> that. This was all removed in
> 
> commit 7b4d3a16dd97be0ebc793ea046b9af9d5c9b1b1a
> Author: Chris Wilson 
> Date:   Mon Jul 4 08:08:37 2016 +0100
> 
> drm/i915: Remove stop-rings debugfs interface
> 
> Should I explain this better in the commit message? I tried already ...

No idea what you mean. If you mean you want to know whether this error
was simulated or not, look in the error state. But the point of this
series is to avoid the dev_err() in the first place, in which case why
do we care whether it was simulated or not, if we hit the err it is a
bug and CI should be flagging it.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/7] drm/i915: Avoid the gpu reset vs. modeset deadlock

2017-07-20 Thread Daniel Vetter
On Thu, Jul 20, 2017 at 9:47 PM, Chris Wilson  wrote:
>> diff --git a/drivers/gpu/drm/i915/intel_display.c 
>> b/drivers/gpu/drm/i915/intel_display.c
>> index 02b1f4966049..995522e40ec1 100644
>> --- a/drivers/gpu/drm/i915/intel_display.c
>> +++ b/drivers/gpu/drm/i915/intel_display.c
>> @@ -3471,6 +3471,12 @@ void intel_prepare_reset(struct drm_i915_private 
>> *dev_priv)
>> !gpu_reset_clobbers_display(dev_priv))
>> return;
>>
>> +   /* We have a modeset vs reset deadlock, defensively unbreak it.
>> +*
>> +* FIXME: We can do a _lot_ better, this is just a first iteration.*/
>> +   i915_gem_set_wedged(dev_priv);
>> +   DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending 
>> modeset updates\n");
>
> I meant this a dev_err(). It has user visible impact and user data loss.

stop_rings is gone so I can't differentiate, and DRM_ERROR breaks CI.
Which after your timeout is the only point of this patch really. I
guess I could throw a pr_notice or so in there, but we already do
that. This was all removed in

commit 7b4d3a16dd97be0ebc793ea046b9af9d5c9b1b1a
Author: Chris Wilson 
Date:   Mon Jul 4 08:08:37 2016 +0100

drm/i915: Remove stop-rings debugfs interface

Should I explain this better in the commit message? I tried already ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/7] drm/i915: Avoid the gpu reset vs. modeset deadlock

2017-07-20 Thread Chris Wilson
Quoting Daniel Vetter (2017-07-20 18:57:48)
> ... using the biggest hammer we have. This is essentially a weaponized
> version of the timeout-based wedging Chris added in
> 
> commit 36703e79a982c8ce5a8e43833291f2719e92d0d1
> Author: Chris Wilson 
> Date:   Thu Jun 22 11:56:25 2017 +0100
> 
> drm/i915: Break modeset deadlocks on reset
> 
> Because defense-in-depth is good it's good to still have both. Also
> note that with the locking change we can now restrict this a lot (old
> gpus and special testing only), so this doesn't kill the TDR benefits
> on at least anything remotely modern.
> 
> And futuremore with a few tricks it should be possible to make a much
> more educated guess about whether an atomic commit is stuck waiting on
> the gpu (atomic_t counting the pending i915_sw_fence used by the
> atomic modeset code should do it), so we can improve this.
> 
> But for now just start with something that is guaranteed to recover
> faster, for much better CI througput.
> 
> This defacto reverts TDR on these platforms, but there's not really a
> single commit to specify as the sole offender.
> 
> v2: Add a debug message to explain what's going on. We can't DRM_ERROR
> because that spams CI. And the timeout based fallback still prints a
> DRM_ERROR, in case something goes wrong.
> 
> Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for 
> request completion")
> Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the 
> waiter")
> Cc: Chris Wilson 
> Cc: Mika Kuoppala 
> Cc: Joonas Lahtinen 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/intel_display.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c 
> b/drivers/gpu/drm/i915/intel_display.c
> index 02b1f4966049..995522e40ec1 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -3471,6 +3471,12 @@ void intel_prepare_reset(struct drm_i915_private 
> *dev_priv)
> !gpu_reset_clobbers_display(dev_priv))
> return;
>  
> +   /* We have a modeset vs reset deadlock, defensively unbreak it.
> +*
> +* FIXME: We can do a _lot_ better, this is just a first iteration.*/
> +   i915_gem_set_wedged(dev_priv);
> +   DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset 
> updates\n");

I meant this a dev_err(). It has user visible impact and user data loss.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx