Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB

2016-04-01 Thread Tomi Sarvela
Hi Lukas,

I ran this patch through the farm, and it seems it might have helped
HSW, maybe even BSW.

ILK, IVB and SNB still hang hard on the same igt test,
kms_pipe_crc_basic@suspend-read-crc-pipe-?

I'll try to make a kernel with the changes you proposed, but as I'm not 
familiar with the driver innards, it might take a while.

Tomi


On Thursday 31 March 2016 22:35:17 Lukas Wunner wrote:
> Hi Tomi,
> 
> On Thu, Mar 31, 2016 at 10:21:16AM +0300, Tomi Sarvela wrote:
> > The problem with the results in your link is that there is no HSW, ILK, IVB
> > or SNB results. This might give the impression that everything is well.
> > 
> > Most damning is lack of HSW-gt2 and SNB-dellxps: those machines hang on to
> > APC, and have run quite stably for every Patchwork run. The case isn't
> > strong enough yet that series should fail if either of those won't run,
> > but it might be so in future.
> 
> So my patch seeking to fix the hangs has passed Romania CI with "success":
> https://patchwork.freedesktop.org/series/5125/
> 
> However I don't see HSW-gt2 and SNB-dellxps in their hardware lineup.
> And I would still like to know what the actual cause of the hangs is
> since they do not occur on my IVB laptop.
> 
> If you get the chance maybe you can repeat the test and include a
> "dump_stack();" at the beginning of intel_fbdev_output_poll_changed()
> and intel_fbdev_restore_mode(). This should show in the logs which of
> the two functions is called during suspend/resume and from where it's
> called. My guess is that this particular hardware causes a hotplug
> signal to be generated upon waking up. If the hangs do not stop with
> this patch, booting with "no_console_suspend" should at least show
> what's going on.
> 
> Thank you & best regards,
> 
> Lukas

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB

2016-03-31 Thread Lukas Wunner
Hi Tomi,

On Thu, Mar 31, 2016 at 10:21:16AM +0300, Tomi Sarvela wrote:
> The problem with the results in your link is that there is no HSW, ILK, IVB
> or SNB results. This might give the impression that everything is well.
> 
> Most damning is lack of HSW-gt2 and SNB-dellxps: those machines hang on to
> APC, and have run quite stably for every Patchwork run. The case isn't strong
> enough yet that series should fail if either of those won't run, but it might
> be so in future.

So my patch seeking to fix the hangs has passed Romania CI with "success":
https://patchwork.freedesktop.org/series/5125/

However I don't see HSW-gt2 and SNB-dellxps in their hardware lineup.
And I would still like to know what the actual cause of the hangs is
since they do not occur on my IVB laptop.

If you get the chance maybe you can repeat the test and include a
"dump_stack();" at the beginning of intel_fbdev_output_poll_changed()
and intel_fbdev_restore_mode(). This should show in the logs which of
the two functions is called during suspend/resume and from where it's
called. My guess is that this particular hardware causes a hotplug
signal to be generated upon waking up. If the hangs do not stop with
this patch, booting with "no_console_suspend" should at least show
what's going on.
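
A minimal sketch of the instrumentation described above, written as a
userspace stand-in so that it compiles on its own. The "struct drm_device *"
parameter, the dumps_seen counter and the run_demo() helper are assumptions
made for illustration. In the driver, dump_stack() is provided by the kernel
and prints the call trace to the kernel log, and the existing function bodies
would follow the added call.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace stand-ins so the sketch builds outside the kernel tree. */
struct drm_device;        /* opaque here; the real type belongs to DRM */
static int dumps_seen;    /* hypothetical counter, used by the demo only */

/* Stand-in for the kernel's dump_stack(), which logs the full call trace. */
static void dump_stack(void)
{
	dumps_seen++;
	fputs("dump_stack(): call trace would be logged here\n", stderr);
}

void intel_fbdev_output_poll_changed(struct drm_device *dev)
{
	dump_stack();     /* first statement: shows who raised the hotplug path */
	(void)dev;        /* existing function body omitted in this sketch */
}

void intel_fbdev_restore_mode(struct drm_device *dev)
{
	dump_stack();     /* first statement: shows who raised the lastclose path */
	(void)dev;        /* existing function body omitted in this sketch */
}

/* Demo helper (hypothetical): call both instrumented entry points once. */
static int run_demo(void)
{
	dumps_seen = 0;
	intel_fbdev_output_poll_changed(NULL);
	intel_fbdev_restore_mode(NULL);
	assert(dumps_seen == 2);  /* one trace per instrumented function */
	return dumps_seen;
}
```

Whichever of the two traces shows up in the log around suspend/resume
identifies the path that is taken, and the frames below it identify the
caller.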

Thank you & best regards,

Lukas


Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB

2016-03-31 Thread Lukas Wunner
Hi Gabriel,

On Thu, Mar 31, 2016 at 10:42:37AM +0300, Gabriel Feceoru wrote:
> On 31.03.2016 00:35, Lukas Wunner wrote:
> >On Wed, Mar 30, 2016 at 08:20:26PM +0300, Gabriel Feceoru wrote:
> >>This commit causes a hang while running kms suspend tests
> >>(kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.
> 
> Tomi already replied, meantime I also looked at the results.
> The current regression is for ILK/SNB/IVB only (v1 seemed to affect more
> platforms).
> Unfortunately these machines were not available when v2 was tested, so this
> couldn't be detected.

I develop on an IVB machine and cannot reproduce this. Suspend works fine.

All the patch does is call async_synchronize_full()
(1) when a hotplug event arrives or
(2) when the last DRM client closes the connection.
Either of these two things seems to be happening on your test machines
when running the suspend test.

The PM core suspends and resumes individual devices asynchronously and
calls async_synchronize_full() in a couple of places. If a device's PM
callbacks also call async_synchronize_full(), the machine deadlocks.

We do not need to call async_synchronize_full(); we only need
to synchronize up to a specific cookie (which represents initialization
of the fbdev). So I've just posted a patch to replace the calls to
async_synchronize_full() with async_synchronize_cookie(). This should
make things less fragile and hopefully also solve the hangs you're seeing.
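
The difference between the two calls can be modelled in userspace. The sketch
below is not kernel code: struct pending_call, full_sync_would_block() and
cookie_sync_would_block() are hypothetical names. It relies on the documented
behaviour that async_synchronize_cookie(c) waits only for calls scheduled
before cookie c, which is why waiting for a given call means passing its
cookie plus one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these helpers exist in the kernel. */
typedef unsigned long long async_cookie_t;

struct pending_call {
	async_cookie_t cookie;  /* ticket handed out when the call was scheduled */
	bool running;           /* true until the scheduled function returns */
};

/* Models async_synchronize_full(): blocks while ANY call is still running. */
static bool full_sync_would_block(const struct pending_call *calls, size_t n)
{
	assert(calls != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (calls[i].running)
			return true;
	return false;
}

/* Models async_synchronize_cookie(cookie): blocks only while a call
 * scheduled before `cookie` is still running. */
static bool cookie_sync_would_block(const struct pending_call *calls,
				    size_t n, async_cookie_t cookie)
{
	assert(calls != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (calls[i].running && calls[i].cookie < cookie)
			return true;
	return false;
}
```

Take fbdev initialization as finished under cookie 1 and an asynchronous
resume callback as still running under cookie 5. full_sync_would_block()
reports true although the fbdev is ready, and if the caller is that resume
callback it ends up waiting on itself. cookie_sync_would_block() with cookie
1 + 1 reports false, so the caller proceeds.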

Best regards,

Lukas

> >>
> >>Probably the same problem with the one in v2, but on older HW.
> >>
> >>
> >>commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
> >>Author: Lukas Wunner 
> >>Date:   Wed Mar 9 12:52:53 2016 +0100
> >>
> >> drm/i915: Fix races on fbdev
> >>
> >> The ->lastclose callback invokes intel_fbdev_restore_mode() and has
> >> been witnessed to run before intel_fbdev_initial_config_async()
> >> has finished.
> >>
> >> We might likewise receive hotplug events before we've had a chance to
> >> fully set up the fbdev.
> >>
> >> Fix by waiting for the asynchronous thread to finish.
> >>
> >> v2:
> >> An async_synchronize_full() was also added to intel_fbdev_set_suspend()
> >> in v1 which turned out to be entirely gratuitous. It caused a deadlock
> >> on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
> >> for CI support) and was unnecessary since a device is never suspended
> >> until its ->probe callback (and all asynchronous tasks it scheduled)
> >> have finished. See dpm_prepare(), which calls wait_for_device_probe(),
> >> which calls async_synchronize_full().
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> >> Reported-by: Gustav Fägerlind 
> >> Reported-by: "Li, Weinan Z" 
> >> Cc: Chris Wilson 
> >> Cc: sta...@vger.kernel.org
> >> Signed-off-by: Lukas Wunner 
> >> Signed-off-by: Daniel Vetter 
> >> Link: 
> >> http://patchwork.freedesktop.org/patch/msgid/20160309115147.67b2b6e...@gabe.freedesktop.org
> >>
> >>
> >>Regards,
> >>Gabriel
> >v2 passed CI fine, save for one warning not caused by the patch:
> >https://patchwork.freedesktop.org/series/4068/
> >
> >For comparison, this was v1:
> >https://patchwork.freedesktop.org/patch/75840/
> >


Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB

2016-03-31 Thread Gabriel Feceoru



On 31.03.2016 00:35, Lukas Wunner wrote:
> Hi Gabriel,
>
> On Wed, Mar 30, 2016 at 08:20:26PM +0300, Gabriel Feceoru wrote:
> > This commit causes a hang while running kms suspend tests
> > (kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.
>
> This happened with v1 but not with v2 of the patch.
> Please check if somehow v1 ended up in your tree.

It's v2.

Tomi already replied, meantime I also looked at the results.
The current regression is for ILK/SNB/IVB only (v1 seemed to affect more
platforms).
Unfortunately these machines were not available when v2 was tested, so
this couldn't be detected.

Regards,
Gabriel.

> v2 passed CI fine, save for one warning not caused by the patch:
> https://patchwork.freedesktop.org/series/4068/
>
> For comparison, this was v1:
> https://patchwork.freedesktop.org/patch/75840/
>
> Best regards,
>
> Lukas
>
> > Probably the same problem with the one in v2, but on older HW.
> >
> >
> > commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
> > Author: Lukas Wunner
> > Date:   Wed Mar 9 12:52:53 2016 +0100
> >
> >  drm/i915: Fix races on fbdev
> >
> >  The ->lastclose callback invokes intel_fbdev_restore_mode() and has
> >  been witnessed to run before intel_fbdev_initial_config_async()
> >  has finished.
> >
> >  We might likewise receive hotplug events before we've had a chance to
> >  fully set up the fbdev.
> >
> >  Fix by waiting for the asynchronous thread to finish.
> >
> >  v2:
> >  An async_synchronize_full() was also added to intel_fbdev_set_suspend()
> >  in v1 which turned out to be entirely gratuitous. It caused a deadlock
> >  on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
> >  for CI support) and was unnecessary since a device is never suspended
> >  until its ->probe callback (and all asynchronous tasks it scheduled)
> >  have finished. See dpm_prepare(), which calls wait_for_device_probe(),
> >  which calls async_synchronize_full().
> >
> >  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> >  Reported-by: Gustav Fägerlind
> >  Reported-by: "Li, Weinan Z"
> >  Cc: Chris Wilson
> >  Cc: sta...@vger.kernel.org
> >  Signed-off-by: Lukas Wunner
> >  Signed-off-by: Daniel Vetter
> >  Link:
> > http://patchwork.freedesktop.org/patch/msgid/20160309115147.67b2b6e...@gabe.freedesktop.org
> >
> >
> > Regards,
> > Gabriel



Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB

2016-03-31 Thread Tomi Sarvela
Hello Lukas,

The problem with the results in your link is that there are no HSW, ILK, IVB or
SNB results. This might give the impression that everything is well.

Most damning is the lack of HSW-gt2 and SNB-dellxps: those machines hang on to
APC, and have run quite stably for every Patchwork run. The case isn't strong
enough yet for a series to fail if either of those won't run, but it might
be so in future.

Tomi



On Wednesday 30 March 2016 23:35:08 Lukas Wunner wrote:
> Hi Gabriel,
> 
> On Wed, Mar 30, 2016 at 08:20:26PM +0300, Gabriel Feceoru wrote:
> > This commit causes a hang while running kms suspend tests
> > (kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.
> 
> This happened with v1 but not with v2 of the patch.
> Please check if somehow v1 ended up in your tree.
> 
> v2 passed CI fine, save for one warning not caused by the patch:
> https://patchwork.freedesktop.org/series/4068/
> 
> For comparison, this was v1:
> https://patchwork.freedesktop.org/patch/75840/
> 
> Best regards,
> 
> Lukas
> 
> > Probably the same problem with the one in v2, but on older HW.
> > 
> > 
> > commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
> > Author: Lukas Wunner 
> > Date:   Wed Mar 9 12:52:53 2016 +0100
> > 
> > drm/i915: Fix races on fbdev
> > 
> > The ->lastclose callback invokes intel_fbdev_restore_mode() and has
> > been witnessed to run before intel_fbdev_initial_config_async()
> > has finished.
> > 
> > We might likewise receive hotplug events before we've had a chance to
> > fully set up the fbdev.
> > 
> > Fix by waiting for the asynchronous thread to finish.
> > 
> > v2:
> > An async_synchronize_full() was also added to intel_fbdev_set_suspend()
> > in v1 which turned out to be entirely gratuitous. It caused a deadlock
> > on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
> > for CI support) and was unnecessary since a device is never suspended
> > until its ->probe callback (and all asynchronous tasks it scheduled)
> > have finished. See dpm_prepare(), which calls wait_for_device_probe(),
> > which calls async_synchronize_full().
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> > Reported-by: Gustav Fägerlind 
> > Reported-by: "Li, Weinan Z" 
> > Cc: Chris Wilson 
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Lukas Wunner 
> > Signed-off-by: Daniel Vetter 
> > Link:
> > http://patchwork.freedesktop.org/patch/msgid/20160309115147.67B2B6E0D3...@gabe.freedesktop.org
> >
> > Regards,
> > Gabriel



Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB

2016-03-30 Thread Lukas Wunner
Hi Gabriel,

On Wed, Mar 30, 2016 at 08:20:26PM +0300, Gabriel Feceoru wrote:
> This commit causes a hang while running kms suspend tests
> (kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.

This happened with v1 but not with v2 of the patch.
Please check if somehow v1 ended up in your tree.

v2 passed CI fine, save for one warning not caused by the patch:
https://patchwork.freedesktop.org/series/4068/

For comparison, this was v1:
https://patchwork.freedesktop.org/patch/75840/

Best regards,

Lukas

> 
> Probably the same problem with the one in v2, but on older HW.
> 
> 
> commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
> Author: Lukas Wunner 
> Date:   Wed Mar 9 12:52:53 2016 +0100
> 
> drm/i915: Fix races on fbdev
> 
> The ->lastclose callback invokes intel_fbdev_restore_mode() and has
> been witnessed to run before intel_fbdev_initial_config_async()
> has finished.
> 
> We might likewise receive hotplug events before we've had a chance to
> fully set up the fbdev.
> 
> Fix by waiting for the asynchronous thread to finish.
> 
> v2:
> An async_synchronize_full() was also added to intel_fbdev_set_suspend()
> in v1 which turned out to be entirely gratuitous. It caused a deadlock
> on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
> for CI support) and was unnecessary since a device is never suspended
> until its ->probe callback (and all asynchronous tasks it scheduled)
> have finished. See dpm_prepare(), which calls wait_for_device_probe(),
> which calls async_synchronize_full().
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> Reported-by: Gustav Fägerlind 
> Reported-by: "Li, Weinan Z" 
> Cc: Chris Wilson 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Lukas Wunner 
> Signed-off-by: Daniel Vetter 
> Link: 
> http://patchwork.freedesktop.org/patch/msgid/20160309115147.67b2b6e...@gabe.freedesktop.org
> 
> 
> Regards,
> Gabriel


Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB

2016-03-30 Thread Daniel Vetter
On Wed, Mar 30, 2016 at 08:20:26PM +0300, Gabriel Feceoru wrote:
> This commit causes a hang while running kms suspend tests
> (kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.
> 
> Probably the same problem with the one in v2, but on older HW.

I did check the patchwork/BAT-CI results and it looked clean. Is this a
new machine? Should I just revert for now until we have a proper fix?
-Daniel

> 
> 
> commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
> Author: Lukas Wunner 
> Date:   Wed Mar 9 12:52:53 2016 +0100
> 
> drm/i915: Fix races on fbdev
> 
> The ->lastclose callback invokes intel_fbdev_restore_mode() and has
> been witnessed to run before intel_fbdev_initial_config_async()
> has finished.
> 
> We might likewise receive hotplug events before we've had a chance to
> fully set up the fbdev.
> 
> Fix by waiting for the asynchronous thread to finish.
> 
> v2:
> An async_synchronize_full() was also added to intel_fbdev_set_suspend()
> in v1 which turned out to be entirely gratuitous. It caused a deadlock
> on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
> for CI support) and was unnecessary since a device is never suspended
> until its ->probe callback (and all asynchronous tasks it scheduled)
> have finished. See dpm_prepare(), which calls wait_for_device_probe(),
> which calls async_synchronize_full().
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> Reported-by: Gustav Fägerlind 
> Reported-by: "Li, Weinan Z" 
> Cc: Chris Wilson 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Lukas Wunner 
> Signed-off-by: Daniel Vetter 
> Link: 
> http://patchwork.freedesktop.org/patch/msgid/20160309115147.67b2b6e...@gabe.freedesktop.org
> 
> 
> Regards,
> Gabriel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB

2016-03-30 Thread Gabriel Feceoru
This commit causes a hang while running kms suspend tests 
(kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.


Probably the same problem as the one in v2, but on older HW.


commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
Author: Lukas Wunner
Date:   Wed Mar 9 12:52:53 2016 +0100

    drm/i915: Fix races on fbdev

    The ->lastclose callback invokes intel_fbdev_restore_mode() and has
    been witnessed to run before intel_fbdev_initial_config_async()
    has finished.

    We might likewise receive hotplug events before we've had a chance to
    fully set up the fbdev.

    Fix by waiting for the asynchronous thread to finish.

    v2:
    An async_synchronize_full() was also added to intel_fbdev_set_suspend()
    in v1 which turned out to be entirely gratuitous. It caused a deadlock
    on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
    for CI support) and was unnecessary since a device is never suspended
    until its ->probe callback (and all asynchronous tasks it scheduled)
    have finished. See dpm_prepare(), which calls wait_for_device_probe(),
    which calls async_synchronize_full().

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
    Reported-by: Gustav Fägerlind
    Reported-by: "Li, Weinan Z"
    Cc: Chris Wilson
    Cc: sta...@vger.kernel.org
    Signed-off-by: Lukas Wunner
    Signed-off-by: Daniel Vetter
    Link: http://patchwork.freedesktop.org/patch/msgid/20160309115147.67b2b6e...@gabe.freedesktop.org



Regards,
Gabriel