Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB
Hi Lukas,

Ran this patch through the farm, and it seems that this patch might've
helped HSW, maybe even BSW.

ILK, IVB and SNB are still hanging hard to the same igt-test,
kms_pipe_crc_basic@suspend-read-crc-pipe-?

I'll try to make a kernel with the changes you proposed, but as I'm not
familiar with the driver innards, it might take a while.

Tomi

On Thursday 31 March 2016 22:35:17 Lukas Wunner wrote:
> Hi Tomi,
>
> On Thu, Mar 31, 2016 at 10:21:16AM +0300, Tomi Sarvela wrote:
> > The problem with the results in your link is that there is no HSW, ILK,
> > IVB or SNB results. This might give the impression that everything is
> > well.
> >
> > Most damning is lack of HSW-gt2 and SNB-dellxps: those machines hang on to
> > APC, and have run quite stably for every Patchwork run. The case isn't
> > strong enough yet that series should fail if either of those won't run,
> > but it might be so in future.
>
> So my patch seeking to fix the hangs has passed Romania CI with "success":
> https://patchwork.freedesktop.org/series/5125/
>
> However I don't see HSW-gt2 and SNB-dellxps in their hardware lineup.
> And I would still like to know what the actual cause of the hangs is
> since they do not occur on my IVB laptop.
>
> If you get the chance maybe you can repeat the test and include a
> "dump_stack();" at the beginning of intel_fbdev_output_poll_changed()
> and intel_fbdev_restore_mode(). This should show in the logs which of
> the two functions is called during suspend/resume and from where it's
> called. My guess is that this particular hardware causes a hotplug
> signal to be generated upon waking up. If the hangs do not stop with
> this patch, booting with "no_console_suspend" should at least show
> what's going on.
>
> Thank you & best regards,
>
> Lukas

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB
Hi Tomi,

On Thu, Mar 31, 2016 at 10:21:16AM +0300, Tomi Sarvela wrote:
> The problem with the results in your link is that there is no HSW, ILK, IVB
> or SNB results. This might give the impression that everything is well.
>
> Most damning is lack of HSW-gt2 and SNB-dellxps: those machines hang on to
> APC, and have run quite stably for every Patchwork run. The case isn't strong
> enough yet that series should fail if either of those won't run, but it might
> be so in future.

So my patch seeking to fix the hangs has passed Romania CI with "success":
https://patchwork.freedesktop.org/series/5125/

However I don't see HSW-gt2 and SNB-dellxps in their hardware lineup.
And I would still like to know what the actual cause of the hangs is
since they do not occur on my IVB laptop.

If you get the chance maybe you can repeat the test and include a
"dump_stack();" at the beginning of intel_fbdev_output_poll_changed()
and intel_fbdev_restore_mode(). This should show in the logs which of
the two functions is called during suspend/resume and from where it's
called. My guess is that this particular hardware causes a hotplug
signal to be generated upon waking up. If the hangs do not stop with
this patch, booting with "no_console_suspend" should at least show
what's going on.

Thank you & best regards,

Lukas
Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB
Hi Gabriel,

On Thu, Mar 31, 2016 at 10:42:37AM +0300, Gabriel Feceoru wrote:
> On 31.03.2016 00:35, Lukas Wunner wrote:
> >On Wed, Mar 30, 2016 at 08:20:26PM +0300, Gabriel Feceoru wrote:
> >>This commit causes a hang while running kms suspend tests
> >>(kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.
>
> Tomi already replied, meantime I also looked at the results.
> The current regression is for ILK/SNB/IVB only (v1 seemed to affect more
> platforms).
> Unfortunately these machines were not available when v2 was tested, so this
> couldn't be detected.

I dev on an IVB machine and cannot reproduce this. Suspend works fine.

All the patch does is call async_synchronize_full() (1) when a hotplug
event arrives or (2) when the last DRM client closes the connection.
Either of these two things seems to be happening on your test machines
when running the suspend test.

The PM core suspends and resumes individual devices asynchronously
and calls async_synchronize_full() in a couple of places. If a device's
PM callbacks also call async_synchronize_full(), the machine deadlocks.

It is unnecessary that we call async_synchronize_full(), we only need
to synchronize up to a specific cookie (which represents initialization
of the fbdev). So I've just posted a patch to replace the calls to
async_synchronize_full() with async_synchronize_cookie(). This should
make things less fragile and hopefully also solve the hangs you're seeing.

Best regards,

Lukas

> >>
> >>Probably the same problem with the one in v2, but on older HW.
> >>
> >>
> >>commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
> >>Author: Lukas Wunner
> >>Date:   Wed Mar 9 12:52:53 2016 +0100
> >>
> >>    drm/i915: Fix races on fbdev
> >>
> >>    The ->lastclose callback invokes intel_fbdev_restore_mode() and has
> >>    been witnessed to run before intel_fbdev_initial_config_async()
> >>    has finished.
> >>
> >>    We might likewise receive hotplug events before we've had a chance to
> >>    fully set up the fbdev.
> >>
> >>    Fix by waiting for the asynchronous thread to finish.
> >>
> >>    v2:
> >>    An async_synchronize_full() was also added to intel_fbdev_set_suspend()
> >>    in v1 which turned out to be entirely gratuitous. It caused a deadlock
> >>    on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
> >>    for CI support) and was unnecessary since a device is never suspended
> >>    until its ->probe callback (and all asynchronous tasks it scheduled)
> >>    have finished. See dpm_prepare(), which calls wait_for_device_probe(),
> >>    which calls async_synchronize_full().
> >>
> >>    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> >>    Reported-by: Gustav Fägerlind
> >>    Reported-by: "Li, Weinan Z"
> >>    Cc: Chris Wilson
> >>    Cc: sta...@vger.kernel.org
> >>    Signed-off-by: Lukas Wunner
> >>    Signed-off-by: Daniel Vetter
> >>    Link:
> >>    http://patchwork.freedesktop.org/patch/msgid/20160309115147.67b2b6e...@gabe.freedesktop.org
> >>
> >>
> >>Regards,
> >>Gabriel
> >v2 passed CI fine, save for one warning not caused by the patch:
> >https://patchwork.freedesktop.org/series/4068/
> >
> >For comparison, this was v1:
> >https://patchwork.freedesktop.org/patch/75840/
> >
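
The difference between the two calls discussed in the message above (waiting
for everything versus waiting only up to the fbdev cookie) can be illustrated
with a small userspace model. This is only a sketch, not kernel code: every
model_* name is hypothetical, and it assumes that a cookie wait returns once
no pending task has a cookie lower than the one waited for, and that a PM
callback runs as a pending async task of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of async cookies; not the kernel API. */
#define MODEL_COOKIE_MAX UINT64_MAX

/* Cookies of the async tasks that have been scheduled but not yet finished. */
struct model_pending {
	const uint64_t *cookies;
	size_t count;
};

/* Lowest cookie still in flight, or MODEL_COOKIE_MAX if nothing is pending. */
static uint64_t model_lowest_in_progress(const struct model_pending *p)
{
	uint64_t lowest = MODEL_COOKIE_MAX;

	for (size_t i = 0; i < p->count; i++)
		if (p->cookies[i] < lowest)
			lowest = p->cookies[i];
	return lowest;
}

/*
 * True if a waiter would have to keep sleeping, i.e. some pending task
 * still has a cookie below the one being waited for.
 */
static bool model_sync_would_block(const struct model_pending *p,
				   uint64_t cookie)
{
	assert(p != NULL);
	return model_lowest_in_progress(p) < cookie;
}

/* A "full" wait is modelled as a wait up to the largest possible cookie. */
static bool model_full_sync_would_block(const struct model_pending *p)
{
	return model_sync_would_block(p, MODEL_COOKIE_MAX);
}
```

In this model, suppose fbdev initialization had cookie 2 and has finished,
while the caller is itself a pending task with cookie 5. A full wait then
never completes, because the caller is waiting for its own cookie to retire,
whereas a wait up to cookie 2 + 1 returns immediately. If the fbdev task
(cookie 2) were still pending, the same cookie wait would block until it
finishes, which is the ordering the fix actually needs.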
Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB
On 31.03.2016 00:35, Lukas Wunner wrote:
> Hi Gabriel,
>
> On Wed, Mar 30, 2016 at 08:20:26PM +0300, Gabriel Feceoru wrote:
>> This commit causes a hang while running kms suspend tests
>> (kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.
>
> This happened with v1 but not with v2 of the patch.
> Please check if somehow v1 ended up in your tree.

It's v2.
Tomi already replied, meantime I also looked at the results.
The current regression is for ILK/SNB/IVB only (v1 seemed to affect more
platforms).
Unfortunately these machines were not available when v2 was tested, so this
couldn't be detected.

Regards,
Gabriel.

> v2 passed CI fine, save for one warning not caused by the patch:
> https://patchwork.freedesktop.org/series/4068/
>
> For comparison, this was v1:
> https://patchwork.freedesktop.org/patch/75840/
>
> Best regards,
>
> Lukas
>
>> Probably the same problem with the one in v2, but on older HW.
>>
>>
>> commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
>> Author: Lukas Wunner
>> Date:   Wed Mar 9 12:52:53 2016 +0100
>>
>>     drm/i915: Fix races on fbdev
>>
>>     The ->lastclose callback invokes intel_fbdev_restore_mode() and has
>>     been witnessed to run before intel_fbdev_initial_config_async()
>>     has finished.
>>
>>     We might likewise receive hotplug events before we've had a chance to
>>     fully set up the fbdev.
>>
>>     Fix by waiting for the asynchronous thread to finish.
>>
>>     v2:
>>     An async_synchronize_full() was also added to intel_fbdev_set_suspend()
>>     in v1 which turned out to be entirely gratuitous. It caused a deadlock
>>     on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
>>     for CI support) and was unnecessary since a device is never suspended
>>     until its ->probe callback (and all asynchronous tasks it scheduled)
>>     have finished. See dpm_prepare(), which calls wait_for_device_probe(),
>>     which calls async_synchronize_full().
>>
>>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
>>     Reported-by: Gustav Fägerlind
>>     Reported-by: "Li, Weinan Z"
>>     Cc: Chris Wilson
>>     Cc: sta...@vger.kernel.org
>>     Signed-off-by: Lukas Wunner
>>     Signed-off-by: Daniel Vetter
>>     Link:
>>     http://patchwork.freedesktop.org/patch/msgid/20160309115147.67b2b6e...@gabe.freedesktop.org
>>
>> Regards,
>> Gabriel
Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB
Hello Lukas,

The problem with the results in your link is that there is no HSW, ILK, IVB
or SNB results. This might give the impression that everything is well.

Most damning is lack of HSW-gt2 and SNB-dellxps: those machines hang on to
APC, and have run quite stably for every Patchwork run. The case isn't strong
enough yet that series should fail if either of those won't run, but it might
be so in future.

Tomi

On Wednesday 30 March 2016 23:35:08 Lukas Wunner wrote:
> Hi Gabriel,
>
> On Wed, Mar 30, 2016 at 08:20:26PM +0300, Gabriel Feceoru wrote:
> > This commit causes a hang while running kms suspend tests
> > (kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.
>
> This happened with v1 but not with v2 of the patch.
> Please check if somehow v1 ended up in your tree.
>
> v2 passed CI fine, save for one warning not caused by the patch:
> https://patchwork.freedesktop.org/series/4068/
>
> For comparison, this was v1:
> https://patchwork.freedesktop.org/patch/75840/
>
> Best regards,
>
> Lukas
>
> > Probably the same problem with the one in v2, but on older HW.
> >
> >
> > commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
> > Author: Lukas Wunner
> > Date:   Wed Mar 9 12:52:53 2016 +0100
> >
> >     drm/i915: Fix races on fbdev
> >
> >     The ->lastclose callback invokes intel_fbdev_restore_mode() and has
> >     been witnessed to run before intel_fbdev_initial_config_async()
> >     has finished.
> >
> >     We might likewise receive hotplug events before we've had a chance to
> >     fully set up the fbdev.
> >
> >     Fix by waiting for the asynchronous thread to finish.
> >
> >     v2:
> >     An async_synchronize_full() was also added to intel_fbdev_set_suspend()
> >     in v1 which turned out to be entirely gratuitous. It caused a deadlock
> >     on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
> >     for CI support) and was unnecessary since a device is never suspended
> >     until its ->probe callback (and all asynchronous tasks it scheduled)
> >     have finished. See dpm_prepare(), which calls wait_for_device_probe(),
> >     which calls async_synchronize_full().
> >
> >     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> >     Reported-by: Gustav Fägerlind
> >     Reported-by: "Li, Weinan Z"
> >     Cc: Chris Wilson
> >     Cc: sta...@vger.kernel.org
> >     Signed-off-by: Lukas Wunner
> >     Signed-off-by: Daniel Vetter
> >     Link:
> >     http://patchwork.freedesktop.org/patch/msgid/20160309115147.67B2B6E0D
> >     3...@gabe.freedesktop.org>
> >
> > Regards,
> > Gabriel
Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB
Hi Gabriel,

On Wed, Mar 30, 2016 at 08:20:26PM +0300, Gabriel Feceoru wrote:
> This commit causes a hang while running kms suspend tests
> (kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.

This happened with v1 but not with v2 of the patch.
Please check if somehow v1 ended up in your tree.

v2 passed CI fine, save for one warning not caused by the patch:
https://patchwork.freedesktop.org/series/4068/

For comparison, this was v1:
https://patchwork.freedesktop.org/patch/75840/

Best regards,

Lukas

>
> Probably the same problem with the one in v2, but on older HW.
>
>
> commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
> Author: Lukas Wunner
> Date:   Wed Mar 9 12:52:53 2016 +0100
>
>     drm/i915: Fix races on fbdev
>
>     The ->lastclose callback invokes intel_fbdev_restore_mode() and has
>     been witnessed to run before intel_fbdev_initial_config_async()
>     has finished.
>
>     We might likewise receive hotplug events before we've had a chance to
>     fully set up the fbdev.
>
>     Fix by waiting for the asynchronous thread to finish.
>
>     v2:
>     An async_synchronize_full() was also added to intel_fbdev_set_suspend()
>     in v1 which turned out to be entirely gratuitous. It caused a deadlock
>     on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
>     for CI support) and was unnecessary since a device is never suspended
>     until its ->probe callback (and all asynchronous tasks it scheduled)
>     have finished. See dpm_prepare(), which calls wait_for_device_probe(),
>     which calls async_synchronize_full().
>
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
>     Reported-by: Gustav Fägerlind
>     Reported-by: "Li, Weinan Z"
>     Cc: Chris Wilson
>     Cc: sta...@vger.kernel.org
>     Signed-off-by: Lukas Wunner
>     Signed-off-by: Daniel Vetter
>     Link:
>     http://patchwork.freedesktop.org/patch/msgid/20160309115147.67b2b6e...@gabe.freedesktop.org
>
>
> Regards,
> Gabriel
Re: [Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB
On Wed, Mar 30, 2016 at 08:20:26PM +0300, Gabriel Feceoru wrote:
> This commit causes a hang while running kms suspend tests
> (kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.
>
> Probably the same problem with the one in v2, but on older HW.

I did check the patchwork/BAT-CI results and it looked clean. Is this a
new machine? Should I just revert for now until we have a proper fix?
-Daniel

>
>
> commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
> Author: Lukas Wunner
> Date:   Wed Mar 9 12:52:53 2016 +0100
>
>     drm/i915: Fix races on fbdev
>
>     The ->lastclose callback invokes intel_fbdev_restore_mode() and has
>     been witnessed to run before intel_fbdev_initial_config_async()
>     has finished.
>
>     We might likewise receive hotplug events before we've had a chance to
>     fully set up the fbdev.
>
>     Fix by waiting for the asynchronous thread to finish.
>
>     v2:
>     An async_synchronize_full() was also added to intel_fbdev_set_suspend()
>     in v1 which turned out to be entirely gratuitous. It caused a deadlock
>     on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
>     for CI support) and was unnecessary since a device is never suspended
>     until its ->probe callback (and all asynchronous tasks it scheduled)
>     have finished. See dpm_prepare(), which calls wait_for_device_probe(),
>     which calls async_synchronize_full().
>
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
>     Reported-by: Gustav Fägerlind
>     Reported-by: "Li, Weinan Z"
>     Cc: Chris Wilson
>     Cc: sta...@vger.kernel.org
>     Signed-off-by: Lukas Wunner
>     Signed-off-by: Daniel Vetter
>     Link:
>     http://patchwork.freedesktop.org/patch/msgid/20160309115147.67b2b6e...@gabe.freedesktop.org
>
>
> Regards,
> Gabriel

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
[Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB
This commit causes a hang while running kms suspend tests
(kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.

Probably the same problem with the one in v2, but on older HW.


commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
Author: Lukas Wunner
Date:   Wed Mar 9 12:52:53 2016 +0100

    drm/i915: Fix races on fbdev

    The ->lastclose callback invokes intel_fbdev_restore_mode() and has
    been witnessed to run before intel_fbdev_initial_config_async()
    has finished.

    We might likewise receive hotplug events before we've had a chance to
    fully set up the fbdev.

    Fix by waiting for the asynchronous thread to finish.

    v2:
    An async_synchronize_full() was also added to intel_fbdev_set_suspend()
    in v1 which turned out to be entirely gratuitous. It caused a deadlock
    on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
    for CI support) and was unnecessary since a device is never suspended
    until its ->probe callback (and all asynchronous tasks it scheduled)
    have finished. See dpm_prepare(), which calls wait_for_device_probe(),
    which calls async_synchronize_full().

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
    Reported-by: Gustav Fägerlind
    Reported-by: "Li, Weinan Z"
    Cc: Chris Wilson
    Cc: sta...@vger.kernel.org
    Signed-off-by: Lukas Wunner
    Signed-off-by: Daniel Vetter
    Link:
    http://patchwork.freedesktop.org/patch/msgid/20160309115147.67b2b6e...@gabe.freedesktop.org


Regards,
Gabriel