Re: [Intel-gfx] [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5)

2017-02-11 Thread Martin Steigerwald
On Wednesday, 1 February 2017, 14:11:22 CET, David Weinehall wrote:
> On Wed, Jan 25, 2017 at 01:10:26PM +0100, Martin Steigerwald wrote:
> > On Sunday, 22 January 2017, 13:32:08 CET, Linus Torvalds wrote:
> > > Things seem to be calming down a bit, and everything looks nominal.
> > > 
> > > There's only been about 250 changes (not counting merges) in the last
> > > week, and the diffstat touches less than 300 files (with drivers and
> > > architecture updates being the bulk, but there's tooling, networking
> > > and filesystems in there too).
> > > 
> > > So keep testing, and I think we'll have a regular release schedule.
> > 
> > Testing this is no fun:
> > 
> > Bug 99533 - black screen after switching session
> > https://bugs.freedesktop.org/99533
> > 
> > 
> > This comes after GPU hangs/lockups with kernel 4.9, reported for example as:
> > 
> > Bug 98922 - [snb] GPU hang on PlaneShift
> > https://bugs.freedesktop.org/98922
> > 
> > It may be a duplicate of #98747, #98794, #98860, #98891, or #98288.
> > 
> > 
> > I am back on kernel 4.8.15, as I need this machine for production work.
> > 
> > Sometimes I wish for a microkernel that might be able to reincarnate
> > drivers that hang or do weird things like that. That may at least give a
> > way to actually do some debugging, or even to get the desktop session
> > back without losing its state. This is especially true for graphics
> > drivers and for hibernating/resuming from hibernation, which also
> > occasionally fails, again without leaving a way to interact with the
> > machine to do further debugging. The Linux kernel usually just crashes
> > completely, with not even a ping or SSH possible, or at least it gets
> > stuck with a black display and no way to restart the graphics driver,
> > because the driver seems to be in some undefined state. Combined with
> > bugs that only happen occasionally, this makes triaging them
> > time-consuming and risky. I do like to help with testing, but maybe it's
> > time to just switch to distro kernels and be done with it, as I
> > regularly come across bugs that are too expensive for me to triage.
> > 
> > Please understand that I am not willing to bisect these occasionally
> > happening bugs, which have the potential to cause data loss because I
> > have to switch the machine off forcefully. Fortunately, KMail at least
> > saves the mail I am writing from time to time, and Kate uses swap files.
> > 
> > I am also a bit unwilling to do further debugging of this one, as I
> > usually use two sessions when I am at work and I risk losing data I am
> > working on. But… at least with this issue it seems I would have a way to
> > SSH into the machine before kicking it.
> > 
> > 
> > Since kernel 4.9 I have been dissatisfied with the state of the Intel
> > graphics driver on this ThinkPad T520 with Sandybridge, and I wonder
> > whether you guys at Intel really test things with older hardware versions.
> 
> Yes, we do. But for practical reasons we can only do testing for things
> that we actually have testcases for, and obviously we don't have the
> manpower to actually do *manual* testing on every platform, so issues
> for older platforms that are only triggered by manual interaction tend
> to slip under the radar.
> 
> We have a testfarm that tests every nightly build on all platforms we
> have test machines for. The testcases are publicly available here:
> 
> https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/
> 
> Obviously most of our manpower is spent on development and testing for
> current and future platforms, so for issues that involve older platforms,
> especially something as old as Sandybridge (which is, by now, 6 years old)
> we are happy for help with testing and bisection.
> 
> If the issues are specific to certain subsets of a platform it obviously
> gets even more complex; it'd be a combinatorial nightmare to build a
> testfarm that could test every variation of every platform.
> 
> If I got the count right, the i915 driver supports around a hundred
> different varieties of Intel graphics; combine that with the number of
> different displays people connect, the number of eDP displays that the
> vendors connect, the different BIOSes that vendors use, etc., and I
> think you'll begin to see what we're combating. To make things even
> more complex, you can connect several displays to each graphics card
> (possibly via adapters), and those displays don't always meet the
> standards that they claim to meet. Due to limited room, we are also
> somewhat restricted when it comes to testing multi-monitor setups.
> 
> This is why any help is welcome and sometimes even necessary. If you're
> afraid of data loss, be aware that it's possible to boot your system with
> file systems mounted read-only; you could also boot from a USB stick or
> similar.
> 
> If you can find a testcase in i-g-t that easily reproduces the issue
> that'd also be very helpful. Do note that not all testcases in i-g-t
> are run as part of our nightly tests, since some of them are *extremely*
> time-consuming; the full combinatorial

Re: [Intel-gfx] [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5)

2017-02-01 Thread David Weinehall
On Wed, Jan 25, 2017 at 01:10:26PM +0100, Martin Steigerwald wrote:
> On Sunday, 22 January 2017, 13:32:08 CET, Linus Torvalds wrote:
> > Things seem to be calming down a bit, and everything looks nominal.
> > 
> > There's only been about 250 changes (not counting merges) in the last
> > week, and the diffstat touches less than 300 files (with drivers and
> > architecture updates being the bulk, but there's tooling, networking
> > and filesystems in there too).
> > 
> > So keep testing, and I think we'll have a regular release schedule.
> 
> Testing this is no fun:
> 
> Bug 99533 - black screen after switching session
> https://bugs.freedesktop.org/99533
> 
> 
> This comes after GPU hangs/lockups with kernel 4.9, reported for example as:
> 
> Bug 98922 - [snb] GPU hang on PlaneShift
> https://bugs.freedesktop.org/98922
> 
> It may be a duplicate of #98747, #98794, #98860, #98891, or #98288.
> 
> 
> I am back on kernel 4.8.15, as I need this machine for production work.
> 
> Sometimes I wish for a microkernel that might be able to reincarnate
> drivers that hang or do weird things like that. That may at least give a
> way to actually do some debugging, or even to get the desktop session
> back without losing its state. This is especially true for graphics
> drivers and for hibernating/resuming from hibernation, which also
> occasionally fails, again without leaving a way to interact with the
> machine to do further debugging. The Linux kernel usually just crashes
> completely, with not even a ping or SSH possible, or at least it gets
> stuck with a black display and no way to restart the graphics driver,
> because the driver seems to be in some undefined state. Combined with
> bugs that only happen occasionally, this makes triaging them
> time-consuming and risky. I do like to help with testing, but maybe it's
> time to just switch to distro kernels and be done with it, as I
> regularly come across bugs that are too expensive for me to triage.
> 
> Please understand that I am not willing to bisect these occasionally
> happening bugs, which have the potential to cause data loss because I
> have to switch the machine off forcefully. Fortunately, KMail at least
> saves the mail I am writing from time to time, and Kate uses swap files.
> 
> I am also a bit unwilling to do further debugging of this one, as I
> usually use two sessions when I am at work and I risk losing data I am
> working on. But… at least with this issue it seems I would have a way to
> SSH into the machine before kicking it.
> 
> 
> Since kernel 4.9 I have been dissatisfied with the state of the Intel
> graphics driver on this ThinkPad T520 with Sandybridge, and I wonder
> whether you guys at Intel really test things with older hardware versions.

Yes, we do. But for practical reasons we can only do testing for things
that we actually have testcases for, and obviously we don't have the
manpower to actually do *manual* testing on every platform, so issues
for older platforms that are only triggered by manual interaction tend
to slip under the radar.

We have a testfarm that tests every nightly build on all platforms we
have test machines for. The testcases are publicly available here:

https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/

Obviously most of our manpower is spent on development and testing for current
and future platforms, so for issues that involve older platforms,
especially something as old as Sandybridge (which is, by now, 6 years old)
we are happy for help with testing and bisection.

If the issues are specific to certain subsets of a platform it obviously
gets even more complex; it'd be a combinatorial nightmare to build a
testfarm that could test every variation of every platform.

If I got the count right, the i915 driver supports around a hundred
different varieties of Intel graphics; combine that with the number of
different displays people connect, the number of eDP displays that the
vendors connect, the different BIOSes that vendors use, etc., and I
think you'll begin to see what we're combating. To make things even
more complex, you can connect several displays to each graphics card
(possibly via adapters), and those displays don't always meet the
standards that they claim to meet. Due to limited room, we are also
somewhat restricted when it comes to testing multi-monitor setups.
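To put rough numbers on that combinatorial argument, here is a minimal Python sketch. Only the "around a hundred" GPU varieties comes from the message; every other dimension size, and the one-minute-per-configuration cost, is an invented placeholder for illustration and says nothing about Intel's actual test farm.

```python
from math import prod

# Hypothetical test-matrix dimensions. Only the GPU count ("around a
# hundred") comes from the message; all other sizes are invented placeholders.
TEST_DIMENSIONS = {
    "gpu_variant": 100,
    "external_display_model": 20,
    "edp_panel": 30,
    "bios_version": 10,
    "displays_connected": 3,
}


def matrix_size(dimensions: dict) -> int:
    """Number of distinct configurations when every dimension is combined."""
    return prod(dimensions.values())


def serial_days(configurations: int, minutes_per_config: float) -> float:
    """Wall-clock days to test every configuration one after another."""
    return configurations * minutes_per_config / (60 * 24)


if __name__ == "__main__":
    total = matrix_size(TEST_DIMENSIONS)
    print(f"{total:,} configurations")  # 1,800,000 configurations
    print(f"{serial_days(total, 1.0):,.0f} days at one minute each")  # 1,250 days
```

Each added dimension multiplies the total rather than adding to it, which illustrates why even a farm running many machines in parallel can only cover a sampled subset of real-world setups.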

This is why any help is welcome and sometimes even necessary. If you're
afraid of data loss, be aware that it's possible to boot your system with
file systems mounted read-only; you could also boot from a USB stick or
similar.

If you can find a testcase in i-g-t that easily reproduces the issue
that'd also be very helpful. Do note that not all testcases in i-g-t
are run as part of our nightly tests, since some of them are *extremely*
time-consuming; the full combinatorial testcase, for instance, can
take weeks or months to complete (I haven't done a full run recently).

I hope this helps you understand why bugs can slip under the radar,
and why a bisect is so important.
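For readers less familiar with bisection, here is a minimal Python sketch of the binary search that `git bisect` performs over an ordered range of commits. The commit names and the `is_bad` predicate are hypothetical stand-ins for "build this kernel, boot it, and try to reproduce the bug"; the point is only that the number of test kernels grows with the logarithm of the range size, not linearly.

```python
import math
from typing import Callable, Sequence, Tuple


def bisect_first_bad(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Find the first bad commit with a binary search.

    Assumes `commits` is ordered oldest to newest, the commit just before
    commits[0] is known good, and commits[-1] is known bad, similar to
    starting with `git bisect good v4.8` and `git bisect bad v4.9`.
    Returns the first bad commit and how many commits had to be tested.
    """
    lo, hi = 0, len(commits) - 1  # invariant: first bad commit is in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one kernel build + boot + reproduction attempt
        if is_bad(commits[mid]):
            hi = mid  # regression introduced at mid or earlier
        else:
            lo = mid + 1  # regression introduced after mid
    return commits[hi], tests


if __name__ == "__main__":
    # Hypothetical history of 10,000 commits, with the regression at index 6,789.
    history = [f"commit-{i:05d}" for i in range(10_000)]
    culprit, runs = bisect_first_bad(
        history, lambda c: int(c.split("-")[1]) >= 6_789
    )
    bound = math.ceil(math.log2(len(history)))
    print(culprit, runs, "test runs; upper bound:", bound)
```

With an intermittent bug like the one in this thread, a single clean boot is weak evidence that a commit is good, so in practice each step may need a longer test session; the sketch only shows why the number of kernels to build stays small.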


Kind regards, David

[REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5)

2017-01-25 Thread Martin Steigerwald
On Sunday, 22 January 2017, 13:32:08 CET, Linus Torvalds wrote:
> Things seem to be calming down a bit, and everything looks nominal.
> 
> There's only been about 250 changes (not counting merges) in the last
> week, and the diffstat touches less than 300 files (with drivers and
> architecture updates being the bulk, but there's tooling, networking
> and filesystems in there too).
> 
> So keep testing, and I think we'll have a regular release schedule.

Testing this is no fun:

Bug 99533 - black screen after switching session
https://bugs.freedesktop.org/99533


This comes after GPU hangs/lockups with kernel 4.9, reported for example as:

Bug 98922 - [snb] GPU hang on PlaneShift
https://bugs.freedesktop.org/98922

It may be a duplicate of #98747, #98794, #98860, #98891, or #98288.


I am back on kernel 4.8.15, as I need this machine for production work.

Sometimes I wish for a microkernel that might be able to reincarnate drivers
that hang or do weird things like that. That may at least give a way to
actually do some debugging, or even to get the desktop session back without
losing its state. This is especially true for graphics drivers and for
hibernating/resuming from hibernation, which also occasionally fails, again
without leaving a way to interact with the machine to do further debugging.
The Linux kernel usually just crashes completely, with not even a ping or SSH
possible, or at least it gets stuck with a black display and no way to
restart the graphics driver, because the driver seems to be in some undefined
state. Combined with bugs that only happen occasionally, this makes triaging
them time-consuming and risky. I do like to help with testing, but maybe it's
time to just switch to distro kernels and be done with it, as I regularly
come across bugs that are too expensive for me to triage.

Please understand that I am not willing to bisect these occasionally happening
bugs, which have the potential to cause data loss because I have to switch the
machine off forcefully. Fortunately, KMail at least saves the mail I am
writing from time to time, and Kate uses swap files.

I am also a bit unwilling to do further debugging of this one, as I usually
use two sessions when I am at work and I risk losing data I am working on.
But… at least with this issue it seems I would have a way to SSH into the
machine before kicking it.


Since kernel 4.9 I have been dissatisfied with the state of the Intel graphics
driver on this ThinkPad T520 with Sandybridge, and I wonder whether you guys
at Intel really test things with older hardware versions.

Thanks,
-- 
Martin