Re: [Intel-gfx] Regression of v4.6-rc vs. v4.5 bisected: a98ee79317b4 "drm/i915/fbc: enable FBC by default on HSW and BDW"

2016-05-05 Thread Stefan Richter
On May 05 Daniel Vetter wrote:
> Hm, if it's watermarks then testing with latest drm-intel-nightly would be
> interesting. We finally managed to land atomic watermark updates (should
> all be there in 4.7 too):
> 
> https://cgit.freedesktop.org/drm-intel

I will see if I can test this sometime soon.
-- 
Stefan Richter
-==- -=-= --==-
http://arcgraph.de/sr/


Re: [Intel-gfx] Regression of v4.6-rc vs. v4.5 bisected: a98ee79317b4 "drm/i915/fbc: enable FBC by default on HSW and BDW"

2016-05-05 Thread Daniel Vetter
On Thu, May 05, 2016 at 06:50:14PM +, Zanoni, Paulo R wrote:
> On Thu, 2016-05-05 at 19:45 +0200, Stefan Richter wrote:
> > On Apr 30 Stefan Richter wrote:
> > > 
> > > On Apr 29 Stefan Richter wrote:
> > > > 
> > > > On Apr 26 Stefan Richter wrote:  
> > > > > 
> > > > > > v4.6-rc solidly hangs after a short while after boot, login to X11, and
> > > > > > doing nothing much remarkable on the just brought up X desktop.
> > > > > 
> > > > > > Hardware: x86-64, E3-1245 v3 (Haswell),
> > > > > >   mainboard Supermicro X10SAE,
> > > > > >   using integrated Intel graphics (HD P4600, i915 driver),
> > > > > >   C226 PCH's AHCI and USB 2/3, ASMedia ASM1062 AHCI,
> > > > > >   Intel LAN (i217, igb driver),
> > > > > >   several IEEE 1394 controllers, some of them behind
> > > > > >   PCIe bridges (IDT, PLX) or PCIe-to-PCI bridges (TI, Tundra)
> > > > > >   and one PCI-to-CardBus bridge (Ricoh)
> > > > > 
> > > > > kernel.org kernel, Gentoo Linux userland
> > > > > 
> > > > > 1. known good:  v4.5-rc5 (gcc 4.9.3)
> > > > >    known bad:   v4.6-rc2 (gcc 4.9.3), only tried one time
> > > > > 
> > > > > 2. known good:  v4.5.2 (gcc 5.2.0)
> > > > >    known bad:   v4.6-rc5 (gcc 5.2.0), only tried one time
> > > > > 
> > > > > I will send my linux-4.6-rc5/.config in a follow-up message.  
> > >  .config: http://www.spinics.net/lists/kernel/msg2243444.html
> > >    lspci: http://www.spinics.net/lists/kernel/msg2243447.html
> > > 
> > > Some userland package versions, in case these have any bearing:
> > > x11-base/xorg-drivers-1.17
> > > x11-base/xorg-server-1.17.4
> > > x11-base/xorg-x11-7.4-r2
> > Furthermore, there is a single display hooked up via DisplayPort.
> > 
> > > 
> > > > 
> > > > After it proved impossible to capture an oops through netconsole, I
> > > > started git bisect.  This will apparently take almost a week, as git
> > > > estimated 13 bisection steps and I will be allowing about 12 hours of
> > > > uptime as a sign for a good kernel.  (In my four or five tests of bad
> > > > kernels before I started bisection, they hung after 3 minutes...5.5 hours
> > > > uptime, with no discernible difference in workload.  Maybe 12 h cutoff is
> > > > even too short...)
> > I took at least 18 hours uptime (usually 24 hours) as a sign for good
> > kernels.  During the bisection, bad kernels hung after 3 h, 2 h, 9 min,
> > 45 min, and 4 min uptime.  Thus I arrived at a98ee79317b4 "drm/i915/fbc:
> > enable FBC by default on HSW and BDW" as the point where the hangs are
> > introduced.
> > 
> > Quoting the changelog of the commit:
> 
> Thanks for following the instructions on the commit message! :)
> 
> > 
> > Oh, and in case you - the person reading this commit message - found
> > this commit through git bisect, please do the following:
> >  - Check your dmesg and see if there are error messages mentioning
> >    underruns around the time your problem started happening.
> > 
> > Well, I always had the following lines in dmesg:
> > [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* uncleared fifo underrun on pipe A
> > [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe A FIFO underrun
> 
> Oh, well... I had a patch that would just disable FBC in case we saw a
> FIFO underrun, but it was rejected. Maybe this is the time to think
> about it again? Otherwise, I can't think of much besides disabling FBC
> on HSW until all the underruns and watermarks regressions are fixed
> forever.

Hm, if it's watermarks then testing with latest drm-intel-nightly would be
interesting. We finally managed to land atomic watermark updates (should
all be there in 4.7 too):

https://cgit.freedesktop.org/drm-intel

Cheers, Daniel
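
For anyone following along, here is a minimal sketch of how the dmesg check from the quoted commit message could be scripted. The helper name `count_underruns` is hypothetical and not part of i915 or intel-gpu-tools. The sample log is copied from the two lines quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of any kernel tooling): count log lines
# that mention a FIFO underrun. It reads log text on stdin, so it can be
# fed from `dmesg` or from a saved log file.
count_underruns() {
    # -c prints the number of matching lines, -i ignores case so both
    # "fifo underrun" and "FIFO underrun" are counted.
    grep -ci 'fifo underrun'
}

# Sample input: the two messages quoted in this thread.
sample_log='[drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* uncleared fifo underrun on pipe A
[drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe A FIFO underrun'

printf '%s\n' "$sample_log" | count_underruns
```

On a live system the same function could presumably be used as `dmesg | count_underruns`. Reading the kernel log may need root, depending on the distribution's settings.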

> 
> > 
> > I always got these when I switch on the DisplayPort attached monitor.
> > Recently I changed userland from kdm to sddm and noticed that I
> > apparently get these when sddm shuts down.  I am not aware of whether
> > or not this also already happened with kdm.
> > 
> > However, "around the time your problem started happening" there is
> > nothing in dmesg, because "your problem" is a complete hang without
> > possibility of disk IO and without netconsole output.
> > 
> >  - Download intel-gpu-tools, compile it, and run:
> >    $ sudo ./tests/kms_frontbuffer_tracking --run-subtest '*fbc-*' 2>&1 | tee fbc.txt
> >    Then send us the fbc.txt file, especially if you get a failure.
> >    This will really maximize your chances of getting the bug fixed
> >    quickly.
> > 
> > Do you need this while FBC is enabled, or can I run it while FBC is
> > disabled?
> 
> FBC enabled. Considering your description, my hope is that maybe some
> specific subtest will be able to hang your machine, so testing this
> again will require only running the specific subtest instead of waiting
> 18 hours.
> 
> > 
> >  - Try 
