[Intel-gfx] Possible 4.5 i915 Skylake regression

2016-03-13 Thread Andy Lutomirski
On Wed, Feb 17, 2016 at 8:18 AM, Daniel Vetter wrote:
> On Tue, Feb 16, 2016 at 09:26:35AM -0800, Andy Lutomirski wrote:
>> On Tue, Feb 16, 2016 at 9:12 AM, Andy Lutomirski wrote:
>> > On Tue, Feb 16, 2016 at 8:12 AM, Daniel Vetter wrote:
>> >> On Mon, Feb 15, 2016 at 06:58:33AM -0800, Andy Lutomirski wrote:
>> >>> On Sun, Feb 14, 2016 at 6:59 PM, Andy Lutomirski wrote:
>> >>> > Hi-
>> >>> >
>> >>> > On 4.5-rc3 on a Dell XPS 13 9350 (Skylake i915, no nvidia on this
>> >>> > model), shortly after resume, I saw a single black flash on the
>> >>> > screen.  The log said:
>> >>> >
>> >>> > [Feb13 07:05] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR*
>> >>> > CPU pipe A FIFO underrun
>> >>> >
>> >>> > I haven't seen this on 4.4.
>> >>> >
>> >>> > I'd be happy to dig up debugging info, but I don't know what would be
>> >>> > useful.  I have no i915 module options set.
>> >>>
>> >>> It's flashing quite frequently now, although I seem to get the
>> >>> underrun warning only once per resume.
>> >>
>> >> We shut up the warning irq source to avoid hijacking an entire cpu core
>> >> ;-)
>> >>
>> >> There's a fix from Matt right after 4.5-rc4 in Linus' branch. I'm hoping
>> >> that should help.
>> >
>> > Do you mean:
>> >
>> > commit e2e407dc093f530b771ee8bf8fe1be41e3cea8b3
>> > Author: Matt Roper 
>> > Date:   Mon Feb 8 11:05:28 2016 -0800
>> >
>> > drm/i915: Pretend cursor is always on for ILK-style WM calculations 
>> > (v2)
>> >
>> > If so, it didn't help.  I'm currently doing a full rebuild just in
>> > case I messed something up, though.
>> >
>>
>> Definitely not fixed.  It seems to be okay after a reboot until the
>> first suspend/resume.
>>
>> This happened after resuming.  Five cents says it's the root cause.
>
> That's interesting, but doesn't ring a bell unfortunately. Can you try to
> attempt a bisect?
>

I'm giving up on my attempt to bisect for now.  After a bunch of false
starts to avoid this crap, I'm stuck at
651174a4a0ccaf41e14fadc4bc525d61ae7f7b18, which is based on 4.3-rc3
and doesn't merge cleanly up to 4.4.  It's also annoying because it
reproduces reasonably quickly but not instantaneously, and I can never
reproduce it before a suspend/resume, so my bisection attempts are
full of errors.
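
One way to make a bisect like this less error-prone is to let `git bisect run` drive it and to skip any run that says nothing about the bug. The following is only a minimal sketch under stated assumptions: `classify` is a hypothetical helper, the caller is assumed to have captured the kernel log itself and to know whether a suspend/resume cycle actually completed, and the pattern is modeled on the underrun message quoted in this thread. The exit codes are the ones `git bisect run` documents (0 = good, 1 = bad, 125 = skip).

```python
import re

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

# Modeled on the message quoted in this thread:
#   [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
UNDERRUN = re.compile(r"CPU pipe [A-C] FIFO underrun")


def classify(log_text, resumed):
    """Map one test run to a `git bisect run` exit code (hypothetical helper).

    log_text: kernel log captured during the run.
    resumed:  True only if a suspend/resume cycle completed during the run.
    """
    if not resumed:
        # The underrun has never shown up before a suspend/resume,
        # so a run without one cannot mark the commit good or bad.
        return SKIP
    if UNDERRUN.search(log_text):
        return BAD
    return GOOD
```

Because the problem does not reproduce instantly, a GOOD result is only as trustworthy as the observation time after resume; a short run can still mislabel a bad commit as good.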

--Andy

> Thanks, Daniel
>
>>
>> [  160.361200] WARNING: CPU: 2 PID: 2512 at
>> drivers/gpu/drm/i915/intel_uncore.c:599
>> hsw_unclaimed_reg_debug+0x69/0x90 [i915]()
>> [  160.361209] Unclaimed register detected before writing to register 0x20a8
>> [  160.361213] Modules linked in: rfcomm fuse ccm cmac xt_CHECKSUM
>> ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
>> nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>> xt_conntrack ebtable_filter ebtable_nat ebtable_broute bridge stp llc
>> ebtables ip6table_raw ip6table_mangle ip6table_security ip6table_nat
>> nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter
>> ip6_tables iptable_raw iptable_mangle iptable_security iptable_nat
>> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bnep
>> arc4 iwlmvm mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek
>> hid_multitouch snd_hda_codec_generic iwlwifi snd_hda_intel intel_rapl
>> snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel snd_hwdep
>> cfg80211 snd_hda_core kvm snd_seq uvcvideo snd_seq_device
>> i2c_designware_platform
>> [  160.361385]  i2c_designware_core btusb snd_pcm videobuf2_vmalloc
>> wmi_mof vfat dell_wmi fat videobuf2_memops btrtl btbcm btintel
>> bluetooth dell_laptop dell_smbios dcdbas videobuf2_v4l2 snd_timer
>> videobuf2_core rtsx_pci_ms snd irqbypass videodev memstick
>> ghash_clmulni_intel joydev mei_me efi_pstore mei i2c_i801 soundcore
>> efivars pcspkr idma64 shpchp virt_dma media rfkill intel_lpss_pci
>> processor_thermal_device intel_soc_dts_iosf wmi acpi_als kfifo_buf
>> int3403_thermal tpm_tis industrialio pinctrl_sunrisepoint tpm
>> intel_hid int3400_thermal pinctrl_intel intel_lpss_acpi sparse_keymap
>> int340x_thermal_zone acpi_thermal_rel intel_lpss nfsd acpi_pad
>> auth_rpcgss nfs_acl lockd binfmt_misc grace sunrpc dm_crypt i915
>> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
>> fb_sys_fops drm rtsx_pci_sdmmc
>> [  160.361548]  mmc_core crct10dif_pclmul crc32_pclmul crc32c_intel
>> rtsx_pci serio_raw i2c_hid video
>> [  160.361575] CPU: 2 PID: 2512 Comm: gnome-shell Not tainted
>> 4.5.0-rc4-acpi+ #59
>> [  160.361581] Hardware name: Dell Inc. XPS 13 9350/07TYC2, BIOS 1.1.9
>> 12/18/2015
>> [  160.361588]  0086 604232f7 88024d55ba60
>> 81449d83
>> [  160.361601]  88024d55baa8 a01e15e8 88024d55ba98
>> 81094252
>> [  160.361612]  88026f4d 20a8 88026f4d
>> fefe
>> [  160.361624] Call Trace:
>> [  160.361644]  [] dump_stack+0x65/0x92
>> [  160.361660]  [] warn_slowpath_common+0x82/0xc0
>> [  160.361671]  [] warn_slowpath_fmt+0x5c/0x80
>> [  160.361764]  [] hsw_unclaimed_reg_debug+0x69/0x90 

2016-03-11 Thread Andy Lutomirski
On Mon, Feb 22, 2016 at 7:13 PM, Andy Lutomirski wrote:
> On Wed, Feb 17, 2016 at 5:36 PM, Andy Lutomirski wrote:
>> On Wed, Feb 17, 2016 at 8:18 AM, Daniel Vetter wrote:
>>> On Tue, Feb 16, 2016 at 09:26:35AM -0800, Andy Lutomirski wrote:
>>>> On Tue, Feb 16, 2016 at 9:12 AM, Andy Lutomirski wrote:
>>>> > On Tue, Feb 16, 2016 at 8:12 AM, Daniel Vetter wrote:
>>>> >> On Mon, Feb 15, 2016 at 06:58:33AM -0800, Andy Lutomirski wrote:
>>>> >>> On Sun, Feb 14, 2016 at 6:59 PM, Andy Lutomirski wrote:
>>>> >>> > Hi-
>>>> >>> >
>>>> >>> > On 4.5-rc3 on a Dell XPS 13 9350 (Skylake i915, no nvidia on this
>>>> >>> > model), shortly after resume, I saw a single black flash on the
>>>> >>> > screen.  The log said:
>>>> >>> >
>>>> >>> > [Feb13 07:05] [drm:intel_cpu_fifo_underrun_irq_handler [i915]]
>>>> >>> > *ERROR*
>>>> >>> > CPU pipe A FIFO underrun
>>>> >>> >
>>>> >>> > I haven't seen this on 4.4.
>>>> >>> >
>>>> >>> > I'd be happy to dig up debugging info, but I don't know what would be
>>>> >>> > useful.  I have no i915 module options set.
>>>> >>>
>>>> >>> It's flashing quite frequently now, although I seem to get the
>>>> >>> underrun warning only once per resume.
>>>> >>
>>>> >> We shut up the warning irq source to avoid hijacking an entire cpu core
>>>> >> ;-)
>>>> >>
>>>> >> There's a fix from Matt right after 4.5-rc4 in Linus' branch. I'm hoping
>>>> >> that should help.
>>>> >
>>>> > Do you mean:
>>>> >
>>>> > commit e2e407dc093f530b771ee8bf8fe1be41e3cea8b3
>>>> > Author: Matt Roper
>>>> > Date:   Mon Feb 8 11:05:28 2016 -0800
>>>> >
>>>> > drm/i915: Pretend cursor is always on for ILK-style WM calculations
>>>> > (v2)
>>>> >
>>>> > If so, it didn't help.  I'm currently doing a full rebuild just in
>>>> > case I messed something up, though.
>>>> >
>>>>
>>>> Definitely not fixed.  It seems to be okay after a reboot until the
>>>> first suspend/resume.
>>>>
>>>> This happened after resuming.  Five cents says it's the root cause.
>>>
>>> That's interesting, but doesn't ring a bell unfortunately. Can you try to
>>> attempt a bisect?
>>
>> I probably can, but it's very slow.  Is there a reasonably
>> straightforward way to instrument the watermark computation to see
>> what's going wrong?  I'm reasonably confident that the bug is in the
>> resume code or in something that only happens on resume, since I still
>> haven't seen underruns after rebooting before suspending.
>>
>
> With some instrumentation applied, I got this:
>
> [  369.471064] skl_update_wm(crtc-0): computed update
> [  369.471072] skl_update_other_pipe_wm(crtc-0): no change
> [  369.471075] skl_write_wm_values...
> [  369.471078]  CRTC crtc-0 pipe A
> [  369.471083]   wm_linetime = 121
> [  369.471086]   plane_wm level 0 plane 0 = 2147500036
> [  369.471090]   plane_wm level 0 plane 1 = 0
> [  369.471094]   plane_wm level 0 cursor = 2147500036
> [  369.471097]   plane_wm level 1 plane 0 = 2147516439
> [  369.471101]   plane_wm level 1 plane 1 = 0
> [  369.471104]   plane_wm level 1 cursor = 2147516439
> [  369.471108]   plane_wm level 2 plane 0 = 2147516448
> [  369.47]   plane_wm level 2 plane 1 = 0
> [  369.471115]   plane_wm level 2 cursor = 0
> [  369.471118]   plane_wm level 3 plane 0 = 2147532837
> [  369.471121]   plane_wm level 3 plane 1 = 0
> [  369.471125]   plane_wm level 3 cursor = 0
> [  369.471128]   plane_wm level 4 plane 0 = 2147565639
> [  369.471131]   plane_wm level 4 plane 1 = 0
> [  369.471135]   plane_wm level 4 cursor = 0
> [  369.471138]   plane_wm level 5 plane 0 = 2147582038
> [  369.471141]   plane_wm level 5 plane 1 = 0
> [  369.471145]   plane_wm level 5 cursor = 0
> [  369.471148]   plane_wm level 6 plane 0 = 2147582044
> [  369.471151]   plane_wm level 6 plane 1 = 0
> [  369.471155]   plane_wm level 6 cursor = 0
> [  369.471158]   plane_wm level 7 plane 0 = 2147598443
> [  369.471161]   plane_wm level 7 plane 1 = 0
> [  369.471164]   plane_wm level 7 cursor = 0
> [  369.471168]   wm_trans plane 0 = 0
> [  369.471171]   wm_trans plane 1 = 0
> [  369.471174]   wm_trans cursor = 0
> [  369.471182]  CRTC crtc-1 pipe B
> [  369.471184]   clean
> [  369.471186]  CRTC crtc-2 pipe C
> [  369.471189]   clean
> [  369.471226] skl_update_wm(crtc-0): no update
> [  372.068755] [drm:intel_cpu_fifo_underrun_irq_handler [i915]]
> *ERROR* CPU pipe A FIFO underrun
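
The raw plane_wm numbers in these dumps are easier to compare once split into fields. The sketch below assumes the Skylake PLANE_WM register layout as I understand it from the i915 register definitions (enable flag in bit 31, lines in bits 18:14, blocks in bits 9:0); that layout is an assumption brought in from outside this log, and `decode_plane_wm` is a hypothetical helper, not part of the instrumentation patch.

```python
# Assumed Skylake PLANE_WM field layout (not stated anywhere in this log).
PLANE_WM_EN = 1 << 31          # watermark level enabled
PLANE_WM_LINES_SHIFT = 14      # "lines" field starts at bit 14
PLANE_WM_LINES_MASK = 0x1F     # 5-bit field, bits 18:14
PLANE_WM_BLOCKS_MASK = 0x3FF   # 10-bit field, bits 9:0


def decode_plane_wm(raw):
    """Split a raw plane_wm value from the dump into its assumed fields."""
    return {
        "enabled": bool(raw & PLANE_WM_EN),
        "lines": (raw >> PLANE_WM_LINES_SHIFT) & PLANE_WM_LINES_MASK,
        "blocks": raw & PLANE_WM_BLOCKS_MASK,
    }


if __name__ == "__main__":
    # Level 0 cursor value from the dump at 369.47, then from the dump at 373.39.
    for raw in (2147500036, 16388):
        print(hex(raw), decode_plane_wm(raw))
```

Under that assumed layout, the two level 0 cursor values (2147500036 at 369.47 and 16388 at 373.39) carry the same lines and blocks fields and differ only in the enable bit.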
> [  373.399396] skl_update_wm(crtc-0): computed update
> [  373.399408] skl_update_other_pipe_wm(crtc-0): no change
> [  373.399413] skl_write_wm_values...
> [  373.399418]  CRTC crtc-0 pipe A
> [  373.399426]   wm_linetime = 121
> [  373.399431]   plane_wm level 0 plane 0 = 2147500036
> [  373.399438]   plane_wm level 0 plane 1 = 0
> [  373.399443]   plane_wm level 0 cursor = 16388
> [  373.399449]   plane_wm level 1 plane 0 = 2147516439
> [  373.399455]   plane_wm level 1 plane 1 = 0
> [  373.399460]   plane_wm level 1 cursor = 32791
> [  373.399465]   plane_wm level 2 plane 0 = 2147516448
> [  373.399471]   plane_wm level 2 p