Bug#891056: Reproducible Intel GPU hang on rcs0 with ecode 9:0:0x86dffffd

2018-02-25 Thread Ben Caradoc-Davies
This bug also occurs with a vanilla 4.16.0-rc2 kernel built from
3664ce2d930983966d on torvalds/master. With this kernel, however, the GPU is
reset without crashing X: X hangs for a few seconds and then recovers.


Kind regards,

--
Ben Caradoc-Davies 
Director
Transient Software Limited 
New Zealand



Bug#891056: Reproducible Intel GPU hang on rcs0 with ecode 9:0:0x86dffffd

2018-02-21 Thread Ben Caradoc-Davies
Package: src:linux
Version: 4.15.4-1
Severity: normal

Dear Maintainer,

I have a reproducible GPU hang when opening one particular workspace with
Eclipse Oxygen.2 (eclipse-jee-oxygen-2-linux-gtk-x86_64), *with* SWT_GTK3=0 (to
use GTK2), and *with* xfwm4 compositing enabled. With GTK3 or without
compositing, the hang does not occur. The hang occurs immediately and every
time with this workspace. Other workspaces are not affected. I do not think
there is anything special about this workspace; I guess it is just a timing
issue.

The platform is an Intel i7 7700 using integrated HD Graphics 630 and the
built-in kernel modesetting driver (xserver-xorg-video-intel is *not* installed).

Hang reproduced with kernels 4.14.7-1, 4.14.13-1, 4.14.17-1, and 4.15.4-1. I
first noticed the problem after recent upgrades of libgl*-mesa in sid, most
recently to 17.3.5-1. Before that, I had seen this hang only once, with an
earlier libgl*-mesa.

Feb 22 13:55:28 ripley kernel: [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [800], reason: Hang on rcs0, action: reset
Feb 22 13:55:28 ripley kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 22 13:55:28 ripley kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 22 13:55:28 ripley kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 22 13:55:28 ripley kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb 22 13:55:28 ripley kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb 22 13:55:28 ripley kernel: i915 :00:02.0: Resetting rcs0 after gpu hang
Feb 22 13:55:36 ripley kernel: i915 :00:02.0: Resetting rcs0 after gpu hang
Feb 22 13:55:44 ripley kernel: i915 :00:02.0: Resetting rcs0 after gpu hang
Feb 22 13:55:52 ripley kernel: i915 :00:02.0: Resetting rcs0 after gpu hang
Feb 22 13:56:00 ripley kernel: i915 :00:02.0: Resetting rcs0 after gpu hang

(But I am following Debian instructions to not submit upstream without asking
first.)

GPU crash dump attached as requested.
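
For anyone triaging similar reports, here is a minimal sketch of pulling the
fields out of the "GPU HANG" kernel message quoted above. The regular
expression is derived only from the message as it appears in this report
(with the ecode as given in the bug title); other kernel versions may format
it differently. The field names gen, engine and code are my own labels, and
reading the first two ecode numbers as GPU generation and engine is an
assumption.

```python
import re

# Pattern derived from the single "GPU HANG" line in this report.
# The group names are illustrative labels, not kernel terminology.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>\d+):(?P<code>0x[0-9a-fA-F]+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return a dict of fields from a GPU HANG log line, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


# The log line from this report, joined onto one line.
sample = (
    "Feb 22 13:55:28 ripley kernel: [drm] GPU HANG: ecode 9:0:0x86dffffd, "
    "in Xorg [800], reason: Hang on rcs0, action: reset"
)
info = parse_gpu_hang(sample)
print(info)
```

Feeding it the output of journalctl -k or dmesg line by line should pick out
the process and engine named in each hang, provided the message format
matches.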

Workarounds: use SWT_GTK3=1 (or unset), or turn off xfwm4 compositing.
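
As a rough sketch of the first workaround, a launcher could set SWT_GTK3 in
the environment before starting Eclipse. The function name eclipse_env and
the "eclipse" executable name are hypothetical. The xfconf-query command for
the second workaround assumes that xfwm4 stores its compositing switch in the
/general/use_compositing property; check this on your own system before
relying on it.

```python
import os


def eclipse_env(base_env=None, use_gtk3=True):
    """Return a copy of the environment with SWT_GTK3 set for Eclipse.

    use_gtk3=True corresponds to the workaround (SWT_GTK3=1);
    use_gtk3=False reproduces the failing GTK2 configuration (SWT_GTK3=0).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SWT_GTK3"] = "1" if use_gtk3 else "0"
    return env


# Assumed command for turning off xfwm4 compositing (second workaround).
# It is only defined here, not executed.
DISABLE_COMPOSITING_CMD = [
    "xfconf-query", "-c", "xfwm4",
    "-p", "/general/use_compositing",
    "-s", "false",
]

print(eclipse_env({}, use_gtk3=True))
```

To launch, something like subprocess.Popen(["eclipse"], env=eclipse_env())
would apply the first workaround, assuming an eclipse executable on PATH.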

Kind regards,
Ben.



-- Package-specific info:
** Version:
Linux version 4.15.0-1-amd64 (debian-ker...@lists.debian.org) (gcc version 7.3.0 (Debian 7.3.0-3)) #1 SMP Debian 4.15.4-1 (2018-02-18)

** Command line:
BOOT_IMAGE=/vmlinuz-4.15.0-1-amd64 root=/dev/mapper/vg-root ro quiet net.ifnames=0 apparmor=0 splash

** Tainted: W (512)
 * Taint on warning.
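
The taint value above is a bitmask, and 512 is bit 9 ("W"). A minimal sketch
of decoding such a value follows. The table is a subset of the kernel taint
flags written from memory, so treat the descriptions as approximate, and the
function name decode_taint is my own.

```python
# Subset of kernel taint flags: bit position -> (letter, short meaning).
# Descriptions are paraphrased; the kernel documentation is authoritative.
TAINT_FLAGS = {
    0: ("P", "proprietary module loaded"),
    1: ("F", "module force-loaded"),
    3: ("R", "module force-unloaded"),
    4: ("M", "machine check exception occurred"),
    5: ("B", "bad page referenced"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel oops or BUG occurred"),
    8: ("A", "ACPI table overridden"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver loaded"),
    11: ("I", "firmware bug workaround applied"),
    12: ("O", "out-of-tree module loaded"),
    13: ("E", "unsigned module loaded"),
}


def decode_taint(value):
    """Return a list of (letter, meaning) for each bit set in value."""
    flags = []
    for bit in range(value.bit_length()):
        if value & (1 << bit):
            flags.append(TAINT_FLAGS.get(bit, ("?", f"unknown bit {bit}")))
    return flags


print(decode_taint(512))
```

On a running system the current value can be read from
/proc/sys/kernel/tainted and passed to decode_taint as an integer.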

** Kernel log:
Unable to read kernel log; any relevant messages should be attached

** Model information
sys_vendor: System manufacturer
product_name: System Product Name
product_version: System Version
chassis_vendor: Default string
chassis_version: Default string
bios_vendor: American Megatrends Inc.
bios_version: 3601
board_vendor: ASUSTeK COMPUTER INC.
board_name: H110I-PLUS
board_version: Rev X.0x

** Loaded modules:
ctr
ccm
arc4
ip6t_REJECT
nf_reject_ipv6
nf_conntrack_ipv6
nf_defrag_ipv6
ip6table_filter
ip6_tables
ipt_REJECT
nf_reject_ipv4
xt_tcpudp
nf_conntrack_ipv4
nf_defrag_ipv4
xt_conntrack
nf_conntrack
iptable_filter
snd_hda_codec_hdmi
binfmt_misc
snd_hda_codec_realtek
snd_hda_codec_generic
nls_ascii
nls_cp437
vfat
ath9k_htc
fat
intel_rapl
ath9k_common
ath9k_hw
x86_pkg_temp_thermal
intel_powerclamp
coretemp
snd_hda_intel
ath
kvm_intel
mac80211
snd_hda_codec
kvm
irqbypass
intel_cstate
snd_hda_core
cfg80211
eeepc_wmi
snd_hwdep
efi_pstore
intel_uncore
asus_wmi
sparse_keymap
joydev
intel_rapl_perf
efivars
snd_pcm
rfkill
wmi_bmof
snd_timer
sg
mei_me
pcspkr
iTCO_wdt
snd
iTCO_vendor_support
mei
soundcore
shpchp
evdev
acpi_pad
parport_pc
ppdev
lp
parport
efivarfs
ip_tables
x_tables
autofs4
ext4
crc16
mbcache
jbd2
fscrypto
ecb
btrfs
zstd_decompress
zstd_compress
xxhash
algif_skcipher
af_alg
dm_crypt
dm_mod
hid_generic
usbhid
hid
raid10
raid456
async_raid6_recov
async_memcpy
async_pq
async_xor
async_tx
xor
raid6_pq
libcrc32c
crc32c_generic
raid1
raid0
multipath
linear
md_mod
sd_mod
crct10dif_pclmul
crc32_pclmul
crc32c_intel
ghash_clmulni_intel
mxm_wmi
pcbc
i915
aesni_intel
ahci
i2c_algo_bit
aes_x86_64
libahci
crypto_simd
glue_helper
xhci_pci
drm_kms_helper
cryptd
xhci_hcd
libata
r8169
i2c_i801
mii
usbcore
scsi_mod
drm
usb_common
fan
thermal
wmi
video
button

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation Intel Kaby Lake Host Bridge [8086:591f] (rev 05)
Subsystem: ASUSTeK Computer Inc. Intel Kaby Lake Host Bridge [1043:8694]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- 

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 630 [8086:5912] (rev 04) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. HD Graphics 630 [1043:8694]
Control: I/O+ Mem+ BusMaster+ SpecCycle-