Re: System with kernel 4.21 hang immediately after start (invalid opcode: 0000 [#1] SMP NOPTI)

2019-01-07 Thread Michel Dänzer
On 2019-01-07 4:21 p.m., Mikhail Gavrilov wrote:
> On Mon, 7 Jan 2019 at 15:09, Michel Dänzer  wrote:
>>
>> Is this reproducible with commit 674e78acae0d ("drm/amd/display: Add
>> fast path for cursor plane updates") + the patch above?
> 
> Yes, I am sure I can reproduce the issue.
> All new dmesg logs are attached.
> 
>>
>> If yes, is it not reproducible with commit fc42d47ce011 ("drm/amdgpu:
>> Enable GPU recovery by default for CI")? (Please test long enough to be
>> at least 99% sure before saying it's not)
> 
> Commits 674e78acae0d and fc42d47ce011 are already present in my patched kernel.

I mean:

Run "git checkout 674e78acae0d" and apply the patch. Build the kernel
and boot it. If it reproduces the problem, un-apply the patch, run "git
checkout fc42d47ce011", build the kernel and boot it. Test until either
it reproduces the problem, or you're at least 99% sure it won't.
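
The two test steps described above could be scripted roughly as follows. This is only a sketch under stated assumptions: PATCH_FILE, JOBS, DRY_RUN and the helper names are hypothetical, the patch is assumed to have been saved locally from the cgit link quoted in this thread, and a plain "make" stands in for whatever build and install procedure your distribution uses.

```shell
#!/bin/sh
# Sketch only: run from the top of a kernel git tree.
# PATCH_FILE (hypothetical) points at the locally saved fix patch.
# JOBS and DRY_RUN are hypothetical convenience variables.

run() {
    # Print the command, then execute it unless DRY_RUN=1.
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

test_with_patch() {
    # Step 1: the suspected commit plus the proposed fix.
    run git checkout 674e78acae0d &&
    run git apply "$PATCH_FILE" &&
    run make -j"${JOBS:-4}"
    # Installing and booting the kernel is distribution-specific
    # and deliberately left out here.
}

test_baseline() {
    # Step 2: un-apply the fix, then build the older commit.
    run git apply -R "$PATCH_FILE" &&
    run git checkout fc42d47ce011 &&
    run make -j"${JOBS:-4}"
}
```

Typical use would be to set PATCH_FILE, call test_with_patch, then install and boot the result. Only if the problem reproduces, call test_baseline and test that kernel until it either fails or you are confident it will not. Setting DRY_RUN=1 prints the commands without running them.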


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: System with kernel 4.21 hang immediately after start (invalid opcode: 0000 [#1] SMP NOPTI)

2019-01-07 Thread Michel Dänzer
On 2019-01-06 11:26 p.m., Mikhail Gavrilov wrote:
> On Sat, 5 Jan 2019 at 01:13, Mikhail Gavrilov
>  wrote:
>>
>> On Fri, 4 Jan 2019 at 23:02, Michel Dänzer  wrote:
>>>
>>> On 2019-01-04 4:24 p.m., Alex Deucher wrote:
>>>> On Fri, Jan 4, 2019 at 9:52 AM Mikhail Gavrilov
>>>>  wrote:
>>>>>
>>>>> Hi folks, I found that kernel 4.21 is unusable with AMD hardware.
>>>>> I also bisected the kernel and found the offending commit.
>>>>> First bad commit: [674e78acae0dfb4beb56132e41cbae5b60f7d662]
>>>>> drm/amd/display: Add fast path for cursor plane updates
>>>>
>>>> IIRC, the issue only shows up with newer versions of the ddx.
>>>
>>> Wayland compositors seem to hit it as well, I think Mikhail is using
>>> GNOME on Wayland.
>>
>> Exactly!
> 
> Even after applying the patch
> https://cgit.freedesktop.org/~agd5f/linux/patch/?id=77acd1cd912987ffd62dad6a09275a1fb406f0c2
> the "invalid opcode: 0000 [#1] SMP NOPTI" BUG still occurs.

Is this reproducible with commit 674e78acae0d ("drm/amd/display: Add
fast path for cursor plane updates") + the patch above?

If yes, is it not reproducible with commit fc42d47ce011 ("drm/amdgpu:
Enable GPU recovery by default for CI")? (Please test long enough to be
at least 99% sure before saying it's not)


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: System with kernel 4.21 hang immediately after start (invalid opcode: 0000 [#1] SMP NOPTI)

2019-01-04 Thread Mikhail Gavrilov
On Fri, 4 Jan 2019 at 23:02, Michel Dänzer  wrote:
>
> On 2019-01-04 4:24 p.m., Alex Deucher wrote:
> > On Fri, Jan 4, 2019 at 9:52 AM Mikhail Gavrilov
> >  wrote:
> >>
> >> Hi folks, I found that kernel 4.21 is unusable with AMD hardware.
> >> I also bisected the kernel and found the offending commit.
> >> First bad commit: [674e78acae0dfb4beb56132e41cbae5b60f7d662]
> >> drm/amd/display: Add fast path for cursor plane updates
> >
> > IIRC, the issue only shows up with newer versions of the ddx.
>
> Wayland compositors seem to hit it as well, I think Mikhail is using
> GNOME on Wayland.

Exactly!

--
Best Regards,
Mike Gavrilov.


Re: System with kernel 4.21 hang immediately after start (invalid opcode: 0000 [#1] SMP NOPTI)

2019-01-04 Thread Michel Dänzer
On 2019-01-04 4:24 p.m., Alex Deucher wrote:
> On Fri, Jan 4, 2019 at 9:52 AM Mikhail Gavrilov
>  wrote:
>>
>> Hi folks, I found that kernel 4.21 is unusable with AMD hardware.
>> I also bisected the kernel and found the offending commit.
>> First bad commit: [674e78acae0dfb4beb56132e41cbae5b60f7d662]
>> drm/amd/display: Add fast path for cursor plane updates
> 
> IIRC, the issue only shows up with newer versions of the ddx.

Wayland compositors seem to hit it as well, I think Mikhail is using
GNOME on Wayland.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: System with kernel 4.21 hang immediately after start (invalid opcode: 0000 [#1] SMP NOPTI)

2019-01-04 Thread Alex Deucher
On Fri, Jan 4, 2019 at 9:52 AM Mikhail Gavrilov
 wrote:
>
> Hi folks, I found that kernel 4.21 is unusable with AMD hardware.
> I also bisected the kernel and found the offending commit.
> First bad commit: [674e78acae0dfb4beb56132e41cbae5b60f7d662]
> drm/amd/display: Add fast path for cursor plane updates

IIRC, the issue only shows up with newer versions of the ddx.  Should
be fixed with this patch:
https://cgit.freedesktop.org/drm/drm/commit/?id=77acd1cd912987ffd62dad6a09275a1fb406f0c2

Alex


System with kernel 4.21 hang immediately after start (invalid opcode: 0000 [#1] SMP NOPTI)

2019-01-04 Thread Mikhail Gavrilov
Hi folks, I found that kernel 4.21 is unusable with AMD hardware.
I also bisected the kernel and found the offending commit.
First bad commit: [674e78acae0dfb4beb56132e41cbae5b60f7d662]
drm/amd/display: Add fast path for cursor plane updates


[   41.841318] ------------[ cut here ]------------
[   41.841320] kernel BUG at lib/list_debug.c:53!
[   41.841324] invalid opcode: 0000 [#1] SMP NOPTI
[   41.841328] CPU: 12 PID: 2149 Comm: gnome-shell Tainted: G C 4.20.0-rc3-bisect-674e78acae0dfb4beb56132e41cbae5b60f7d662+ #14
[   41.841330] Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 1103 11/16/2018
[   41.841335] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x47
[   41.841337] Code: 1d 11 be e8 39 cf c8 ff 0f 0b 48 c7 c7 f0 1d 11
be e8 2b cf c8 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 b0 1d 11 be e8 17
cf c8 ff <0f> 0b 48 89 fe 48 c7 c7 78 1d 11 be e8 06 cf c8 ff 0f 0b 48
89 d1
[   41.841339] RSP: 0018:ad8a43b67aa8 EFLAGS: 00010246
[   41.841342] RAX: 0054 RBX: 8f992aea0850 RCX: 
[   41.841343] RDX:  RSI: 8f993eb168c8 RDI: 8f993eb168c8
[   41.841345] RBP: 8f992aea08f8 R08: 0005 R09: 0007
[   41.841346] R10:  R11: be97b18d R12: 8f992aea0800
[   41.841348] R13: 8f9927b0cfc0 R14: 8f992aea0850 R15: 
[   41.841350] FS:  7f2760663d00() GS:8f993eb0()
knlGS:
[   41.841351] CS:  0010 DS:  ES:  CR0: 80050033
[   41.841353] CR2: 7f26f45f7000 CR3: 0007daafa000 CR4: 003406e0
[   41.841354] Call Trace:
[   41.841363]  ttm_bo_del_from_lru+0x73/0xb0 [ttm]
[   41.841369]  ttm_bo_del_sub_from_lru+0x22/0x30 [ttm]
[   41.841444]  dm_plane_helper_prepare_fb+0x8e/0x2f0 [amdgpu]
[   41.841454]  drm_atomic_helper_prepare_planes+0x4f/0xd0 [drm_kms_helper]
[   41.841462]  drm_atomic_helper_commit+0x1c/0x110 [drm_kms_helper]
[   41.841470]  drm_atomic_helper_update_plane+0xf0/0x110 [drm_kms_helper]
[   41.841486]  drm_mode_cursor_universal+0x128/0x240 [drm]
[   41.841501]  drm_mode_cursor_common+0x190/0x200 [drm]
[   41.841516]  ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
[   41.841528]  drm_ioctl_kernel+0xa9/0xf0 [drm]
[   41.841542]  drm_ioctl+0x1f6/0x370 [drm]
[   41.841556]  ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
[   41.841560]  ? __handle_mm_fault+0xfc1/0x1590
[   41.841612]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[   41.841617]  do_vfs_ioctl+0xa4/0x630
[   41.841620]  ksys_ioctl+0x60/0x90
[   41.841623]  __x64_sys_ioctl+0x16/0x20
[   41.841626]  do_syscall_64+0x5b/0x160
[   41.841630]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   41.841633] RIP: 0033:0x7f27641bd2fb
[   41.841635] Code: 0f 1e fa 48 8b 05 8d 9b 0c 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d 9b 0c 00 f7 d8 64 89
01 48
[   41.841637] RSP: 002b:7ffc4eeac468 EFLAGS: 0246 ORIG_RAX:
0010
[   41.841639] RAX: ffda RBX: 55624f2e6520 RCX: 7f27641bd2fb
[   41.841640] RDX: 7ffc4eeac4a0 RSI: c02464bb RDI: 000a
[   41.841642] RBP: 7ffc4eeac4a0 R08: 0080 R09: 0006
[   41.841643] R10: 0001 R11: 0246 R12: c02464bb
[   41.841644] R13: 000a R14: 55624f2b65c0 R15: 000a
[   41.841646] Modules linked in: nls_utf8 isofs fuse rfcomm
xt_CHECKSUM ipt_MASQUERADE tun bridge stp llc devlink
nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat
nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat
nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink
ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc
xfs vfat fat libcrc32c arc4 edac_mce_amd r8822be(C) kvm_amd kvm
irqbypass mac80211 snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_codec_hdmi snd_usb_audio snd_hda_intel btusb btrtl
snd_hda_codec btbcm btintel snd_hda_core bluetooth snd_usbmidi_lib
cfg80211 snd_rawmidi snd_hwdep snd_seq crct10dif_pclmul snd_seq_device
snd_pcm crc32_pclmul ghash_clmulni_intel eeepc_wmi asus_wmi
sparse_keymap snd_timer joydev video snd wmi_bmof k10temp sp5100_tco
pcspkr i2c_piix4 ecdh_generic ccp soundcore
[   41.841677]  rfkill gpio_amdpt gpio_generic pcc_cpufreq
acpi_cpufreq binfmt_misc uas usb_storage amdgpu chash amd_iommu_v2
hid_logitech_hidpp gpu_sched ttm drm_kms_helper drm igb
hid_logitech_dj dca crc32c_intel nvme i2c_algo_bit nvme_core wmi
pinctrl_amd hid_sony ff_memless
[   41.841690] ---[ end trace 1a4efe8e8abb5ae5 ]---




$ git bisect log
git bisect start
# good: [8fe28cb58bcb235034b64cbbb7550a8a43fd88be] Linux 4.20
git bisect good 8fe28cb58bcb235034b64cbbb7550a8a43fd88be
# bad: [a5f2bd479f58f171a16a9a4f3b4e748ab3057c0f] Merge branch