I conducted a few more tests to make sure there is no hardware problem:
- I ran memtest86 all night long --> PASS
- I ran a filesystem check --> Fine
- I ran a read/write badblocks scan of my SSD --> No errors

After that I installed the nvidia 650 driver and rebooted. Xorg won't
start because the kernel crashes during boot.

After searching the log I found the following error:
BUG: kernel NULL pointer dereference, address: 0000000000000170

I think this may be related to:
https://forums.developer.nvidia.com/t/bug-report-kernel-null-pointer-dereference-on-nvidia-465-24-02-2-on-linux-5-11-15-when-starting-x11/175845
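
For anyone who wants to find the same entries in their own log, here is a
minimal sketch of the search in Python. The marker list and the function name
find_oops_lines are assumptions made for illustration, and the sample lines
are copied from the crash log in this report (unwrapped). It is not an
exhaustive oops detector.

```python
import re

# Strings that commonly open a kernel oops report. This list is an
# assumption for illustration and is not exhaustive.
OOPS_MARKERS = re.compile(
    r"BUG: kernel NULL pointer dereference|general protection fault|Oops: "
)


def find_oops_lines(lines):
    """Return (line_number, text) pairs for lines containing an oops marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if OOPS_MARKERS.search(line)
    ]


# Sample lines taken from the crash log in this report, joined back into
# single lines (the mail client wrapped them).
sample = [
    "Jun 23 09:20:37 workstation kernel: [   20.049300] BUG: kernel NULL "
    "pointer dereference, address: 0000000000000170",
    "Jun 23 09:20:37 workstation kernel: [   20.049304] PGD 0 P4D 0",
    "Jun 23 09:20:37 workstation kernel: [   20.152529] general protection "
    "fault, probably for non-canonical address 0xc483480f75000000: 0000 [#2] "
    "SMP NOPTI",
]

hits = find_oops_lines(sample)
for number, text in hits:
    print(number, text)
```

To run it against a real log, pass an open file instead of the sample list,
for example open("/var/log/kern.log") (that path is an assumption and differs
between systems).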

After the last line of the kernel log the system crashes.


------ Kernel crash log below -------
Jun 23 09:20:37 workstation kernel: [   20.049300] BUG: kernel NULL pointer 
dereference, address: 0000000000000170
Jun 23 09:20:37 workstation kernel: [   20.049303] #PF: supervisor read access 
in kernel mode
Jun 23 09:20:37 workstation kernel: [   20.049303] #PF: error_code(0x0000) - 
not-present page
Jun 23 09:20:37 workstation kernel: [   20.049304] PGD 0 P4D 0
Jun 23 09:20:37 workstation kernel: [   20.049307] Oops: 0000 [#1] SMP NOPTI
Jun 23 09:20:37 workstation kernel: [   20.049308] CPU: 2 PID: 2016 Comm: Xorg 
Tainted: P          IOE     5.10.36-051036-generic #202105111331
Jun 23 09:20:37 workstation kernel: [   20.049309] Hardware name: Dell Inc. 
Precision 5820 Tower X-Series/0X75JG, BIOS 2.2.0 04/14/2020
Jun 23 09:20:37 workstation kernel: [   20.049507] RIP: 
0010:_nv015534rm+0x1b6/0x330 [nvidia]
Jun 23 09:20:37 workstation kernel: [   20.049508] Code: 8b 87 68 05 00 00 ba 
01 00 00 00 be 02 00 00 00 e8 5f 9b 52 e4 41 83 c5 01 41 83 fd 1f 0f 84 0b 01 
00 00 48 8b 45 10 44 89 ee <48> 8b b8 70 01 00 00 48 8b 87 d8 04 00 00 e8 37 9b 
52 e4 89 c3 48
Jun 23 09:20:37 workstation kernel: [   20.049510] RSP: 0018:ffff9ea901927938 
EFLAGS: 00010293
Jun 23 09:20:37 workstation kernel: [   20.049511] RAX: 0000000000000000 RBX: 
0000000000001000 RCX: 0000000000000005
Jun 23 09:20:37 workstation kernel: [   20.049511] RDX: 0000000000000004 RSI: 
0000000000000005 RDI: 0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.049511] RBP: ffff915efc4b2dd0 R08: 
0000000000000001 R09: ffff915efc4b2cb8
Jun 23 09:20:37 workstation kernel: [   20.049512] R10: ffff915eaa2e8008 R11: 
0000000010100000 R12: 0000000000001200
Jun 23 09:20:37 workstation kernel: [   20.049512] R13: 0000000000000005 R14: 
ffff915e88614010 R15: 0000000000004000
Jun 23 09:20:37 workstation kernel: [   20.049513] FS:  00007ff4bb911ec0(0000) 
GS:ffff916ddf880000(0000) knlGS:0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.049514] CS:  0010 DS: 0000 ES: 0000 
CR0: 0000000080050033
Jun 23 09:20:37 workstation kernel: [   20.049514] CR2: 0000000000000170 CR3: 
0000000104b28003 CR4: 00000000003706e0
Jun 23 09:20:37 workstation kernel: [   20.049515] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.049515] DR3: 0000000000000000 DR6: 
00000000fffe0ff0 DR7: 0000000000000400
Jun 23 09:20:37 workstation kernel: [   20.049516] Call Trace:
Jun 23 09:20:37 workstation kernel: [   20.049695]  ? _nv015556rm+0x7fd/0x1020 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.049846]  ? _nv027155rm+0x22c/0x4f0 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.049992]  ? _nv017787rm+0x303/0x5e0 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.050146]  ? _nv017788rm+0x30/0xa0 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.050291]  ? _nv017789rm+0xe1/0x220 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.050458]  ? _nv022829rm+0xed/0x220 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.050546]  ? _nv023065rm+0x30/0x60 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.050649]  ? _nv000704rm+0x16da/0x22b0 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.050753]  ? rm_init_adapter+0xc5/0xe0 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.050825]  ? 
nv_open_device+0x122/0x8e0 [nvidia]
Jun 23 09:20:37 workstation kernel: [   20.050897]  ? nvidia_open+0x2ad/0x550 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.050971]  ? 
nvidia_frontend_open+0x58/0xa0 [nvidia]
Jun 23 09:20:37 workstation kernel: [   20.050973]  ? chrdev_open+0xf7/0x220
Jun 23 09:20:37 workstation kernel: [   20.050975]  ? cdev_device_add+0x90/0x90
Jun 23 09:20:37 workstation kernel: [   20.050976]  ? do_dentry_open+0x156/0x370
Jun 23 09:20:37 workstation kernel: [   20.050977]  ? vfs_open+0x2d/0x30
Jun 23 09:20:37 workstation kernel: [   20.050978]  ? do_open+0x1c3/0x340
Jun 23 09:20:37 workstation kernel: [   20.050979]  ? path_openat+0x10a/0x1d0
Jun 23 09:20:37 workstation kernel: [   20.050980]  ? do_filp_open+0x8c/0x130
Jun 23 09:20:37 workstation kernel: [   20.050981]  ? 
__check_object_size+0x1c/0x20
Jun 23 09:20:37 workstation kernel: [   20.050982]  ? do_sys_openat2+0x9b/0x150
Jun 23 09:20:37 workstation kernel: [   20.050983]  ? __x64_sys_openat+0x56/0x90
Jun 23 09:20:37 workstation kernel: [   20.050986]  ? do_syscall_64+0x38/0x90
Jun 23 09:20:37 workstation kernel: [   20.050988]  ? 
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 23 09:20:37 workstation kernel: [   20.050989] Modules linked in: 
binfmt_misc nls_iso8859_1 intel_rapl_msr intel_rapl_common isst_if_common 
nvidia_uvm(POE) nfit x86_pkg_temp_thermal intel_powerclamp 
snd_hda_codec_realtek nvidia_drm(POE) snd_hda_codec_generic coretemp 
ledtrig_audio nvidia_modeset(POE) snd_hda_codec_hdmi snd_hda_intel 
snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence 
uvcvideo snd_hda_codec kvm_intel videobuf2_vmalloc snd_hda_core 
videobuf2_memops soundwire_bus nvidia(POE) kvm snd_soc_core videobuf2_v4l2 
dell_smm_hwmon snd_usb_audio snd_compress videobuf2_common snd_usbmidi_lib 
ac97_bus snd_pcm_dmaengine snd_hwdep videodev rapl snd_seq_midi snd_pcm 
snd_seq_midi_event mc snd_rawmidi dell_wmi intel_cstate drm_kms_helper snd_seq 
ucsi_ccg cec snd_seq_device dell_smbios typec_ucsi input_leds rc_core snd_timer 
dcdbas typec intel_wmi_thunderbolt fb_sys_fops sparse_keymap syscopyarea video 
serio_raw dell_wmi_descriptor wmi_bmof sysfillrect snd mei_me sysimgblt ioatdma 
ip6t_REJECT
Jun 23 09:20:37 workstation kernel: [   20.051019]  efi_pstore soundcore mei 
dca nf_reject_ipv6 xt_hl ip6t_rt acpi_tad mac_hid ipt_REJECT nf_reject_ipv4 
nft_chain_nat xt_REDIRECT xt_MASQUERADE nf_nat nft_limit xt_limit xt_addrtype 
xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat 
nft_counter sch_fq_codel overlay iptable_filter ip6table_filter ip6_tables 
nf_tables br_netfilter libcrc32c bridge stp llc nfnetlink arp_tables msr 
parport_pc ppdev lp parport drm ip_tables x_tables autofs4 hid_generic usbhid 
hid uas usb_storage crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
aesni_intel crypto_simd cryptd nvme i2c_i801 glue_helper ahci nvme_core e1000e 
i2c_nvidia_gpu xhci_pci i2c_smbus libahci xhci_pci_renesas wmi
Jun 23 09:20:37 workstation kernel: [   20.051044] CR2: 0000000000000170
Jun 23 09:20:37 workstation kernel: [   20.051046] ---[ end trace 
b87a24c03d6bd018 ]---
Jun 23 09:20:37 workstation kernel: [   20.151598] RIP: 
0010:_nv015534rm+0x1b6/0x330 [nvidia]
Jun 23 09:20:37 workstation kernel: [   20.151600] Code: 8b 87 68 05 00 00 ba 
01 00 00 00 be 02 00 00 00 e8 5f 9b 52 e4 41 83 c5 01 41 83 fd 1f 0f 84 0b 01 
00 00 48 8b 45 10 44 89 ee <48> 8b b8 70 01 00 00 48 8b 87 d8 04 00 00 e8 37 9b 
52 e4 89 c3 48
Jun 23 09:20:37 workstation kernel: [   20.151600] RSP: 0018:ffff9ea901927938 
EFLAGS: 00010293
Jun 23 09:20:37 workstation kernel: [   20.151601] RAX: 0000000000000000 RBX: 
0000000000001000 RCX: 0000000000000005
Jun 23 09:20:37 workstation kernel: [   20.151602] RDX: 0000000000000004 RSI: 
0000000000000005 RDI: 0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.151602] RBP: ffff915efc4b2dd0 R08: 
0000000000000001 R09: ffff915efc4b2cb8
Jun 23 09:20:37 workstation kernel: [   20.151603] R10: ffff915eaa2e8008 R11: 
0000000010100000 R12: 0000000000001200
Jun 23 09:20:37 workstation kernel: [   20.151603] R13: 0000000000000005 R14: 
ffff915e88614010 R15: 0000000000004000
Jun 23 09:20:37 workstation kernel: [   20.151604] FS:  00007ff4bb911ec0(0000) 
GS:ffff916ddf880000(0000) knlGS:0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.151605] CS:  0010 DS: 0000 ES: 0000 
CR0: 0000000080050033
Jun 23 09:20:37 workstation kernel: [   20.151605] CR2: 0000000000000170 CR3: 
0000000104b28003 CR4: 00000000003706e0
Jun 23 09:20:37 workstation kernel: [   20.151606] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.151606] DR3: 0000000000000000 DR6: 
00000000fffe0ff0 DR7: 0000000000000400
Jun 23 09:20:37 workstation kernel: [   20.152529] general protection fault, 
probably for non-canonical address 0xc483480f75000000: 0000 [#2] SMP NOPTI
Jun 23 09:20:37 workstation kernel: [   20.152531] CPU: 14 PID: 2016 Comm: Xorg 
Tainted: P      D   IOE     5.10.36-051036-generic #202105111331
Jun 23 09:20:37 workstation kernel: [   20.152532] Hardware name: Dell Inc. 
Precision 5820 Tower X-Series/0X75JG, BIOS 2.2.0 04/14/2020
Jun 23 09:20:37 workstation kernel: [   20.152681] RIP: 
0010:_nv009368rm+0x3c/0x340 [nvidia]
Jun 23 09:20:37 workstation kernel: [   20.152682] Code: 07 0f 1f 44 00 00 31 
d2 48 8b 07 48 85 c0 75 1a e9 a1 02 00 00 66 0f 1f 84 00 00 00 00 00 48 8b 48 
10 48 85 c9 74 17 48 89 c8 <48> 39 30 77 ef 0f 83 29 02 00 00 48 8b 48 18 48 85 
c9 75 e9 48 89
Jun 23 09:20:37 workstation kernel: [   20.152683] RSP: 0018:ffff9ea901927d50 
EFLAGS: 00010086
Jun 23 09:20:37 workstation kernel: [   20.152684] RAX: c483480f75000000 RBX: 
ffff9ea901927d98 RCX: c483480f75000000
Jun 23 09:20:37 workstation kernel: [   20.152685] RDX: ffff9ea901927de8 RSI: 
00000000000007e0 RDI: ffffffffc49d4658
Jun 23 09:20:37 workstation kernel: [   20.152685] RBP: ffff915e86c2dff0 R08: 
0000000000000001 R09: 0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.152685] R10: 0000000000000001 R11: 
0000000000000200 R12: 0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.152686] R13: ffffffffc49d4e40 R14: 
ffff9ea901927e80 R15: ffffffffc49d1a80
Jun 23 09:20:37 workstation kernel: [   20.152687] FS:  0000000000000000(0000) 
GS:ffff916ddfb80000(0000) knlGS:0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.152687] CS:  0010 DS: 0000 ES: 0000 
CR0: 0000000080050033
Jun 23 09:20:37 workstation kernel: [   20.152688] CR2: 000055adefa32020 CR3: 
00000004d4610004 CR4: 00000000003706e0
Jun 23 09:20:37 workstation kernel: [   20.152688] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.152689] DR3: 0000000000000000 DR6: 
00000000fffe0ff0 DR7: 0000000000000400
Jun 23 09:20:37 workstation kernel: [   20.152689] Call Trace:
Jun 23 09:20:37 workstation kernel: [   20.152786]  ? _nv039616rm+0xdf/0x1e0 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.152915]  ? 
rm_cleanup_file_private+0x42/0x140 [nvidia]
Jun 23 09:20:37 workstation kernel: [   20.153005]  ? os_free_mem+0x22/0x30 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.153093]  ? nvidia_close+0x14c/0x310 
[nvidia]
Jun 23 09:20:37 workstation kernel: [   20.153182]  ? 
nvidia_frontend_close+0x2f/0x50 [nvidia]
Jun 23 09:20:37 workstation kernel: [   20.153185]  ? __fput+0xa9/0x260
Jun 23 09:20:37 workstation kernel: [   20.153186]  ? ____fput+0xe/0x10
Jun 23 09:20:37 workstation kernel: [   20.153188]  ? task_work_run+0x6d/0xa0
Jun 23 09:20:37 workstation kernel: [   20.153190]  ? do_exit+0x233/0x3e0
Jun 23 09:20:37 workstation kernel: [   20.153192]  ? 
rewind_stack_do_exit+0x17/0x20
Jun 23 09:20:37 workstation kernel: [   20.153193] Modules linked in: 
binfmt_misc nls_iso8859_1 intel_rapl_msr intel_rapl_common isst_if_common 
nvidia_uvm(POE) nfit x86_pkg_temp_thermal intel_powerclamp 
snd_hda_codec_realtek nvidia_drm(POE) snd_hda_codec_generic coretemp 
ledtrig_audio nvidia_modeset(POE) snd_hda_codec_hdmi snd_hda_intel 
snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence 
uvcvideo snd_hda_codec kvm_intel videobuf2_vmalloc snd_hda_core 
videobuf2_memops soundwire_bus nvidia(POE) kvm snd_soc_core videobuf2_v4l2 
dell_smm_hwmon snd_usb_audio snd_compress videobuf2_common snd_usbmidi_lib 
ac97_bus snd_pcm_dmaengine snd_hwdep videodev rapl snd_seq_midi snd_pcm 
snd_seq_midi_event mc snd_rawmidi dell_wmi intel_cstate drm_kms_helper snd_seq 
ucsi_ccg cec snd_seq_device dell_smbios typec_ucsi input_leds rc_core snd_timer 
dcdbas typec intel_wmi_thunderbolt fb_sys_fops sparse_keymap syscopyarea video 
serio_raw dell_wmi_descriptor wmi_bmof sysfillrect snd mei_me sysimgblt ioatdma 
ip6t_REJECT
Jun 23 09:20:37 workstation kernel: [   20.153226]  efi_pstore soundcore mei 
dca nf_reject_ipv6 xt_hl ip6t_rt acpi_tad mac_hid ipt_REJECT nf_reject_ipv4 
nft_chain_nat xt_REDIRECT xt_MASQUERADE nf_nat nft_limit xt_limit xt_addrtype 
xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat 
nft_counter sch_fq_codel overlay iptable_filter ip6table_filter ip6_tables 
nf_tables br_netfilter libcrc32c bridge stp llc nfnetlink arp_tables msr 
parport_pc ppdev lp parport drm ip_tables x_tables autofs4 hid_generic usbhid 
hid uas usb_storage crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
aesni_intel crypto_simd cryptd nvme i2c_i801 glue_helper ahci nvme_core e1000e 
i2c_nvidia_gpu xhci_pci i2c_smbus libahci xhci_pci_renesas wmi
Jun 23 09:20:37 workstation kernel: [   20.153255] ---[ end trace 
b87a24c03d6bd019 ]---
Jun 23 09:20:37 workstation kernel: [   20.162789] RIP: 
0010:_nv015534rm+0x1b6/0x330 [nvidia]
Jun 23 09:20:37 workstation kernel: [   20.162790] Code: 8b 87 68 05 00 00 ba 
01 00 00 00 be 02 00 00 00 e8 5f 9b 52 e4 41 83 c5 01 41 83 fd 1f 0f 84 0b 01 
00 00 48 8b 45 10 44 89 ee <48> 8b b8 70 01 00 00 48 8b 87 d8 04 00 00 e8 37 9b 
52 e4 89 c3 48
Jun 23 09:20:37 workstation kernel: [   20.162791] RSP: 0018:ffff9ea901927938 
EFLAGS: 00010293
Jun 23 09:20:37 workstation kernel: [   20.162791] RAX: 0000000000000000 RBX: 
0000000000001000 RCX: 0000000000000005
Jun 23 09:20:37 workstation kernel: [   20.162792] RDX: 0000000000000004 RSI: 
0000000000000005 RDI: 0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.162792] RBP: ffff915efc4b2dd0 R08: 
0000000000000001 R09: ffff915efc4b2cb8
Jun 23 09:20:37 workstation kernel: [   20.162793] R10: ffff915eaa2e8008 R11: 
0000000010100000 R12: 0000000000001200
Jun 23 09:20:37 workstation kernel: [   20.162793] R13: 0000000000000005 R14: 
ffff915e88614010 R15: 0000000000004000
Jun 23 09:20:37 workstation kernel: [   20.162794] FS:  0000000000000000(0000) 
GS:ffff916ddfb80000(0000) knlGS:0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.162794] CS:  0010 DS: 0000 ES: 0000 
CR0: 0000000080050033
Jun 23 09:20:37 workstation kernel: [   20.162795] CR2: 000055adefa32020 CR3: 
00000004d4610004 CR4: 00000000003706e0
Jun 23 09:20:37 workstation kernel: [   20.162795] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
Jun 23 09:20:37 workstation kernel: [   20.162795] DR3: 0000000000000000 DR6: 
00000000fffe0ff0 DR7: 0000000000000400
Jun 23 09:20:37 workstation kernel: [   20.162796] Fixing recursive fault but 
reboot is needed!
Jun 23 09:20:39 workstation kernel: [   21.742001] bpfilter: Loaded 
bpfilter_umh pid 2248
Jun 23 09:20:39 workstation kernel: [   21.836883] Initializing XFRM netlink 
socket
Jun 23 09:20:39 workstation kernel: [   22.113593] e1000e 0000:00:1f.6 eno1: 
NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Jun 23 09:20:39 workstation kernel: [   22.113657] IPv6: 
ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
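
A note on reading the first oops: the byte marked <48> in the "Code:" line is
where RIP pointed, and the bytes 48 8b b8 70 01 00 00 that start there appear
to decode as a 64-bit load from [rax+0x170]. With RAX = 0 in the register
dump, that would give exactly the reported address 0x170. The sketch below
checks this by hand in Python. It decodes only this one instruction form, and
the helper name decode_mov_load is made up for the example; it is not a
general disassembler.

```python
# Bytes of the faulting instruction, copied from the "Code:" line of the
# first oops, starting at the byte shown in <> (where RIP pointed).
insn = bytes.fromhex("488bb870010000")

# 64-bit register names indexed by their 3-bit encoding (no REX.R/REX.B).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code):
    """Decode REX.W + 8B /r with mod=10 (register base plus disp32).

    Handles only this single form; anything else raises ValueError.
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("only REX.W MOV r64, r/m64 is handled")
    mod = modrm >> 6
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only [reg + disp32] without a SIB byte is handled")
    disp = int.from_bytes(code[3:7], "little", signed=True)
    return REG64[reg], REG64[rm], disp


dest, base, disp = decode_mov_load(insn)

rax = 0x0  # RAX value from the register dump of the first oops
fault_address = rax + disp

print(f"mov {dest}, [{base}+{disp:#x}]")
print(f"computed fault address: {fault_address:#018x}")
```

The computed address can be compared with the CR2 value in the log. A small
offset from a zero base register usually means a field was read through a
NULL structure pointer, which fits the "NULL pointer dereference" message, but
it does not by itself say why the pointer was NULL.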

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1931239

Title:
  [Nvidia Quadro RTX 4000] System crash regularly

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1931239/+subscriptions
