[Ubuntu-x-swat] [Bug 2028199] Re: GPU disconnect, then minutes later Xorg display lockup
It sounds like you've hit the problem Nvidia mentions in their documentation: http://us.download.nvidia.com/XFree86/Linux-x86_64/535.54.03/README/egpu.html so I guess falling off the bus is just as bad as being unplugged. Can you provide logs mentioning falling off the bus? I only recall a different user mentioning it recently in bug 2023585. ** Package changed: xorg (Ubuntu) => nvidia-graphics-drivers-535 (Ubuntu) ** Changed in: nvidia-graphics-drivers-535 (Ubuntu) Status: New => Incomplete -- You received this bug notification because you are a member of Ubuntu-X, which is subscribed to xorg in Ubuntu. https://bugs.launchpad.net/bugs/2028199 Title: GPU disconnect, then minutes later Xorg display lockup To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-535/+bug/2028199/+subscriptions ___ Mailing list: https://launchpad.net/~ubuntu-x-swat Post to : ubuntu-x-swat@lists.launchpad.net Unsubscribe : https://launchpad.net/~ubuntu-x-swat More help : https://help.launchpad.net/ListHelp
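One way to collect the requested log lines is to filter the kernel log for the driver's message. The following is a minimal sketch, not a definitive recipe: it assumes the NVIDIA kernel module logs a line containing "fallen off the bus" (the exact wording may differ between driver versions), and the function name `find_bus_drops` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print kernel-log lines suggesting the GPU dropped off the bus.
# Reads log text on stdin, so it works with journalctl, dmesg, or a saved log file.
find_bus_drops() {
    # -i: case-insensitive match; -E: extended regular expression.
    # Both phrasings are assumptions about how the driver words the message.
    # "|| true" keeps the exit status 0 when nothing matches.
    grep -iE 'fallen off the bus|fell off the bus' || true
}

# Example usage (not run automatically):
#   journalctl -k -b | find_bus_drops       # current boot
#   journalctl -k -b -1 | find_bus_drops    # previous boot
```

Reading from stdin rather than calling `journalctl` inside the function means the same filter can be applied to a log file attached to the bug.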
[Ubuntu-x-swat] [Bug 2028193] Re: Machine doesn't resume after sleep
** Tags added: amdgpu resume suspend-resume ** Package changed: xorg (Ubuntu) => linux (Ubuntu) -- You received this bug notification because you are a member of Ubuntu-X, which is subscribed to xorg in Ubuntu. https://bugs.launchpad.net/bugs/2028193 Title: Machine doesn't resume after sleep To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2028193/+subscriptions
[Ubuntu-x-swat] [Bug 2028193] Re: Machine doesn't resume after sleep
** Package changed: ubuntu => xorg (Ubuntu) -- You received this bug notification because you are a member of Ubuntu-X, which is subscribed to xorg in Ubuntu. https://bugs.launchpad.net/bugs/2028193 Title: Machine doesn't resume after sleep To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/2028193/+subscriptions
[Ubuntu-x-swat] [Bug 2028199] [NEW] GPU disconnect, then minutes later Xorg display lockup
Public bug reported: It appears that occasionally my external GPU 'falls off' the bus. It's listed as not primary, so this should not be an issue. However, a few minutes (usually) after this occurs, the Xorg server or window manager completely locks up in strange ways. For example, the mouse pointer can move across the multiple monitors, but no clicks are registered. Similarly, no keypresses result in any change in the UI. However, applications happily run in the background: videoconference meetings using the audio mic and speakers continue to operate, and the machine stays online. This leads me to believe there is some problem with Xorg capturing the human interface devices, passing their input to the correct applications, and then updating the displays. I don't know how to trigger this behavior reliably, but I am getting closer to tracking down how it occurs. I hoped you might be able to shed some light on why this might be happening and how to resolve it. ProblemType: Bug DistroRelease: Ubuntu 23.04 Package: xorg 1:7.7+23ubuntu2 ProcVersionSignature: Ubuntu 6.2.0-25.25-generic 6.2.13 Uname: Linux 6.2.0-25-generic x86_64 NonfreeKernelModules: nvidia_modeset nvidia .proc.driver.nvidia.capabilities.gpu0: Error: path was not a regular file. .proc.driver.nvidia.capabilities.mig: Error: path was not a regular file. .proc.driver.nvidia.gpus..52.00.0: Error: path was not a regular file. 
.proc.driver.nvidia.registry: Binary: "" .proc.driver.nvidia.suspend: suspend hibernate resume .proc.driver.nvidia.suspend_depth: default modeset uvm .proc.driver.nvidia.version: NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.54.03 Tue Jun 6 22:20:39 UTC 2023 GCC version: gcc version 12.2.0 (Ubuntu 12.2.0-17ubuntu1) ApportVersion: 2.26.1-0ubuntu2 Architecture: amd64 CasperMD5CheckResult: pass CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins' CompositorRunning: None CurrentDesktop: ubuntu:GNOME Date: Wed Jul 19 14:33:01 2023 DistUpgraded: 2023-04-26 15:38:23,675 DEBUG Running PostInstallScript: '/usr/lib/ubuntu-advantage/upgrade_lts_contract.py' DistroCodename: lunar DistroVariant: ubuntu ExtraDebuggingInterest: Yes GraphicsCard: Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics] [8086:9a49] (rev 01) (prog-if 00 [VGA controller]) Subsystem: Lenovo TigerLake-LP GT2 [Iris Xe Graphics] [17aa:22d4] NVIDIA Corporation TU117 [GeForce GTX 1650] [10de:1f82] (rev a1) (prog-if 00 [VGA controller]) Subsystem: eVga.com. Corp. 
TU117 [GeForce GTX 1650] [3842:1257] InstallationDate: Installed on 2023-01-09 (191 days ago) InstallationMedia: Ubuntu 22.04.1 LTS "Jammy Jellyfish" - Release amd64 (20220809.1) MachineType: LENOVO 20XY0027US ProcEnviron: LANG=en_US.UTF-8 PATH=(custom, no user) SHELL=/bin/bash TERM=xterm-256color ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-25-generic root=UUID=1cae8af8-977f-4853-9106-9169f34c4bc2 ro quiet splash vt.handoff=7 SourcePackage: xorg UpgradeStatus: Upgraded to lunar on 2023-04-26 (84 days ago) dmi.bios.date: 06/12/2023 dmi.bios.release: 1.61 dmi.bios.vendor: LENOVO dmi.bios.version: N32ET85W (1.61 ) dmi.board.asset.tag: Not Available dmi.board.name: 20XY0027US dmi.board.vendor: LENOVO dmi.board.version: SDK0J40697 WIN dmi.chassis.asset.tag: No Asset Information dmi.chassis.type: 31 dmi.chassis.vendor: LENOVO dmi.chassis.version: None dmi.ec.firmware.release: 1.34 dmi.modalias: dmi:bvnLENOVO:bvrN32ET85W(1.61):bd06/12/2023:br1.61:efr1.34:svnLENOVO:pn20XY0027US:pvrThinkPadX1YogaGen6:rvnLENOVO:rn20XY0027US:rvrSDK0J40697WIN:cvnLENOVO:ct31:cvrNone:skuLENOVO_MT_20XY_BU_Think_FM_ThinkPadX1YogaGen6: dmi.product.family: ThinkPad X1 Yoga Gen 6 dmi.product.name: 20XY0027US dmi.product.sku: LENOVO_MT_20XY_BU_Think_FM_ThinkPad X1 Yoga Gen 6 dmi.product.version: ThinkPad X1 Yoga Gen 6 dmi.sys.vendor: LENOVO version.compiz: compiz N/A version.libdrm2: libdrm2 2.4.114-1 version.libgl1-mesa-dri: libgl1-mesa-dri 23.0.4-0ubuntu1~23.04.1 version.libgl1-mesa-glx: libgl1-mesa-glx 23.0.4-0ubuntu1~23.04.1 version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A version.xserver-xorg-core: xserver-xorg-core 2:21.1.7-1ubuntu3 version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-3 version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20210115-1 version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-2build1 ** Affects: xorg (Ubuntu) Importance: Undecided Status: 
New ** Tags: amd64 apport-bug lunar ubuntu -- You received this bug notification because you are a member of Ubuntu-X, which is subscribed to xorg in Ubuntu. https://bugs.launchpad.net/bugs/2028199 Title: GPU disconnect, then minutes later Xorg display lockup To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/2028199/+subscriptions
[Ubuntu-x-swat] [Bug 2028165] Re: nvidia-dkms-* FTBS with linux 6.5
** Tags added: patch -- You received this bug notification because you are a member of Ubuntu-X, which is subscribed to nvidia-graphics-drivers-390 in Ubuntu. https://bugs.launchpad.net/bugs/2028165 Title: nvidia-dkms-* FTBS with linux 6.5 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-390/+bug/2028165/+subscriptions
[Ubuntu-x-swat] [Bug 2028165] Re: nvidia-dkms-* FTBS with linux 6.5
** Patch added: "nvidia-graphics-drivers-390_390.157-0ubuntu8.debdiff" https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-390/+bug/2028165/+attachment/5687148/+files/nvidia-graphics-drivers-390_390.157-0ubuntu8.debdiff -- You received this bug notification because you are a member of Ubuntu-X, which is subscribed to nvidia-graphics-drivers-390 in Ubuntu. https://bugs.launchpad.net/bugs/2028165 Title: nvidia-dkms-* FTBS with linux 6.5 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-390/+bug/2028165/+subscriptions
[Ubuntu-x-swat] [Bug 2028165] [NEW] nvidia-dkms-* FTBS with linux 6.5
Public bug reported:

[Impact]

...
In file included from /var/lib/dkms/nvidia/390.157/build/common/inc/nv-linux.h:21,
                 from /var/lib/dkms/nvidia/390.157/build/nvidia/nv-instance.c:13:
/var/lib/dkms/nvidia/390.157/build/common/inc/nv-mm.h: In function ‘NV_GET_USER_PAGES_REMOTE’:
/var/lib/dkms/nvidia/390.157/build/common/inc/nv-mm.h:164:45: error: passing argument 1 of ‘get_user_pages_remote’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  164 |return get_user_pages_remote(tsk, mm, start, nr_pages, flags,
      | ^~~
      | |
      | struct task_struct *
...

[Fix]

Apply the attached fix.

[How to test]

Install (and build) the patched package.

[Regression potential]

The fix is composed of two patches:

1) The first patch simply garbage-collects a reference to a function that was never used but whose API changed in Linux 6.5, so it is a trivial change.

2) The second patch reimplements part of the VMA scanning that was removed from __get_user_pages_locked() in upstream commit b2cac248191b7466c5819e0da617b0705a26e197 "mm/gup: remove vmas array from internal GUP functions". This is where any regression is most likely to be found.

** Affects: nvidia-graphics-drivers-390 (Ubuntu)
   Importance: Undecided
   Status: New

** Affects: nvidia-graphics-drivers-390 (Ubuntu Mantic)
   Importance: Undecided
   Status: New
[Ubuntu-x-swat] [Bug 2016459] Re: NVRM: RmInitAdapter failed! , failed to copy vbios to system memory
Hi, some progress.

Running:

# uname -a
Linux T430-i7 6.2.0-24-generic #24-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 16 12:03:50 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

to investigate why Xorg stopped working (hanging screen/keyboard). In short, it is the same libnvidia-tls issue and workaround as posted before.

Solution:

```
/etc/ld.so.conf.d# cat nvidia.conf
# 2023/05/01 ArjanF https://bbs.archlinux.org/viewtopic.php?id=283327=2
/usr/lib/x86_64-linux-gnu/tls/
```

Then run ldconfig and reboot into 6.2.0-24 (as 6.2.0-25 has the vbios copy error issue). Xorg then starts with nvidia support.

Some details on libnvidia-tls on my system:

```
$ file $(find /usr/lib/x86_64-linux-gnu/ -name "libnvidia-tls*")
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.390.157: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, for GNU/Linux 2.2.5, stripped
/usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.390.157: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, for GNU/Linux 2.3.99, stripped
$ apt-file search libnvidia-tls.so.390.157
libnvidia-gl-390: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.390.157
libnvidia-gl-390: /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.390.157
```

With Xorg running, I started testing the external multi-monitor setup. It didn't see the external monitors straight away, so I undocked and re-docked to trigger display detection. That worked, but it also triggered a kernel crash and a frozen screen/keyboard.
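To check whether the tls/ copy of libnvidia-tls is in the dynamic linker cache after running ldconfig, the output of `ldconfig -p` can be filtered. This is a minimal sketch under stated assumptions: the function name `resolved_tls_paths` is hypothetical, and it assumes the usual `ldconfig -p` line format of `name (flags) => /path`.

```shell
#!/bin/sh
# Hypothetical helper: list the paths the linker cache maps libnvidia-tls to.
# Expects `ldconfig -p` output on stdin, one entry per line in the form
#   libname (flags) => /path/to/lib
resolved_tls_paths() {
    # Select libnvidia-tls entries and print only the path after "=> ".
    sed -n '/libnvidia-tls\.so/ s/.*=> //p'
}

# Example usage (not run automatically):
#   ldconfig -p | resolved_tls_paths
```

If a path under /usr/lib/x86_64-linux-gnu/tls/ appears in the output, the directory added in nvidia.conf has been picked up by the cache.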
Captured crash info:

```
[40781.413803] general protection fault, probably for non-canonical address 0x93f96c1db8514a60: [#1] PREEMPT SMP NOPTI
[40781.413811] CPU: 6 PID: 1554 Comm: nvidia-modeset Tainted: P O 6.2.0-24-generic #24-Ubuntu
[40781.413814] Hardware name: LENOVO 2349G7G/2349G7G, BIOS G1ETC2WW (2.82 ) 08/07/2019
[40781.413815] RIP: 0010:_raw_spin_lock+0x13/0x60
[40781.413821] Code: 31 db c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 65 ff 05 2c 9e 35 52 31 c0 ba 01 00 00 00 0f b1 17 75 1b 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9
[40781.413823] RSP: 0018:b766c0aabe78 EFLAGS: 00010246
[40781.413825] RAX: RBX: 9ce4d9931b80 RCX:
[40781.413827] RDX: 0001 RSI: RDI: 93f96c1db8514a60
[40781.413828] RBP: b766c0aabeb0 R08: R09:
[40781.413829] R10: R11: R12: 93f96c1db8514a38
[40781.413831] R13: c26009f8 R14: 9ce4e6441940 R15: 93f96c1db8514a38
[40781.413832] FS: () GS:9ce7ee38() knlGS:
[40781.413834] CS: 0010 DS: ES: CR0: 80050033
[40781.413835] CR2: 7fe70927aff8 CR3: 000389410001 CR4: 001706e0
[40781.413837] Call Trace:
[40781.413839]
[40781.413842] ? nv_drm_gem_prime_fence_event+0x29/0x110 [nvidia_drm]
[40781.413852] nvkms_kthread_q_callback+0x7d/0xe0 [nvidia_modeset]
[40781.413873] _main_loop+0x7f/0x140 [nvidia]
[40781.414172] ? __pfx__main_loop+0x10/0x10 [nvidia]
[40781.414580] kthread+0xe6/0x110
[40781.414589] ? __pfx_kthread+0x10/0x10
[40781.414595] ret_from_fork+0x29/0x50
[40781.414602]
[40781.414604] Modules linked in: snd_seq_dummy snd_hrtimer nf_conntrack_netlink xfrm_user xfrm_algo xt_CHECKSUM ccm algif_aead des_generic libdes md4 wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nft_masq rfcomm cmac algif_hash algif_skcipher af_alg overlay bnep lz4 lz4_compress zram ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_multiport nft_limit xt_limit xt_addrtype xt_tcpudp nft_chain_nat xt_MASQUERADE nf_nat xt_comment xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat binfmt_misc nf_tables nfnetlink nls_iso8859_1 nvidia_uvm(PO) intel_rapl_msr snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi intel_rapl_common uvcvideo x86_pkg_temp_thermal btusb videobuf2_vmalloc snd_hda_intel videobuf2_memops btrtl videobuf2_v4l2 snd_intel_dspcfg btbcm intel_powerclamp videodev snd_intel_sdw_acpi snd_hda_codec btintel
[40781.414704] kvm_intel videobuf2_common btmtk snd_hda_core mc snd_hwdep bluetooth kvm nvidia_drm(PO) snd_pcm ecdh_generic ecc thinkpad_acpi irqbypass nvram nvidia_modeset(PO) iwlmvm snd_seq_midi snd_seq_midi_event rapl nvidia(PO) snd_rawmidi mac80211 libarc4 intel_cstate snd_seq iwlwifi snd_seq_device snd_timer ipmi_devintf cfg80211 ipmi_msghandler snd think_lmi ledtrig_audio platform_profile soundcore firmware_attributes_class wmi_bmof at24 joydev input_leds mac_hid serio_raw iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables pkcs8_key_parser cuse coretemp msr parport_pc ppdev lp parport bfq efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx