[Ubuntu-x-swat] [Bug 2028199] Re: GPU disconnect, then minutes later Xorg display lockup

2023-07-19 Thread Daniel van Vugt
It sounds like you've hit the problem Nvidia mentions in their
documentation:

http://us.download.nvidia.com/XFree86/Linux-x86_64/535.54.03/README/egpu.html

so I guess falling off the bus is just as bad as being unplugged.

Can you provide logs mentioning falling off the bus? I only recall a
different user mentioning it recently in bug 2023585.
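
One way to collect such logs is sketched below. It assumes the NVIDIA kernel module reports the event with the phrase "fallen off the bus" or an "NVRM: Xid" line; the exact wording may differ between driver versions. The helper names `find_gpu_bus_events` and `read_kernel_log` are made up for this illustration.

```python
import re
import subprocess

# Phrases assumed to mark a GPU bus-loss report in the kernel log.
# This is a starting point, not an exhaustive list.
PATTERNS = (
    re.compile(r"fallen off the bus", re.IGNORECASE),
    re.compile(r"NVRM: Xid", re.IGNORECASE),
)


def find_gpu_bus_events(log_text):
    """Return the log lines that match any of the assumed patterns."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.search(line) for p in PATTERNS)
    ]


def read_kernel_log(boot="0"):
    """Return kernel messages for one boot via journalctl.

    boot="0" is the current boot, boot="-1" the previous one.
    """
    result = subprocess.run(
        ["journalctl", "-k", "-b", boot, "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `find_gpu_bus_events(read_kernel_log("-1"))` after a lockup and reboot should list candidate lines from the previous boot, which can then be attached to the bug.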

** Package changed: xorg (Ubuntu) => nvidia-graphics-drivers-535 (Ubuntu)

** Changed in: nvidia-graphics-drivers-535 (Ubuntu)
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu-X,
which is subscribed to xorg in Ubuntu.
https://bugs.launchpad.net/bugs/2028199

Title:
  GPU disconnect, then minutes later Xorg display lockup

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-535/+bug/2028199/+subscriptions


___
Mailing list: https://launchpad.net/~ubuntu-x-swat
Post to : ubuntu-x-swat@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-x-swat
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-x-swat] [Bug 2028193] Re: Machine doesn't resume after sleep

2023-07-19 Thread Daniel van Vugt
** Tags added: amdgpu resume suspend-resume

** Package changed: xorg (Ubuntu) => linux (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu-X,
which is subscribed to xorg in Ubuntu.
https://bugs.launchpad.net/bugs/2028193

Title:
  Machine doesn't resume after sleep

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2028193/+subscriptions




[Ubuntu-x-swat] [Bug 2028193] Re: Machine doesn't resume after sleep

2023-07-19 Thread Ubuntu Foundations Team Bug Bot
** Package changed: ubuntu => xorg (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu-X,
which is subscribed to xorg in Ubuntu.
https://bugs.launchpad.net/bugs/2028193

Title:
  Machine doesn't resume after sleep

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/2028193/+subscriptions




[Ubuntu-x-swat] [Bug 2028199] [NEW] GPU disconnect, then minutes later Xorg display lockup

2023-07-19 Thread beadon
Public bug reported:

It appears that my external GPU occasionally 'falls off' the bus. It is
listed as not primary, so this should not be an issue.

However, a few minutes (usually) after this occurs, the Xorg server or
window manager completely locks up in strange ways. For example, the
mouse pointer can move across the multiple monitors, but no clicks are
registered. Similarly, keyboard keypresses produce no change in the UI.

However, applications happily run in the background: videoconference
meetings using the audio mic and speakers continue to operate, and the
machine stays online. This leads me to believe there is some problem
with Xorg capturing the human interface devices, passing their input to
the correct applications, and then updating the displays.

I don't know how to trigger this behavior reliably, but I am getting
closer to tracking down how it occurs.  I hoped you might be able to
shed some light as to why this might be happening and how to resolve it.

ProblemType: Bug
DistroRelease: Ubuntu 23.04
Package: xorg 1:7.7+23ubuntu2
ProcVersionSignature: Ubuntu 6.2.0-25.25-generic 6.2.13
Uname: Linux 6.2.0-25-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
.proc.driver.nvidia.capabilities.gpu0: Error: path was not a regular file.
.proc.driver.nvidia.capabilities.mig: Error: path was not a regular file.
.proc.driver.nvidia.gpus..52.00.0: Error: path was not a regular file.
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.suspend: suspend hibernate resume
.proc.driver.nvidia.suspend_depth: default modeset uvm
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.54.03  Tue Jun  6 22:20:39 UTC 2023
 GCC version:  gcc version 12.2.0 (Ubuntu 12.2.0-17ubuntu1)
ApportVersion: 2.26.1-0ubuntu2
Architecture: amd64
CasperMD5CheckResult: pass
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Wed Jul 19 14:33:01 2023
DistUpgraded: 2023-04-26 15:38:23,675 DEBUG Running PostInstallScript: '/usr/lib/ubuntu-advantage/upgrade_lts_contract.py'
DistroCodename: lunar
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes
GraphicsCard:
 Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics] [8086:9a49] (rev 01) (prog-if 00 [VGA controller])
   Subsystem: Lenovo TigerLake-LP GT2 [Iris Xe Graphics] [17aa:22d4]
 NVIDIA Corporation TU117 [GeForce GTX 1650] [10de:1f82] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: eVga.com. Corp. TU117 [GeForce GTX 1650] [3842:1257]
InstallationDate: Installed on 2023-01-09 (191 days ago)
InstallationMedia: Ubuntu 22.04.1 LTS "Jammy Jellyfish" - Release amd64 (20220809.1)
MachineType: LENOVO 20XY0027US
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-25-generic root=UUID=1cae8af8-977f-4853-9106-9169f34c4bc2 ro quiet splash vt.handoff=7
SourcePackage: xorg
UpgradeStatus: Upgraded to lunar on 2023-04-26 (84 days ago)
dmi.bios.date: 06/12/2023
dmi.bios.release: 1.61
dmi.bios.vendor: LENOVO
dmi.bios.version: N32ET85W (1.61 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20XY0027US
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 31
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.34
dmi.modalias: dmi:bvnLENOVO:bvrN32ET85W(1.61):bd06/12/2023:br1.61:efr1.34:svnLENOVO:pn20XY0027US:pvrThinkPadX1YogaGen6:rvnLENOVO:rn20XY0027US:rvrSDK0J40697WIN:cvnLENOVO:ct31:cvrNone:skuLENOVO_MT_20XY_BU_Think_FM_ThinkPadX1YogaGen6:
dmi.product.family: ThinkPad X1 Yoga Gen 6
dmi.product.name: 20XY0027US
dmi.product.sku: LENOVO_MT_20XY_BU_Think_FM_ThinkPad X1 Yoga Gen 6
dmi.product.version: ThinkPad X1 Yoga Gen 6
dmi.sys.vendor: LENOVO
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.114-1
version.libgl1-mesa-dri: libgl1-mesa-dri 23.0.4-0ubuntu1~23.04.1
version.libgl1-mesa-glx: libgl1-mesa-glx 23.0.4-0ubuntu1~23.04.1
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:21.1.7-1ubuntu3
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-3
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20210115-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-2build1

** Affects: xorg (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: amd64 apport-bug lunar ubuntu

-- 
You received this bug notification because you are a member of Ubuntu-X,
which is subscribed to xorg in Ubuntu.
https://bugs.launchpad.net/bugs/2028199

Title:
  GPU disconnect, then minutes later Xorg display lockup

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/2028199/+subscriptions



[Ubuntu-x-swat] [Bug 2028165] Re: nvidia-dkms-* FTBS with linux 6.5

2023-07-19 Thread Ubuntu Foundations Team Bug Bot
** Tags added: patch

-- 
You received this bug notification because you are a member of Ubuntu-X,
which is subscribed to nvidia-graphics-drivers-390 in Ubuntu.
https://bugs.launchpad.net/bugs/2028165

Title:
  nvidia-dkms-* FTBS with linux 6.5

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-390/+bug/2028165/+subscriptions




[Ubuntu-x-swat] [Bug 2028165] Re: nvidia-dkms-* FTBS with linux 6.5

2023-07-19 Thread Paolo Pisati
** Patch added: "nvidia-graphics-drivers-390_390.157-0ubuntu8.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-390/+bug/2028165/+attachment/5687148/+files/nvidia-graphics-drivers-390_390.157-0ubuntu8.debdiff

-- 
You received this bug notification because you are a member of Ubuntu-X,
which is subscribed to nvidia-graphics-drivers-390 in Ubuntu.
https://bugs.launchpad.net/bugs/2028165

Title:
  nvidia-dkms-* FTBS with linux 6.5

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-390/+bug/2028165/+subscriptions




[Ubuntu-x-swat] [Bug 2028165] [NEW] nvidia-dkms-* FTBS with linux 6.5

2023-07-19 Thread Paolo Pisati
Public bug reported:

[Impact]

...
In file included from /var/lib/dkms/nvidia/390.157/build/common/inc/nv-linux.h:21,
                 from /var/lib/dkms/nvidia/390.157/build/nvidia/nv-instance.c:13:
/var/lib/dkms/nvidia/390.157/build/common/inc/nv-mm.h: In function ‘NV_GET_USER_PAGES_REMOTE’:
/var/lib/dkms/nvidia/390.157/build/common/inc/nv-mm.h:164:45: error: passing argument 1 of ‘get_user_pages_remote’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  164 |     return get_user_pages_remote(tsk, mm, start, nr_pages, flags,
      |                                  ^~~
      |                                  |
      |                                  struct task_struct *
...
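
When triaging a DKMS build log like the excerpt above, it can help to pull out just the compiler diagnostics. The following is a minimal sketch, assuming GCC's usual `path:line:col: kind: message` layout and unwrapped log lines (as in the build's make.log, not as re-wrapped by email). The name `parse_diagnostics` is hypothetical.

```python
import re

# Matches GCC-style diagnostics such as
#   /path/to/file.h:164:45: error: some message
DIAGNOSTIC = re.compile(
    r"^(?P<path>[^:\s][^:]*):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<kind>(?:fatal )?error|warning): (?P<msg>.*)$"
)


def parse_diagnostics(log_text):
    """Return one dict per error or warning line found in the log text."""
    found = []
    for line in log_text.splitlines():
        match = DIAGNOSTIC.match(line)
        if match:
            found.append(
                {
                    "path": match["path"],
                    "line": int(match["line"]),
                    "col": int(match["col"]),
                    "kind": match["kind"],
                    "msg": match["msg"],
                }
            )
    return found
```

Context lines such as "In file included from" and the source excerpt with its caret markers are skipped, so only the failing location and message remain.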


[Fix]

Apply the attached fix.

[How to test]

Install (and build) the patched package.

[Regression potential]

The fix is composed of two patches:

1) The first patch simply garbage-collects a reference to a function that
was never used but whose API changed in Linux 6.5, so it's a trivial
change.

2) The second patch reimplements part of the vma scanning that was
removed from __get_user_pages_locked() in upstream commit
b2cac248191b7466c5819e0da617b0705a26e197 "mm/gup: removed vmas
array from internal GUP functions". This is where any regression is
most likely to be found.

** Affects: nvidia-graphics-drivers-390 (Ubuntu)
 Importance: Undecided
 Status: New

** Affects: nvidia-graphics-drivers-390 (Ubuntu Mantic)
 Importance: Undecided
 Status: New

** Description changed:

  [Impact]
+ 
+ In file included from 
/var/lib/dkms/nvidia/390.157/build/common/inc/nv-linux.h:21,
+  from 
/var/lib/dkms/nvidia/390.157/build/nvidia/nv-instance.c:13:
+ /var/lib/dkms/nvidia/390.157/build/common/inc/nv-mm.h: In function 
‘NV_GET_USER_PAGES_REMOTE’:
+ /var/lib/dkms/nvidia/390.157/build/common/inc/nv-mm.h:164:45: error: passing 
argument 1 of ‘get_user_pages_remote’ from incompatible pointer type 
[-Werror=incompatible-pointer-types]
+   164 |return get_user_pages_remote(tsk, mm, start, nr_pages, 
flags,
+   | ^~~
+   | |
+   | struct task_struct *
+ 
  
  [Fix]
  
+ Apply the attached fix.
+ 
  [How to test]
  
+ Install (and build) the patched packet.
+ 
  [Regression potential]
+ 
+ The fix is composed of two patches:

** Description changed:

  [Impact]
  
  In file included from 
/var/lib/dkms/nvidia/390.157/build/common/inc/nv-linux.h:21,
-  from 
/var/lib/dkms/nvidia/390.157/build/nvidia/nv-instance.c:13:
+  from 
/var/lib/dkms/nvidia/390.157/build/nvidia/nv-instance.c:13:
  /var/lib/dkms/nvidia/390.157/build/common/inc/nv-mm.h: In function 
‘NV_GET_USER_PAGES_REMOTE’:
  /var/lib/dkms/nvidia/390.157/build/common/inc/nv-mm.h:164:45: error: passing 
argument 1 of ‘get_user_pages_remote’ from incompatible pointer type 
[-Werror=incompatible-pointer-types]
-   164 |return get_user_pages_remote(tsk, mm, start, nr_pages, 
flags,
-   | ^~~
-   | |
-   | struct task_struct *
- 
+   164 |return get_user_pages_remote(tsk, mm, start, nr_pages, 
flags,
+   | ^~~
+   | |
+   | struct task_struct *
  
  [Fix]
  
  Apply the attached fix.
  
  [How to test]
  
  Install (and build) the patched packet.
  
  [Regression potential]
  
  The fix is composed of two patches:
+ 
+ 1) the first patch simply garbage collect a reference to a function that
+ was never used but that had the API changed in Linux 6.5 - so, it's a
+ trivial change.
+ 
+ 2) the second patch actually reimplement part of the vma scanning that was 
removed in __get_user_pages_locked() in upstream commit 
b2cac248191b7466c5819e0da617b0705a26e197 "mm/gup: removed vmas
+ array from internal GUP functions" - here is where most likely any regression 
could be found.

** Also affects: nvidia-graphics-drivers-390 (Ubuntu Mantic)
   Importance: Undecided
   Status: New

** Description changed:

  [Impact]
  
+ ...
  In file included from 
/var/lib/dkms/nvidia/390.157/build/common/inc/nv-linux.h:21,
   from 
/var/lib/dkms/nvidia/390.157/build/nvidia/nv-instance.c:13:
  /var/lib/dkms/nvidia/390.157/build/common/inc/nv-mm.h: In function 
‘NV_GET_USER_PAGES_REMOTE’:
  /var/lib/dkms/nvidia/390.157/build/common/inc/nv-mm.h:164:45: error: passing 
argument 1 of ‘get_user_pages_remote’ from incompatible pointer type 
[-Werror=incompatible-pointer-types]
    164 |return get_user_pages_remote(tsk, mm, start, nr_pages, 
flags,
    |  

[Ubuntu-x-swat] [Bug 2016459] Re: NVRM: RmInitAdapter failed! , failed to copy vbios to system memory

2023-07-19 Thread Arjan
Hi, some progress.

I'm running:
# uname -a
Linux T430-i7 6.2.0-24-generic #24-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 16 12:03:50 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

to investigate why Xorg stopped working (hanging screen/keyboard).
In short, it is the same libnvidia-tls issue and workaround as posted before.

solution:
```
/etc/ld.so.conf.d# cat nvidia.conf
# 2023/05/01 ArjanF https://bbs.archlinux.org/viewtopic.php?id=283327=2
/usr/lib/x86_64-linux-gnu/tls/
```
After that, run ldconfig and reboot into 6.2.0-24 (6.2.0-25 has the vbios
copy error issue).

Then Xorg starts with nvidia support.
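
To double-check that the workaround file really lists the tls directory, something like the sketch below could be used. It assumes the usual ld.so.conf layout: one directory per line, `#` starting a comment, and `include` lines pulling in other files. The function names are made up for illustration, and the sketch does not follow `include` directives.

```python
def library_dirs(conf_text):
    """Return the directories listed in ld.so.conf-style text."""
    dirs = []
    for raw in conf_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Skip include directives; this sketch does not follow them.
        if line.split()[0] == "include":
            continue
        dirs.append(line)
    return dirs


def is_listed(conf_text, directory):
    """Check whether a directory appears, ignoring a trailing slash."""
    wanted = directory.rstrip("/")
    return any(d.rstrip("/") == wanted for d in library_dirs(conf_text))
```

Reading /etc/ld.so.conf.d/nvidia.conf and passing its text to `is_listed` with /usr/lib/x86_64-linux-gnu/tls would confirm the entry is present. It does not confirm that ldconfig has been rerun since the file was written.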

Some details on libnvidia-tls on my system:
```
$ file $(find /usr/lib/x86_64-linux-gnu/  -name "libnvidia-tls*")
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.390.157: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, for GNU/Linux 2.2.5, stripped
/usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.390.157: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, for GNU/Linux 2.3.99, stripped
$ apt-file search libnvidia-tls.so.390.157
libnvidia-gl-390: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.390.157
libnvidia-gl-390: /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.390.157
```

Once it had started, I tested the external multi-monitor setup. It didn't
see the external monitors straight away, so I undocked and re-docked to
trigger display detection. That worked, but it also triggered a kernel
crash and a frozen screen/keyboard.

captured crash info:
```
[40781.413803] general protection fault, probably for non-canonical address 0x93f96c1db8514a60:  [#1] PREEMPT SMP NOPTI
[40781.413811] CPU: 6 PID: 1554 Comm: nvidia-modeset Tainted: P   O   6.2.0-24-generic #24-Ubuntu
[40781.413814] Hardware name: LENOVO 2349G7G/2349G7G, BIOS G1ETC2WW (2.82 ) 08/07/2019
[40781.413815] RIP: 0010:_raw_spin_lock+0x13/0x60
[40781.413821] Code: 31 db c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 65 ff 05 2c 9e 35 52 31 c0 ba 01 00 00 00  0f b1 17 75 1b 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9
[40781.413823] RSP: 0018:b766c0aabe78 EFLAGS: 00010246
[40781.413825] RAX:  RBX: 9ce4d9931b80 RCX: 
[40781.413827] RDX: 0001 RSI:  RDI: 93f96c1db8514a60
[40781.413828] RBP: b766c0aabeb0 R08:  R09: 
[40781.413829] R10:  R11:  R12: 93f96c1db8514a38
[40781.413831] R13: c26009f8 R14: 9ce4e6441940 R15: 93f96c1db8514a38
[40781.413832] FS:  () GS:9ce7ee38() knlGS:
[40781.413834] CS:  0010 DS:  ES:  CR0: 80050033
[40781.413835] CR2: 7fe70927aff8 CR3: 000389410001 CR4: 001706e0
[40781.413837] Call Trace:
[40781.413839]  
[40781.413842]  ? nv_drm_gem_prime_fence_event+0x29/0x110 [nvidia_drm]
[40781.413852]  nvkms_kthread_q_callback+0x7d/0xe0 [nvidia_modeset]
[40781.413873]  _main_loop+0x7f/0x140 [nvidia]
[40781.414172]  ? __pfx__main_loop+0x10/0x10 [nvidia]
[40781.414580]  kthread+0xe6/0x110
[40781.414589]  ? __pfx_kthread+0x10/0x10
[40781.414595]  ret_from_fork+0x29/0x50
[40781.414602]  
[40781.414604] Modules linked in: snd_seq_dummy snd_hrtimer 
nf_conntrack_netlink xfrm_user xfrm_algo xt_CHECKSUM ccm algif_aead des_generic 
libdes md4 wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 
poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel 
nft_masq rfcomm cmac algif_hash algif_skcipher af_alg overlay bnep lz4 
lz4_compress zram ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT 
nf_reject_ipv4 xt_LOG nf_log_syslog xt_multiport nft_limit xt_limit xt_addrtype 
xt_tcpudp nft_chain_nat xt_MASQUERADE nf_nat xt_comment xt_conntrack 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat binfmt_misc nf_tables 
nfnetlink nls_iso8859_1 nvidia_uvm(PO) intel_rapl_msr snd_ctl_led 
snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi 
intel_rapl_common uvcvideo x86_pkg_temp_thermal btusb videobuf2_vmalloc 
snd_hda_intel videobuf2_memops btrtl videobuf2_v4l2 snd_intel_dspcfg btbcm 
intel_powerclamp videodev snd_intel_sdw_acpi snd_hda_codec btintel
[40781.414704]  kvm_intel videobuf2_common btmtk snd_hda_core mc snd_hwdep 
bluetooth kvm nvidia_drm(PO) snd_pcm ecdh_generic ecc thinkpad_acpi irqbypass 
nvram nvidia_modeset(PO) iwlmvm snd_seq_midi snd_seq_midi_event rapl nvidia(PO) 
snd_rawmidi mac80211 libarc4 intel_cstate snd_seq iwlwifi snd_seq_device 
snd_timer ipmi_devintf cfg80211 ipmi_msghandler snd think_lmi ledtrig_audio 
platform_profile soundcore firmware_attributes_class wmi_bmof at24 joydev 
input_leds mac_hid serio_raw iptable_filter ip6table_filter ip6_tables 
br_netfilter bridge stp llc arp_tables pkcs8_key_parser cuse coretemp msr 
parport_pc ppdev lp parport bfq efi_pstore dmi_sysfs ip_tables x_tables autofs4 
btrfs blake2b_generic dm_crypt raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx