Public bug reported:

Hey folks,
I'm facing a completely random bug and I need guidance on how to locate the
root cause and fix it.
Since I'm a Linux newbie, I did some investigation on my own with the help of
AI, so here's what I have so far.

Problem/symptoms:
-All processes become completely frozen at random.
-When the issue happens, CPU usage barely rises (if at all), but processes
(like Google Chrome) become frozen and I get the "Force Quit or Wait" dialog.
-When I try to use Terminal to launch "htop" or "ps" to find the culprit PID,
the command never finishes and the htop or ps process itself gets stuck.
-When I use System Monitor and try to switch to the "Processes" tab, it
completely freezes as well.
-The "killall" command has never managed to terminate Chrome when this
happened.
-Even when I had the PIDs (I managed to print them before the issue happened),
"kill -9" had no effect.
-Killing any other process is also impossible via "killall" or "kill -9".

OS:
> lsb_release -rd
Description:    Ubuntu 24.04.3 LTS
Release:        24.04

Platform:
Laptop
MSI Titan 18 HX A14VIG
Intel Core i9-14900HX
128 GB RAM

> lspci | grep -E "VGA|3D|Display"
0000:00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-S UHD 
Graphics (rev 04)
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GN21-X11 [GeForce 
RTX 4090 Laptop GPU] (rev a1)

Reproduced on the following Kernels:
-6.11.0-29-generic (reproduced for over a year).
-6.14.0-37-generic (installed a week ago, but reproduced 3x already).

Traces:
-AI suggested running "strace -f -o /tmp/ps.trace ps aux", which never returns.
When I run "tail -f /tmp/ps.trace" in another terminal, the output stops at a
read call on the culprit PID's /proc entry (usually a Chrome process, or the
subprocess for the tab I was browsing), as follows:
3140158 openat(AT_FDCWD, "/proc/1953151/status", O_RDONLY) = 4
3140158 read(4, "Name:\tchrome\nUmask:\t0002\nState:\t"..., 2048) = 1590
3140158 close(4) = 0
3140158 openat(AT_FDCWD, "/proc/1953151/environ", O_RDONLY) = 4
3140158 read(4, "

-Sometimes the culprit PID belongs to System Monitor.
-AI suggested running the following commands, as it suspected a link to a
similar bug documented in the past, but none of them hung or showed any issues:
nsenter -t 1 -m -p -n true
cat /proc/self/stat
ls /proc | head
cat /proc/1/attr/current
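
The strace output above stops on a read of /proc/<PID>/environ, while the
earlier read of /proc/<PID>/status succeeds. One way to look for stuck tasks
without touching the files that hang might therefore be to read only the status
files and report threads in state D (uninterruptible sleep). The following is a
minimal Python sketch under that assumption, not a confirmed diagnosis. The
function names are made up for illustration, and the wchan file may contain
just 0 depending on the kernel configuration and permissions.

```python
#!/usr/bin/env python3
"""List threads in uninterruptible sleep (state D) using /proc/<pid>/task/<tid>/status."""
import os


def parse_status(text):
    """Return (name, state_letter) parsed from the text of a /proc status file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    # The State line looks like "D (disk sleep)"; keep only the leading letter.
    return fields.get("Name", "?"), fields.get("State", "")[:1]


def read_small(path):
    """Read a small /proc file, returning '' if it is missing or unreadable."""
    try:
        with open(path, "r", errors="replace") as f:
            return f.read()
    except OSError:
        return ""


def find_blocked_tasks(proc_root="/proc"):
    """Yield (pid, tid, name, wchan) for every thread whose state is 'D'."""
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # the process exited while we were scanning
        for tid in tids:
            base = os.path.join(task_dir, tid)
            name, state = parse_status(read_small(os.path.join(base, "status")))
            if state == "D":
                # Assumption: wchan is readable here; it may be "0" or empty.
                wchan = read_small(os.path.join(base, "wchan")).strip()
                yield int(pid), int(tid), name, wchan


if __name__ == "__main__":
    for pid, tid, name, wchan in find_blocked_tasks():
        print(f"pid={pid} tid={tid} name={name} wchan={wchan or '?'}")
```

Run from a terminal that is still responsive, this prints one line per thread
in state D. Such threads typically do not react to "kill -9" until they leave
that state, which would be consistent with the symptoms listed above.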

Reproducibility:
-Random.
-Happens a long time after reboot.
-Usually while browsing websites, watching videos, etc., on Google Chrome.
-Sometimes it happens if I leave VLC running with a video paused.
-Almost a year ago, this used to happen when I had VMware Workstation running
GPU-intensive operations inside the VM (software that uses AI for object
recognition), in addition to all of the above. This was the most frequent repro
environment, but it stopped reproducing there when I switched the GPU from the
NVIDIA drivers to the X.Org X server driver.

Please let me know how I can further diagnose this and/or have it fixed.
Thank you

** Affects: ubuntu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2138780

Title:
  Randomly unable to kill any process or launch htop or ps
