Public bug reported:

I have had my system for a few months now, and within the last several
days I have started getting random crashes.  The exact behavior of the
lockup differs from crash to crash.  For instance, one crash occurred
before the screen blanked out: I couldn't use the mouse or keyboard,
couldn't drop into a TTY prompt, and couldn't REISUB to reset, and
nothing happened when I pressed Caps Lock.  Another occurred while the
screen was blanked out, and it wouldn't let me back in.  Other crashes
have let me move the mouse but not use anything in the GUI, and let me
REISUB (but not drop into a prompt).

I run some Python deep learning on the 2080 Ti, but some of the lockups
occurred outside of that training.  The ones that did occur during
training happened with less than 50% of GPU RAM in use and with GPU
temperatures at or below 65C.
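
One way to keep a record of the GPU figures mentioned above (temperature
and memory use) while training runs is sketched below.  It is only a
sketch: it assumes nvidia-smi is installed and on the PATH, and the
function names parse_gpu_line and read_gpu_stats are made up for this
example.  The query fields and format options are standard nvidia-smi
ones.

```python
import subprocess

# Fields requested from nvidia-smi: temperature in C, memory in MiB.
QUERY_FIELDS = "temperature.gpu,memory.used,memory.total"


def parse_gpu_line(line):
    """Parse one CSV line such as '65, 5200, 11019' into a dict.

    Assumes all three fields are plain integers; a field reported as
    '[N/A]' would raise ValueError here.
    """
    temp_c, used_mib, total_mib = (int(f.strip()) for f in line.split(","))
    return {
        "temp_c": temp_c,
        "mem_used_mib": used_mib,
        "mem_total_mib": total_mib,
        "mem_fraction": used_mib / total_mib,
    }


def read_gpu_stats():
    """Return one dict per GPU by calling nvidia-smi once."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + QUERY_FIELDS,
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]
```

Calling read_gpu_stats() every few seconds and appending the result to a
file would show what the GPU was doing just before a lockup, provided
the file makes it to disk before the machine hangs.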

I have an AMD Threadripper 2950X in an AORUS XTREME X399, 64GB RAM
(RipJaws), NVIDIA RTX 2080 Ti, 1300W EVGA SuperNova G2 power supply, 1TB
M.2 NVMe SSD.

I saw in another thread that CPU temperature could possibly be an issue,
so I tried to install the lm-sensors hdd but it didn't detect any
sensors.  I might investigate that further when I get more time, but I
was able to pull up the file containing the CPU temperature and it
showed 39000.
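
For reference, the kernel's sysfs temperature files report millidegrees
Celsius, so a value of 39000 would correspond to 39C.  The sketch below
assumes the file in question is one of the hwmon temp*_input files
under /sys/class/hwmon (the report does not say which file was read);
the function names are made up for this example.

```python
from pathlib import Path


def millidegrees_to_celsius(raw):
    """Convert a raw sysfs reading such as '39000' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_hwmon_temperatures(root="/sys/class/hwmon"):
    """Return {file path: degrees Celsius} for every readable sensor file.

    Files that cannot be read or do not contain an integer are skipped.
    """
    readings = {}
    for path in sorted(Path(root).glob("hwmon*/temp*_input")):
        try:
            readings[str(path)] = millidegrees_to_celsius(path.read_text())
        except (OSError, ValueError):
            continue
    return readings
```

If no hwmon driver is loaded for the CPU, read_hwmon_temperatures()
simply returns an empty dict, which would be consistent with lm-sensors
not detecting any sensors.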

Since I really don't know where to start, I am hoping for any advice on
a way forward, or maybe that there is something obvious that you guys
might see in looking at the bug report info.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-5.0.0-29-generic 5.0.0-29.31~18.04.1
ProcVersionSignature: Ubuntu 5.0.0-29.31~18.04.1-generic 5.0.21
Uname: Linux 5.0.0-29-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.9-0ubuntu7.7
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Mon Sep 30 17:24:52 2019
InstallationDate: Installed on 2019-08-26 (34 days ago)
InstallationMedia: Ubuntu 18.04.3 LTS "Bionic Beaver" - Release amd64 (20190805)
SourcePackage: linux-signed-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: linux-signed-hwe (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug bionic

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1846129

Title:
  Ubuntu 18.04.3 LTS Random unrecoverable crashes
