Public bug reported:

I have had my system for a few months now, and within the last several days I have started getting random crashes. The exact behavior of the lockup differs from crash to crash. For instance, one crash occurred before the screen blanked out: I couldn't use the mouse or keyboard, couldn't drop into a TTY prompt, and couldn't REISUB to reset. Pressing Caps Lock did nothing. Another occurred while the screen was blanked out and wouldn't let me back in. Other crashes have allowed me to move the mouse but not use anything in the GUI, and let me REISUB (but not drop into a prompt).

I run some Python deep learning on the 2080 Ti, but some of the lockups occurred outside of that training. The ones that did occur during training happened while using less than 50% of GPU RAM, with GPU temperatures at or below 65C.

I have an AMD Threadripper 2950X in an AORUS XTREME X399, 64GB RAM (RipJaws), NVIDIA RTX 2080 Ti, 1300W EVGA SuperNova G2 power supply, and a 1TB M.2 NVMe SSD.

I saw in another thread that CPU temperature could possibly be an issue, so I tried to install the lm-sensors hdd but it didn't detect any sensors. I might investigate that further when I get more time, but I was able to pull up the file containing the CPU temperature and it showed 39000 (which would be 39C if the value is in millidegrees Celsius).

Since I really don't know where to start, I am hoping for any advice on a way forward, or maybe that there is something obvious that you guys might see in looking at the bug report info.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-5.0.0-29-generic 5.0.0-29.31~18.04.1
ProcVersionSignature: Ubuntu 5.0.0-29.31~18.04.1-generic 5.0.21
Uname: Linux 5.0.0-29-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.9-0ubuntu7.7
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Mon Sep 30 17:24:52 2019
InstallationDate: Installed on 2019-08-26 (34 days ago)
InstallationMedia: Ubuntu 18.04.3 LTS "Bionic Beaver" - Release amd64 (20190805)
SourcePackage: linux-signed-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: linux-signed-hwe (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: amd64 apport-bug bionic

** Description changed:

- Since I really don't know where to start, I am hoping for any
+ Since I really don't know where to start, I am hoping for any advice on
+ a way forward, or maybe that there is something obvious that you guys
+ might see in looking at the bug report info.

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1846129

Title:
  Ubuntu 18.04.3 LTS Random unrecoverable crashes

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe/+bug/1846129/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs