Bug#920547: Crashes every few hours
Hi Ben,

On Mon, Jan 28, 2019 at 12:42:37AM +, Ben Hutchings wrote:
> On Sat, 26 Jan 2019 20:03:49 + Toni wrote:
> > Package: src:linux
> > Version: 4.19.16-1
> > Severity: critical
> > File: linux-image-4.19.0-2-amd64
>
> Is this a new problem with version 4.19.16-1? Or did it happen with
> earlier versions as well?

It happened with the 4.18.* kernel as well. The machine came with Ubuntu and 4.13 preinstalled, but I wiped it as soon as I could and installed Debian. So I don't know if it would have worked with Ubuntu - the entire setup was not suitable for my purposes, but I thought that 4.9 might be too old for this hardware.

However, the machine came with a 1.3 BIOS, which I updated to 1.6 and then to 1.7. I think I had 4.18 running together with 1.6, but closed the corresponding bug report when I noticed that both a newer kernel and a newer BIOS were available. Well, the situation has improved a little compared to that, but it is still very bad.

> When you say "data loss", are you talking about data in memory or
> corruption of files that were saved and sync'd to disk?

I mean that files on disk were destroyed. I noticed some because I use etckeeper with git, and suddenly I could no longer see my update history because files in /etc/.git were corrupt to the point that no "git fsck" or "git gc" could resurrect the tree.

> On x86 laptops thermal management is (by default) done by the system
> firmware (BIOS and management engine code). If you didn't override
> that, and yet the CPU overheats, this is the manufacturer's fault.

Ok... In the BIOS, I set the corresponding parameter from "performance" to "normal", which I hoped would be a more conservative setting, to prevent exactly this problem.

Cheers,
Toni
Bug#920547: Crashes every few hours
On Sun, 27 Jan 2019 15:10:39 -0500 Chris Manougian wrote:
> Hi Toni. I have an XPS 15 9570, which, I think, is basically the same
> machine, except yours uses an NVIDIA Quadro vs my GeForce GTX 1050Ti as a
> 2nd graphics card.
>
> A lot of problems with that secondary graphics card and linux. Are you
> attempting to use it via Bumblebee?
>
> See this thread (and links within the thread) - BIOS related:
> https://bugzilla.redhat.com/show_bug.cgi?id=1610727
>
> I did my best to disable the NVIDIA card:
> https://wiki.archlinux.org/index.php/Dell_XPS_15_9570
>
> One of my more recent "important" gnome-logs file is:
>
> 03:16:35 kernel: ath10k_pci :3b:00.0: firmware: failed to load ath10k/cal-pci-:3b:00.0.bin (-2)
> 03:16:35 kernel: firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
> 03:16:35 kernel: ath10k_pci :3b:00.0: firmware: failed to load ath10k/pre-cal-pci-:3b:00.0.bin (-2)
> 03:16:34 kernel: iTCO_wdt iTCO_wdt: can't request region for resource [mem 0x00c5fffc-0x00c5]
> 03:16:34 kernel: ACPI Error: Skip parsing opcode OpcodeName unavailable (20180531/psloop-542)
> 03:16:34 kernel: ACPI Error: Skip parsing opcode OpcodeName unavailable (20180531/psloop-542)
> 03:16:34 kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180531/psobject-221)
> 03:16:34 kernel: ACPI BIOS Error (bug): Failure creating [\_SB.PCI0.XHC.RHUB.SS10._PLD], AE_ALREADY_EXISTS (20180531/dswload2-316)
> 03:16:34 kernel: ACPI Error: Skip parsing opcode OpcodeName unavailable (20180531/psloop-542)
> 03:16:34 kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180531/psobject-221)
> 03:16:34 kernel: ACPI BIOS Error (bug): Failure creating [\_SB.PCI0.XHC.RHUB.SS10._UPC], AE_ALREADY_EXISTS (20180531/dswload2-316)
> 03:16:34 kernel: ACPI Error: Skip parsing opcode OpcodeName unavailable (20180531/psloop-542)
> 03:16:34 kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180531/psobject-221)
> 03:16:34 kernel: ACPI BIOS Error (bug): Failure creating [\_SB.PCI0.XHC.RHUB.SS09._PLD], AE_ALREADY_EXISTS (20180531/dswload2-316)
> 03:16:34 kernel: ACPI Error: Skip parsing opcode OpcodeName unavailable (20180531/psloop-542)
> 03:16:34 kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180531/psobject-221)
> 03:16:34 kernel: ACPI BIOS Error (bug): Failure creating [\_SB.PCI0.XHC.RHUB.SS09._UPC], AE_ALREADY_EXISTS (20180531/dswload2-316)
> 03:16:34 kernel: ACPI Error: Skip parsing opcode OpcodeName unavailable (20180531/psloop-542)
> 03:16:34 kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180531/psobject-221)
> 03:16:34 kernel: ACPI BIOS Error (bug): Failure creating [\_SB.PCI0.XHC.RHUB.SS08._PLD], AE_ALREADY_EXISTS (20180531/dswload2-316)
> 03:16:34 kernel: ACPI Error: Skip parsing opcode OpcodeName unavailable (20180531/psloop-542)
> 03:16:34 kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180531/psobject-221)
> 03:16:34 kernel: ACPI BIOS Error (bug): Failure creating

Hi,

If it may help (I'm on testing w/ 4.9.12) this bug seems to be NVidia-specific. Never encountered such things on my laptop which has an AMD GPU with the FOSS driver. Reference is Dell Latitude e6540.

Rgds,
Bug#920547: Crashes every few hours
Control: tag -1 moreinfo

On Sat, 26 Jan 2019 20:03:49 + Toni wrote:
> Package: src:linux
> Version: 4.19.16-1
> Severity: critical
> File: linux-image-4.19.0-2-amd64

Is this a new problem with version 4.19.16-1? Or did it happen with earlier versions as well?

> my laptop lasts a few hours at most until becoming unresponsive, hot,
> and refuses to do normal things. Eg. trying to create this bug report
> and using sudo to read the kernel logs after about one hour of total
> uptime, with two suspend/resume cycles in between, made the system
> crash. "Crash" means that, in such a situation, I can only press the
> power button until the system is completely off, but after that, I am
> forced to immediately turn the system back on, so that the fans can do
> their work, because otherwise, the CPU overheats. Pressing
> Ctrl-Alt-Delete has no effect.
>
> Justification for "grave": I've experienced data loss in such
> situations, and of course, having the entire system going down, with
> potential hardware damage (sans human intervention) is probably as bad
> as it can be.

When you say "data loss", are you talking about data in memory or corruption of files that were saved and sync'd to disk?

On x86 laptops thermal management is (by default) done by the system firmware (BIOS and management engine code). If you didn't override that, and yet the CPU overheats, this is the manufacturer's fault.

Ben.

> I've attached the dmesg from boot and some kernel logs for your perusal,
> cleansed from private data.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                         - Albert Camus