Re: Need help debugging total system lockup, probably notebook power saving related
Hi,

so far the i915.enable_rc6=0 option seems to have worked: no system hangs
with it so far. I haven't got a feel yet for whether battery run-time is
worse, but the difference probably isn't that bad.

Ondrej G.

On Wed, Feb 21, 2018 at 8:09 PM, Ondřej Grover wrote:
> Hi Henning,
> thanks for the tip.
>
> However, I had already been experiencing this issue before the
> Spectre/Meltdown bunch hit the fan.
> It does sound a bit like it, but it is likely some different HW stuff.
> For now I'm experimenting with i915.enable_rc6=0. I hope it won't bog down
> the battery run-time too much.
>
> Ondrej G.
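A boot option such as i915.enable_rc6=0 only has an effect if it actually reaches the kernel command line of the running system. The following is a minimal sketch of how one might verify that, assuming the option was passed at boot and is therefore visible in /proc/cmdline. The function names are hypothetical and not part of the original thread, and quoted values containing spaces are not handled.

```python
def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (for example 'quiet') map to None.
    Sketch only: quoted values that contain spaces are not handled.
    """
    params = {}
    for token in text.split():
        # Split on the first '=' only, so 'root=UUID=abc' keeps its value whole.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


# Typical use on the affected machine:
#   params = parse_cmdline(read_cmdline())
#   print(params.get("i915.enable_rc6", "<not set>"))
```

If the option is missing from the output, the hang-free period cannot be attributed to it, which matters when narrowing down the suspects by elimination.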
Re: Need help debugging total system lockup, probably notebook power saving related
Hi Henning,
thanks for the tip.

However, I had already been experiencing this issue before the
Spectre/Meltdown bunch hit the fan.
It does sound a bit like it, but it is likely some different HW stuff.
For now I'm experimenting with i915.enable_rc6=0. I hope it won't bog down
the battery run-time too much.

Ondrej G.

On Wed, Feb 21, 2018 at 3:55 PM, Henning Follmann wrote:
> Hello,
> I do have some power management issues with 4.9.0-5-amd64. My issues seem
> different than yours (I cannot boot up with power plugged in) but it is
> also a hard stop, no logs available. I really haven't figured out the exact
> cause; however, in my case they seem related to the Spectre/Meltdown fix in
> the newest kernel.
> You can switch that off by adding pti=off to
> GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub.
>
> It is not an ideal solution, but maybe it helps narrow down the issue.
>
> -H
Re: Need help debugging total system lockup, probably notebook power saving related
On Wed, Feb 21, 2018 at 09:51:58AM +0100, Ondřej Grover wrote:
> Hello,
>
> I need help debugging random total system lock-ups.
> This is a notebook Acer Aspire V3-572G-78A running Debian Stretch with
> the 4.9.0-5-amd64 kernel.
>
> When running on battery (does not happen on AC power), usually after
> resuming from RAM, after some rather random time (can be a few minutes to
> hours) the system suddenly locks up, the screen freezes, keyboard and the
> click-pad don't react, sound keeps playing a ~2 second loop. The computer
> does not react to magic SysRq combos (probably because the keyboard doesn't
> react), or to pressing the power key. I cannot ping it nor ssh into it. The
> notebook appears to stay in this state indefinitely (the screen does not
> blank). Only a ~10-sec power-key hold or removing the battery does a hard
> reset.
>
> I believe this is a kernel-level lock-up in some hardware driver.
> Unfortunately, I haven't been able to find out which one, because the log
> files (tried both syslog and journald) contain nothing out of the ordinary
> just before the lock-up. Probably the IO locks up as well.
>
> Netconsole isn't really an easy option, because I cannot reliably reproduce
> this in a suitable controlled environment, which is further complicated by
> the lack of polling support (required for netconsole) on the wireless
> interface.
>
> My suspects:
> - The integrated Intel graphics card with the i915 driver: always had
> issues with it (on linux-3.16 it used to crash/hang a lot), maybe the gpu
> hangs are not properly detected anymore.
> - The hard disk sometimes loses APM levels after suspend (have to use
> pm_async == 0 to prevent errors after each suspend). Maybe this points to a
> larger suspend/power-mgmt issue.
> - My iwlwifi interface sometimes crashes and only removing it from the PCI
> bus and rescanning for it helps. But this procedure does not hang the whole
> system.
>
> Any help, suggestions, pointers will be appreciated.
>

Hello,
I do have some power management issues with 4.9.0-5-amd64. My issues seem
different than yours (I cannot boot up with power plugged in) but it is
also a hard stop, no logs available. I really haven't figured out the exact
cause; however, in my case they seem related to the Spectre/Meltdown fix in
the newest kernel.
You can switch that off by adding pti=off to
GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub.

It is not an ideal solution, but maybe it helps narrow down the issue.

-H

--
Henning Follmann | hfollm...@itcfollmann.com
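The GRUB change described above can be sketched as a small text transformation. This is a hedged illustration, not a tested tool: it assumes GRUB_CMDLINE_LINUX_DEFAULT is set on a single line with a double-quoted value, as in a stock Debian /etc/default/grub, and the function name add_kernel_param is hypothetical. It only returns the modified text and does not write to the file.

```python
import re


def add_kernel_param(grub_text, param):
    """Return grub_text with `param` appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Assumes a single-line, double-quoted assignment. If the parameter is
    already listed, it is not added a second time.
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE
    )

    def repl(match):
        current = match.group(2).split()
        if param not in current:
            current.append(param)
        return match.group(1) + " ".join(current) + match.group(3)

    new_text, count = pattern.subn(repl, grub_text, count=1)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT line not found")
    return new_text


# Example with a hypothetical file body:
#   text = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
#   print(add_kernel_param(text, "pti=off"))
```

After saving the result to /etc/default/grub, update-grub still has to be run and the machine rebooted for the new parameter to take effect. The same helper would work for i915.enable_rc6=0.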