Re: Need help debugging total system lockup, probably notebook power saving related

2018-02-22 Thread Ondřej Grover
Hi,

So far the i915.enable_rc6=0 option seems to have worked: no system hangs
since I started using it. I don't yet have a feel for whether battery
run-time is worse, but the difference probably isn't that bad.

Ondrej G.
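
For anyone trying the same workaround, it can help to confirm that the option really reached the running kernel. The following is only a minimal sketch, assuming the option was passed on the kernel command line; the helper names parse_cmdline and rc6_disabled are made up for illustration, and quoted values containing spaces are not handled.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so values like root=UUID=... stay intact.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def rc6_disabled(cmdline: str) -> bool:
    """Return True if i915.enable_rc6=0 is present on the given command line."""
    return parse_cmdline(cmdline).get("i915.enable_rc6") == "0"


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    print(rc6_disabled(Path("/proc/cmdline").read_text()))
```

This only checks the boot command line. If the option was set through a modprobe configuration file instead, it will not show up here.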



Re: Need help debugging total system lockup, probably notebook power saving related

2018-02-21 Thread Ondřej Grover
Hi Henning,
thanks for the tip.

However, I had been experiencing this issue well before the
spectre/meltdown bunch hit the fan.
It does sound a bit like it, but it is likely a different hardware problem.
For now I'm experimenting with i915.enable_rc6=0. I hope it won't reduce
the battery run-time too much.

Ondrej G.



Re: Need help debugging total system lockup, probably notebook power saving related

2018-02-21 Thread Henning Follmann
On Wed, Feb 21, 2018 at 09:51:58AM +0100, Ondřej Grover wrote:
> Hello,
> 
> I need help debugging random total system lock-ups.
> This is a notebook Acer Aspire V3-572G-78A running Debian Stretch with
> the 4.9.0-5-amd64 kernel.
> 
> When running on battery (does not happen on AC power), usually after
> resuming from RAM, after some rather random time (can be a few minutes to
> hours) the system suddenly locks up, the screen freezes, keyboard and the
> click-pad don't react, sound keeps playing a ~2 second loop. The computer
> does not react to magic SysRq combos (probably because the keyboard doesn't
> react), or to pressing the power key. I cannot ping it nor ssh into it. The
> notebook appears to stay in this state indefinitely (the screen does not
> blank). Only a ~10-sec power-key hold or removing the battery does a hard
> reset.
> 
> I believe this is a kernel-level lock-up in some hardware driver.
> Unfortunately, I haven't been able to find out which one, because the log
> files (tried both syslog and journald) contain nothing out of the ordinary
> just before the lock-up. Probably the IO locks up as well.
> 
> Netconsole isn't really an easy option, because I cannot reliably reproduce
> this in a suitable controlled environment, which is further complicated by
> the lack of polling support (required for netconsole) on the wireless
> interface.
> 
> My suspects:
> - The integrated Intel graphics card with the i915 driver: always had
> issues with it (on linux-3.16 it used to crash/hang a lot), maybe the gpu
> hangs are not properly detected anymore.
> - The hard disk sometimes loses APM levels after suspend (have to use
> pm_async == 0 to prevent errors after each suspend). Maybe this points to a
> larger suspend/power-mgmt issue.
> - My iwlwifi interface sometimes crashes and only removing it from the PCI
> bus and rescanning for it helps. But this procedure does not hang the whole
> system.
> 
> Any help, suggestions, pointers will be appreciated.
> 

Hello,
I also have some power management issues with 4.9.0-5-amd64. My symptoms
differ from yours (I cannot boot with the power plugged in), but it is
also a hard stop with no logs available. I haven't figured out the exact
cause, but in my case it seems related to the spectre/meltdown fix in
the newest kernel.
You can switch that fix off by adding pti=off to
GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub.

It is not an ideal solution, but it may help narrow down the issue.
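
The edit described above can be sketched roughly as follows. This is an illustration, not a tested tool: the function name add_kernel_param is made up, it only handles the common double-quoted form of the variable, and it works on the file's text without writing anything back. Running update-grub afterwards is still required.

```python
import re


def add_kernel_param(grub_text: str, param: str) -> str:
    """Return grub_text with param appended to GRUB_CMDLINE_LINUX_DEFAULT.

    The text is returned unchanged if the parameter is already present
    or if no double-quoted GRUB_CMDLINE_LINUX_DEFAULT line is found.
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE
    )

    def repl(match):
        current = match.group(2).split()
        if param not in current:
            current.append(param)
        return match.group(1) + " ".join(current) + match.group(3)

    return pattern.sub(repl, grub_text, count=1)


if __name__ == "__main__":
    # Sample content standing in for /etc/default/grub.
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
    print(add_kernel_param(sample, "pti=off"))
```

The same helper would work for other parameters mentioned in this thread, such as i915.enable_rc6=0.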


-H


-- 
Henning Follmann   | hfollm...@itcfollmann.com
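
The original report mentions recovering the iwlwifi interface by removing it from the PCI bus and rescanning. A minimal sketch of that procedure through the standard sysfs files is shown here. The device address 0000:03:00.0 is a placeholder (the real one can be found with lspci), the function names are made up for illustration, and the writes require root.

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci")


def reset_paths(pci_address: str, root: Path = SYSFS_PCI):
    """Return the (remove, rescan) sysfs files for a PCI device address."""
    return root / "devices" / pci_address / "remove", root / "rescan"


def remove_and_rescan(pci_address: str, root: Path = SYSFS_PCI) -> None:
    """Detach the device from the PCI bus, then ask the kernel to rescan."""
    remove_file, rescan_file = reset_paths(pci_address, root)
    remove_file.write_text("1")   # device disappears from the bus
    rescan_file.write_text("1")   # kernel re-enumerates and re-probes it


if __name__ == "__main__":
    # Placeholder address; substitute the wireless card's real address.
    remove_and_rescan("0000:03:00.0")
```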