Public bug reported:
Instead of shutting down completely, machines are hanging with soft
lockup failures. This started when the 16.04 hwe packages switched to
the 4.10 kernel about a month ago. I help manage a few hundred machines
spanning several different sites and several different hardware models,
and they're all experiencing this intermittently; approximately 5% get
stuck on shutdown each day.
Here is an example of what is on the screen after it happens. The
machine is unresponsive and requires a hard reset. I can't see anything
in syslog or dmesg that differs when this happens; I think all logging
has stopped by this point in the shutdown.
[54566.220003] ? (t=6450529 jiffies g=141935 c=141934 q=1288)
[54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54676.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54704.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54732.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54746.232003] INFO: rcu_sched self-detected stall on CPU
[54746.232003] ?1-...: (6495431 ticks this GP) idle=5c7/140000000000001/0 softirq=218389/218389 fqs=3247712
This repeats with a reported stuck time of 22s (going by the timestamps,
the messages above are 28 seconds apart); sometimes it is stuck for 23s
instead of 22:
... NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s!
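To check the spacing between the watchdog messages, the kernel timestamps can be compared directly. This is only a minimal sketch in Python: it assumes the console lines have been copied into a string, and the name lockup_intervals is made up for illustration.

```python
import re
from decimal import Decimal

# Kernel timestamp followed by the soft lockup message, as in the excerpt above.
LOCKUP_RE = re.compile(r"\[\s*(\d+\.\d+)\] NMI watchdog: BUG: soft lockup")


def lockup_intervals(log_text):
    """Return the gaps in seconds between consecutive soft lockup messages."""
    # Decimal avoids floating-point noise when subtracting the timestamps.
    stamps = [Decimal(m.group(1)) for m in LOCKUP_RE.finditer(log_text)]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Three of the lines from the screen capture quoted in this report.
SAMPLE = """\
[54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
"""

if __name__ == "__main__":
    for gap in lockup_intervals(SAMPLE):
        print(gap)
```

On the lines quoted above the gaps come to 28 seconds each, so the 22s/23s figure is the stuck duration the watchdog reports, not the interval between messages.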
Reverting to 4.8.0-58 avoids the problem. I believe the problem has been
present with every hwe 4.10 kernel package through the current
linux-image-4.10.0-33-generic. The data attached to this bug was
collected right after an occurrence with linux-image-4.10.0-33-generic.
This only happens approximately 5% of the time, with no discernible
pattern. I am able to reproduce the issue on one particular machine by
scheduling shutdowns 3 times per day and waiting up to a few days for
the problem to occur. Shutting down and starting up more frequently,
like every 5 minutes or even every hour, will not trigger the problem;
it seems like the machine needs to be running for a while. It does not
seem to depend on any user actions; it happens even if you never log in.
It has happened on reboots as opposed to shutdowns as well. I found
a few similar bug reports but nothing for these exact symptoms.
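As a rough guide to how long a reproduction attempt may need to run, the odds of seeing at least one hang can be estimated. This is only a sketch under a loud assumption that is not established by this report: that every shutdown hangs independently with the same 5% probability. The function name chance_of_hang is made up for illustration.

```python
def chance_of_hang(per_shutdown_rate, shutdowns):
    """Probability of at least one hang in `shutdowns` shutdowns.

    ASSUMPTION: each shutdown hangs independently with the same rate.
    The report suggests uptime matters, so treat this as a rough guide.
    """
    return 1 - (1 - per_shutdown_rate) ** shutdowns


if __name__ == "__main__":
    # 3 scheduled shutdowns per day, as in the reproduction setup above.
    for days in (1, 3, 5, 10):
        print(days, round(chance_of_hang(0.05, 3 * days), 2))
```

Under that assumption, five days at 3 shutdowns per day gives a bit better than even odds of hitting the hang at least once, which is in line with having to wait up to a few days.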
I have tried blacklisting mei_me with no change in behavior. I'm not
sure, but the majority of the affected machines are using Intel video
chips. Next I am going to try a mainline 4.10 kernel.
lsb_release -rd
Description: Ubuntu 16.04.3 LTS
Release: 16.04
apt-cache policy linux-image-4.10.0-33-generic
linux-image-4.10.0-33-generic:
  Installed: 4.10.0-33.37~16.04.1
  Candidate: 4.10.0-33.37~16.04.1
  Version table:
 *** 4.10.0-33.37~16.04.1 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages
        100 /var/lib/dpkg/status
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.10.0-33-generic 4.10.0-33.37~16.04.1
ProcVersionSignature: Ubuntu 4.10.0-33.37~16.04.1-generic 4.10.17
Uname: Linux 4.10.0-33-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
CurrentDesktop: XFCE
Date: Tue Aug 29 08:57:26 2017
SourcePackage: linux-hwe
UpgradeStatus: No upgrade log present (probably fresh install)
** Affects: linux-hwe (Ubuntu)
Importance: Undecided
Status: New
** Tags: amd64 apport-bug third-party-packages xenial
** Attachment added: "soft-lockup-3.png"
https://bugs.launchpad.net/bugs/1713751/+attachment/4940859/+files/soft-lockup-3.png
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1713751
Title:
soft lockup / stall on CPU when shutting down with hwe 4.10 kernel