Public bug reported:

Instead of normal complete shutdowns we're getting soft lockup failures.
This started when 16.04 hwe packages switched to the 4.10 kernel about a
month ago. I help manage a few hundred machines spanning several
different sites and several different hardware models, and they're all
experiencing this intermittently: approximately 5% get stuck on shutdown
each day.

Here is an example of what is on the screen after it happens. The
machine is unresponsive and requires a hard reset.  I can't see anything
in syslog or dmesg that differs when this happens; I think all logging
has stopped at this point in the shutdown.

[54566.220003] ? (t=6450529 jiffies g=141935 c=141934 q=1288)
[54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54676.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54704.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54732.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54746.232003] INFO: rcu_sched self-detected stall on CPU
[54746.232003] ?1-...: (6495431 ticks this GP) idle=5c7/140000000000001/0 softirq=218389/218389 fqs=3247712

This repeats every ~22 seconds; sometimes it is stuck for 23s instead of 22:
... NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s!
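
For anyone comparing captures from several machines, the spacing of the watchdog messages can be pulled out of the on-screen text with a few lines of Python. This is only a sketch: the regular expression and the `lockup_intervals` helper are made up for illustration and are not part of any kernel tooling. The sample lines are copied from the excerpt above.

```python
import re
from decimal import Decimal

# Matches lines such as:
# [54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<stuck>\d+)s!"
)


def lockup_intervals(lines):
    """Return (cpu, stuck_seconds, gap_since_previous_report) tuples.

    The gap is None for the first report and a Decimal number of
    seconds (difference of the bracketed timestamps) afterwards.
    """
    results = []
    previous = None
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match is None:
            continue  # skip RCU stall lines and anything else
        ts = Decimal(match.group("ts"))
        gap = None if previous is None else ts - previous
        results.append((int(match.group("cpu")), int(match.group("stuck")), gap))
        previous = ts
    return results


# Lines copied from the screen capture in this report.
SAMPLE = """\
[54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54746.232003] INFO: rcu_sched self-detected stall on CPU
"""

for cpu, stuck, gap in lockup_intervals(SAMPLE.splitlines()):
    print(f"CPU#{cpu} stuck for {stuck}s, gap since previous report: {gap}")
```

By the bracketed timestamps, consecutive reports in this excerpt are 28 seconds apart, while each message itself reports being stuck for 22s.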


Reverting to 4.8.0-58 avoids the problem. I believe the problem has been 
present with every hwe 4.10 kernel package through the current 
linux-image-4.10.0-33-generic.  This bug was filed with data right after it 
occurred with linux-image-4.10.0-33-generic.

This only happens approximately 5% of the time with no discernible
pattern.  I am able to reproduce the issue on one particular machine by
scheduling shutdowns 3 times per day and waiting up to a few days for
the problem to occur. Shutting down and starting up more frequently,
such as every 5 minutes or even every hour, will not trigger the problem;
it seems the machine needs to be running for a while.  It does not
seem to depend on any user actions: it happens even if you never log in.
It has happened on reboots as well as shutdowns.  I found
a few similar bug reports but nothing for these exact symptoms.
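
As a rough sanity check on how long a reproduction attempt might take, the figures above can be plugged into a simple model. This is a back-of-the-envelope sketch that assumes each shutdown hangs independently with probability 0.05. That independence is an assumption, not something this report establishes (the observations above suggest uptime matters), and the function names are purely illustrative.

```python
def p_at_least_one_hang(p_hang, shutdowns):
    """Probability of seeing at least one hang in `shutdowns` attempts,
    assuming independent attempts with the same hang probability."""
    return 1.0 - (1.0 - p_hang) ** shutdowns


def expected_shutdowns_until_hang(p_hang):
    """Mean number of attempts up to and including the first hang
    (mean of a geometric distribution)."""
    return 1.0 / p_hang


P_HANG = 0.05   # "approximately 5% of the time" (assumed per shutdown)
PER_DAY = 3     # scheduled shutdowns per day on the test machine

for days in (1, 3, 7, 14):
    n = days * PER_DAY
    prob = p_at_least_one_hang(P_HANG, n)
    print(f"{days:2d} days ({n:2d} shutdowns): P(at least one hang) = {prob:.2f}")

print("expected shutdowns until first hang:",
      round(expected_shutdowns_until_hang(P_HANG)))
```

Under this assumption the expected wait is about 20 shutdowns, which is a little under a week at three shutdowns per day.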

I have tried blacklisting mei_me with no change in behavior.  I'm not
certain, but the majority of the affected machines appear to use Intel
video chips.  Next I am going to try a mainline 4.10 kernel.


lsb_release -rd
Description:    Ubuntu 16.04.3 LTS
Release:        16.04


apt-cache policy linux-image-4.10.0-33-generic
linux-image-4.10.0-33-generic:
  Installed: 4.10.0-33.37~16.04.1
  Candidate: 4.10.0-33.37~16.04.1
  Version table:
 *** 4.10.0-33.37~16.04.1 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages
        100 /var/lib/dpkg/status

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.10.0-33-generic 4.10.0-33.37~16.04.1
ProcVersionSignature: Ubuntu 4.10.0-33.37~16.04.1-generic 4.10.17
Uname: Linux 4.10.0-33-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
CurrentDesktop: XFCE
Date: Tue Aug 29 08:57:26 2017
SourcePackage: linux-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: linux-hwe (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug third-party-packages xenial

** Attachment added: "soft-lockup-3.png"
   https://bugs.launchpad.net/bugs/1713751/+attachment/4940859/+files/soft-lockup-3.png

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1713751

Title:
  soft lockup / stall on CPU when shutting down with hwe 4.10 kernel
