Oh also see https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1713751
which has some superficially similar symptoms (CPU stuck on shutdown).
** Description changed:
This is affecting us for Ubuntu autopkgtests. Eventually the whole
region ends up dying, because each worker is hit by this bug in turn and
backs off until the next reset (every 6 hours).
- Guests are sometimes failing to reboot. When this happens, you see the
- following in the console
+ 17.10 (and bionic) guests are sometimes failing to reboot. When this
+ happens, you see the following in the console:
[ OK ] Reached target Shutdown.
[ 191.698969] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd:1]
[ 219.698438] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd:1]
[ 226.702150] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 226.704958] 	(detected by 0, t=15002 jiffies, g=5347, c=5346, q=187)
[ 226.706093] All QSes seen, last rcu_sched kthread activity 15002 (4294949060-4294934058), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 226.708202] rcu_sched kthread starved for 15002 jiffies! g5347 c5346 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
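The rcu_sched lines report their timings in jiffies rather than seconds; the pair in brackets is the current and last-activity jiffies counter, and their difference is the 15002 figure the kernel prints. A quick shell sketch to convert that to wall-clock time, assuming the guest kernel is built with CONFIG_HZ=250 (an assumption not stated in this report; check with `grep 'CONFIG_HZ=' /boot/config-$(uname -r)` inside a guest), under which the gap comes out at roughly 60 seconds:

```shell
#!/bin/sh
# Counters copied from the "last rcu_sched kthread activity" line above.
now=4294949060
last=4294934058
# ASSUMPTION: guest kernel built with CONFIG_HZ=250 (verify in /boot/config-*).
hz=250

gap=$((now - last))
echo "gap: ${gap} jiffies"
# Shell arithmetic is integer-only, so this rounds down to whole seconds.
echo "approx: $((gap / hz)) s"
```

If the guest turns out to use a different HZ, only the `hz` value needs changing.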
One host that exhibits this behaviour was:
Linux klock 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
guest running:
Linux version 4.13.0-16-generic (buildd@lcy01-02) (gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu2)) #19-Ubuntu SMP Wed Oct 11 18:35:14 UTC 2017 (Ubuntu 4.13.0-16.19-generic 4.13.4)
The affected cloud region is running the xenial/Ocata cloud archive, so
the version of qemu-kvm in there may also be relevant.
Here's how I reproduced it in lcy01:
$ for n in {1..30}; do nova boot --flavor m1.small \
    --image ubuntu/ubuntu-artful-17.10-amd64-server-20171026.1-disk1.img \
    --key-name testbed-`hostname` --nic net-name=net_ues_proposed_migration \
    laney-test${n}; done
$ <ssh to each instance> sudo reboot
# wait a minute or so for the instances to all reboot
$ for n in {1..30}; do echo "=== ${n} ==="; nova console-log laney-test${n} | tail; done
On bad instances you'll see the "soft lockup" message; good ones
reboot as normal.
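With 30 instances, scanning the `tail` output by eye gets tedious. A rough bash sketch that tallies good and bad instances by looking for the "soft lockup" string, assuming the instances are named laney-test1 to laney-test30 as in the reproduction steps and that the nova CLI is already configured (the function names are hypothetical helpers, not part of any tool):

```shell
#!/bin/bash
# Hypothetical helper: reads one console log on stdin and prints
# "bad" if it contains the soft lockup message, "good" otherwise.
classify_log() {
    if grep -q "soft lockup"; then
        echo bad
    else
        echo good
    fi
}

# Hypothetical helper: checks laney-test1..laney-test30 (names assumed
# from the reproduction steps) and prints a per-instance verdict and a total.
check_instances() {
    local n result bad=0
    for n in {1..30}; do
        result=$(nova console-log "laney-test${n}" | classify_log)
        echo "laney-test${n}: ${result}"
        if [ "$result" = bad ]; then
            bad=$((bad + 1))
        fi
    done
    echo "${bad} of 30 instances hit the soft lockup"
}
```

Run `check_instances` after the reboots have settled. Matching only on "soft lockup" is deliberately crude: an instance that is still mid-reboot will show up as "good".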
We've seen good and bad instances on multiple compute hosts, so it doesn't
feel to me like a host problem but rather a race condition somewhere
that is either triggered by, or triggered much more often by, what
lcy01 is running. I always saw this on the first reboot: never on first
boot, and never on any later reboot. (But if it's a race then that might
not mean much.)
I'll attach a bad and a good console-log for reference.
If you're at Canonical then see internal rt #107135 for some other
details.
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1730717
Title:
Some VMs fail to reboot with "watchdog: BUG: soft lockup - CPU#0 stuck
for 22s! [systemd:1]"
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730717/+subscriptions
--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs