Based on the stack trace:
[ 1692.658756] Call Trace:
[ 1692.658762] [c00020739ba9b970] [0000000024008842] 0x24008842 (unreliable)
[ 1692.658769] [c00020739ba9bb48] [c00000000001c270] __switch_to+0x2a0/0x4d0
[ 1692.658774] [c00020739ba9bba8] [c000000000d048a4] __schedule+0x2a4/0xb00
[ 1692.658777] [c00020739ba9bc78] [c000000000d05140] schedule+0x40/0xc0
[ 1692.658781] [c00020739ba9bc98] [c000000000537bf4] jbd2_log_wait_commit+0xf4/0x1b0
[ 1692.658784] [c00020739ba9bd18] [c0000000004c5ee4] ext4_sync_file+0x354/0x620
[ 1692.658788] [c00020739ba9bd78] [c00000000042afb8] vfs_fsync_range+0x78/0x170
[ 1692.658790] [c00020739ba9bdc8] [c00000000042b138] do_fsync+0x58/0xd0
[ 1692.658792] [c00020739ba9be08] [c00000000042b528] SyS_fsync+0x28/0x40
[ 1692.658795] [c00020739ba9be28] [c00000000000b284] system_call+0x58/0x6c
[ 1692.658839] Kernel panic - not syncing: hung_task: blocked tasks
[ 1692.659238] CPU: 48 PID: 785 Comm: khungtaskd Not tainted 4.15.0-1017.19-bz175922-ibm-gt #bz175922
[ 1692.659835] Call Trace:
[ 1692.660025] [c000008fd0eefbf8] [c000000000cea13c] dump_stack+0xb0/0xf4 (unreliable)
[ 1692.660564] [c000008fd0eefc38] [c000000000110020] panic+0x148/0x328
[ 1692.661004] [c000008fd0eefcd8] [c000000000233a08] watchdog+0x2c8/0x420
[ 1692.661429] [c000008fd0eefdb8] [c000000000140068] kthread+0x1a8/0x1b0
[ 1692.661881] [c000008fd0eefe28] [c00000000000b654] ret_from_kernel_thread+0x5c/0x88
[ 1692.662439] Sending IPI to other CPUs
[ 1693.971250] IPI complete
The IPI being sent to all other CPUs suggests they were preempted by an NMI in order to stop execution and, most likely, call panic() for a dump.
If that is the case, this behaviour is controlled by the following sysctl variables:
kernel.hardlockup_panic = 0 -> THIS, for HARD lockups
kernel.hung_task_panic = 0 -> THIS, for SCHEDULING deadlocks
kernel.panic = 0
kernel.panic_on_io_nmi = 0
kernel.panic_on_oops = 1
kernel.panic_on_rcu_stall = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.panic_on_warn = 0
kernel.panic_print = 0
kernel.softlockup_panic = 0 -> THIS, for SOFT lockups
kernel.unknown_nmi_panic = 0
vm.panic_on_oom = 0 -> THIS, for OOM issues
With those set to 0, the panic would not happen during live virsh dumps (the live dump likely delays the VM enough that the pagecache fills with dirty pages faster than the I/O can be committed).
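For example, relaxing those two knobs persistently could be done with a sysctl.d drop-in along these lines (the file name is hypothetical and the values are illustrative; pick them per local policy):

```
# /etc/sysctl.d/99-live-dump.conf  (hypothetical file name)
# Do not panic when khungtaskd finds blocked tasks or when the
# soft-lockup watchdog fires; the events are still logged to dmesg.
kernel.hung_task_panic = 0
kernel.softlockup_panic = 0
# Apply without reboot: sysctl --system
```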
Checking the sosreport you sent:
$ cat sos_commands/kernel/sysctl_-a | grep -i panic
kernel.hardlockup_panic = 0
kernel.hung_task_panic = 1
kernel.panic = 1
kernel.panic_on_oops = 1
kernel.panic_on_rcu_stall = 0
kernel.panic_on_warn = 0
kernel.softlockup_panic = 1
vm.panic_on_oom = 0
You have kernel.hung_task_panic = 1 (which matches the "hung_task: blocked tasks" panic message above) and kernel.softlockup_panic = 1. Either of these will panic the guest whenever it accrues too much "steal time" to catch up with its workload, which is what makes the lockups and hung tasks appear.
Am I missing something?
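A quick way to sanity-check that diagnosis from inside the guest is to watch the cumulative steal-time counter (field 9 of the aggregate "cpu" line in /proc/stat); if it climbs rapidly while the virsh dump is running, the host really is starving the guest:

```shell
#!/bin/sh
# Print cumulative steal ticks: time the hypervisor ran something else
# while this guest was runnable. This is field 9 of the "cpu" line in
# /proc/stat (cpu user nice system idle iowait irq softirq steal ...).
awk '/^cpu /{print "steal ticks:", $9}' /proc/stat
```

Sampling this twice a few seconds apart gives the steal rate; vmstat's "st" column reports the same quantity as a percentage.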
** Changed in: libvirt (Ubuntu)
Status: New => Triaged
** Changed in: libvirt (Ubuntu)
Importance: Undecided => Low
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1846237
Title:
Kernel Panic while virsh dump of Guest with 300G RAM is triggered.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1846237/+subscriptions