Public bug reported:
== Comment: #0 - HARIHARAN T. SUNDARESH REDDY - 2019-04-01 12:23:14 ==
---Problem Description---
A kernel panic occurred when `virsh dump` was triggered on a guest with 300G of RAM.
---uname output---
Linux ltcgen6 4.15.0-1017.19-bz175922-ibm-gt #bz175922 SMP Thu Mar 21 09:34:09 CDT 2019 ppc64le ppc64le ppc64le GNU/Linux
---Debugger---
A debugger is not configured
---Steps to Reproduce---
1. Define a guest with 300G of RAM and start it.
2. Run the following command:
   `virsh dump ubuntui_hsr /home/vm.core --memory-only --live`
3. Wait for some time; a kernel panic will appear on the console.
---Log---
[ 1692.657251] INFO: task journal-offline:7763 blocked for more than 120 seconds.
[ 1692.657754]       Not tainted 4.15.0-1017.19-bz175922-ibm-gt #bz175922
[ 1692.658220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1692.658839] Kernel panic - not syncing: hung_task: blocked tasks
[ 1692.659238] CPU: 48 PID: 785 Comm: khungtaskd Not tainted 4.15.0-1017.19-bz175922-ibm-gt #bz175922
[ 1692.659835] Call Trace:
[ 1692.660025] [c000008fd0eefbf8] [c000000000cea13c] dump_stack+0xb0/0xf4 (unreliable)
[ 1692.660564] [c000008fd0eefc38] [c000000000110020] panic+0x148/0x328
[ 1692.661004] [c000008fd0eefcd8] [c000000000233a08] watchdog+0x2c8/0x420
[ 1692.661429] [c000008fd0eefdb8] [c000000000140068] kthread+0x1a8/0x1b0
[ 1692.661881] [c000008fd0eefe28] [c00000000000b654] ret_from_kernel_thread+0x5c/0x88
[ 1692.662439] Sending IPI to other CPUs
[ 1693.971250] IPI complete
[ 1694.122536] kexec: Starting switchover sequence.
[ 1827.285354188,3] PHB#0003[0:3]: CRESET: Unexpected slot state 00000102, resetting...
[ 1831.426449832,3] PHB#0030[8:0]: CRESET: Unexpected slot state 00000102, resetting...
[ 1832.158844758,3] PHB#0033[8:3]: CRESET: Unexpected slot state 00000102, resetting...
[ 1832.403055326,3] PHB#0034[8:4]: CRESET: Unexpected slot state 00000102, resetting...
[ 1.924644] integrity: Unable to open file: /etc/keys/x509_ima.der (-2)
[ 1.924647] integrity: Unable to open file: /etc/keys/x509_evm.der (-2)
[ 1.991351] vio vio: uevent: failed to send synthetic uevent
[ 2.524824] nouveau 0004:04:00.0: unknown chipset (140000a1)
[ 2.525314] nouveau 0004:05:00.0: unknown chipset (140000a1)
[ 2.525831] nouveau 0035:03:00.0: unknown chipset (140000a1)
[ 2.526315] nouveau 0035:04:00.0: unknown chipset (140000a1)
/dev/sda2: recovering journal
systemd-update-utmp.service
systemd-timesyncd.service
apparmor.service
[ 19.799936] kdump-tools[1192]: Starting kdump-tools: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/201904010617/dump-incomplete
Copying data : [100.0 %] - eta: 0s
[ 84.327105] kdump-tools[1192]: The kernel version is not supported.
[ 84.327190] kdump-tools[1192]: The makedumpfile operation may be incomplete.
[ 84.327269] kdump-tools[1192]: The dumpfile is saved to /var/crash/201904010617/dump-incomplete.
[ 84.327347] kdump-tools[1192]: makedumpfile Completed.
[ 84.357362] kdump-tools[1192]: * kdump-tools: saved vmcore in /var/crash/201904010617
[ 86.168085] kdump-tools[1192]: * running makedumpfile --dump-dmesg /proc/vmcore /var/crash/201904010617/dmesg.201904010617
[ 86.207356] kdump-tools[1192]: The kernel version is not supported.
[ 86.207492] kdump-tools[1192]: The makedumpfile operation may be incomplete.
[ 86.207564] kdump-tools[1192]: The dmesg log is saved to /var/crash/201904010617/dmesg.201904010617.
[ 86.207818] kdump-tools[1192]: makedumpfile Completed.
[ 86.211403] kdump-tools[1192]: * kdump-tools: saved dmesg content in /var/crash/201904010617
[ 86.350667] kdump-tools[1192]: Mon, 01 Apr 2019 06:18:47 -0400
[ 86.531296] kdump-tools[1192]: Rebooting.
== Comment: #4 - HARIHARAN T. SUNDARESH REDDY - 2019-04-01 23:43:05 ==
After the kernel panic, the host triggers kdump and reboots.
== Comment: #13 - Leonardo Bras Soares Passos - 2019-05-15 11:51:33 ==
I have been running a lot of tests lately, and found that this kind of bug usually happens when disk I/O is so intense that no other application can write to the disk.

I studied libvirt and even proposed a patch on the mailing list that works around the problem by sleeping once in a while to give other applications room to write to disk. Unfortunately, that approach was a mistake: this sort of problem should not be the application's responsibility.

It is the kernel's responsibility to schedule I/O access so that no process starves, and this is done by the Linux I/O scheduler. The I/O scheduling class and priority of a process can be read and set with the ionice utility.

libvirt_iohelper is called with ionice = "none: prio 0". journald also has ionice = "none: prio 0", and at times it seems unable to get its writes in ahead of libvirt_iohelper, which causes the panic.
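For reference, the priorities can be inspected and changed with ionice (a minimal sketch; ionice is part of util-linux, and the pidof lookups assume a single instance of each process):

  # show the I/O scheduling class and priority of both processes
  ionice -p $(pidof libvirt_iohelper)
  ionice -p $(pidof systemd-journald)
  # both print "none: prio 0" on this host
  # as an experiment, demote an already-running process to the idle class:
  ionice -c 3 -p $(pidof libvirt_iohelper)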
I have also crafted a simple test that mimics the behavior of libvirt_iohelper and can also recreate this bug (attached). It confirms the problem can happen independently of libvirt.
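The attached test is not reproduced here; as a rough stand-in, a sustained stream of large buffered writes (hypothetical path, size matching the guest's RAM) creates the same kind of write pressure:

  # flood the disk with ~300G of buffered writes, as the dump does
  dd if=/dev/zero of=/tmp/bigfile bs=1M count=300000 &
  # watch the console for hung-task warnings while it runs
  dmesg -w | grep 'blocked for more than'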
For now, I am looking for a way to understand the scheduler's workings. I believe it can be fixed just by adjusting ionice priorities. If that is not possible, I will try to debug the kernel and see if I can get any clue on why this happens, and a way to fix it.
== Comment: #15 - Leonardo Bras Soares Passos - 2019-05-29 16:14:09 ==
I was trying to better understand I/O priority, so I read a few documents and found that the I/O scheduler on this host is CFQ.

After reading a brief description of CFQ, I understood that it is supposed to prevent exactly this kind of starvation (as said before, both journald and libvirt_iohelper had the same priority, "none: prio 0"), since it creates a few queues per process and serves each queue by using its priority to allocate a disk timeslice.

For now, I am trying to better understand the behavior of CFQ (block/cfq-iosched.c), to see whether there is a fix that could solve the problem.
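The scheduler in use can be confirmed per block device (sda assumed; the name in brackets is the active one):

  cat /sys/block/sda/queue/scheduler
  # e.g. "noop deadline [cfq]" on this 4.15 host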
While trying to reproduce the problem on newer kernels (to check whether the bug had already been fixed), I found that block/cfq-iosched.c had been removed, as CFQ was deprecated in favor of BFQ. On the Ubuntu 5.0 kernel (hwe-18.04-edge), the default I/O scheduler was replaced by mq-deadline.
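On the newer kernel the same sysfs file lists the blk-mq schedulers and can also be used to switch between them at runtime (sda assumed):

  cat /sys/block/sda/queue/scheduler
  # e.g. "[mq-deadline] none"
  echo none | sudo tee /sys/block/sda/queue/scheduler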
The bug reproduces much more readily on mq-deadline. As it is the simpler scheduler, I quickly studied its workings:
The core concept of a deadline scheduler is that it prioritizes requests that have exceeded their maximum timeout.
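Those timeouts are exposed as per-device sysfs tunables (sda assumed; the upstream defaults are read_expire=500ms and write_expire=5000ms):

  grep . /sys/block/sda/queue/iosched/*
  # prints read_expire, write_expire, fifo_batch, writes_starved, front_merges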
Then, I created a systemtap program to better understand why it starves journald (a sketch of a similar probe follows this list):
- The test program seems to create write requests much faster than they can be written to disk: even after the program finishes executing, mq-deadline keeps working for a few minutes.
- After a few seconds of writing, all requests seem to be hitting their deadline, i.e. each one is written only after its timeout has been reached.
- This means that, due to the large amount of data to write, the scheduler degrades into a simple FIFO.
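A minimal sketch of such a probe, counting block-layer requests per process (assumes systemtap with kernel debug symbols installed; not the exact script used):

  stap -e 'global reqs
    probe ioblock.request { reqs[execname()]++ }
    # every 5 seconds, print the busiest submitters and reset the counters
    probe timer.s(5) { foreach (n in reqs-) printf("%s: %d\n", n, reqs[n]); delete reqs }'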
About mq-deadline: it is important to debug its behavior further, because it looks like Ubuntu kernels above 4.20 will use it as the default I/O scheduler.

There are a few more tests I want to try (like repeating the process with memory 90% full), in order to see whether having less memory available to cache I/O improves the behavior.
** Affects: libvirt (Ubuntu)
Importance: Undecided
Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Status: New
** Tags: architecture-ppc64le bugnameltc-176572 severity-medium
targetmilestone-inin---
https://bugs.launchpad.net/bugs/1846237

Title:
Kernel panic while virsh dump of a guest with 300G of RAM is triggered.