* Chris Friesen (chris.frie...@windriver.com) wrote:
>
> Hi all,
>
> I need to apologize up front, this is going to be a long email. Basically
> the executive summary is that we're seeing issues where a VM is apparently
> not making forward progress in the guest, while at the same time spinning in
> a busy loop on the host. I'm looking for any pointers/suggestions on what
> might be going on or how to narrow it down.
>
> We're using qemu-kvm-ev-2.6.0 in the context of an OpenStack compute node
> running CentOS 7.
>
> While running a test involving a cold migration of a VM from one host to
> another (basically shutting down the guest, moving all the files over to
> another host, then starting the guest back up) we have on a couple of cases
> had a VM get into a state where it was displaying "Guest has not initialized
> the display (yet)." on the console. Running strace or gdb on the host
> against the guest qemu process seems to allow it to make forward progress.

cc'ing in Paolo.
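
As an aside, if you want backtraces from all of the qemu threads in one go
rather than attaching to each by hand, something like the following should
do it.  It's only a sketch and assumes there's a single qemu-kvm process on
the host:

  # attach once, dump a backtrace for every thread, then detach
  gdb --batch --quiet -ex 'thread apply all bt' \
      /usr/libexec/qemu-kvm $(pgrep qemu-kvm)
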
> Here's what I've got:
>
> 1) The root disk seems to be fine. The root disk for the VM is iSCSI, and
> is accessible from the host. Starting up a new VM with the same root disk
> and mostly the same commandline works fine. (The serial port of the new VM
> was directed to stdio and the vnc option was removed.)
>
> 2) Investigating the apparently-hung instance it seems 2 of its 3 vCPUs are
> running a busy-loop:
>
>    PID USER   PR  NI    VIRT   RES   SHR S  %CPU %MEM     TIME+ COMMAND
> 192577 root   20   0 3816972 38360 10916 R  99.9  0.0  75:15.17 CPU 1/KVM
> 192578 root   -2   0 3816972 38360 10916 R  99.9  0.0  75:15.36 CPU 2/KVM
>
> (Each of the vCPU threads is affined to a separate physical CPU.)
>
>
> 3) Trying to retrieve the cpu info via libvirt didn't return within 10
> seconds:
>
> compute-0:/home/wrsroot# start=$SECONDS; timeout 10 virsh
> qemu-monitor-command --hmp instance-0000006e info cpus || echo "timeout
> after $((SECONDS - start)) seconds"
> timeout after 10 seconds
>
>
> 4) Running gdb on 192577 shows the following backtrace:

How about getting a copy of the kernel stack for all the threads from
/proc/(pid of qemu)/task/*/stack?

> Thread 1 (process 192577):
> #0 0x00007f8c596e4537 in ioctl () from /lib64/libc.so.6
> #1 0x00007f8c70fe0234 in kvm_vcpu_ioctl ()
> #2 0x00007f8c70fe02ed in kvm_cpu_exec ()
> #3 0x00007f8c70fcee76 in qemu_kvm_cpu_thread_fn ()
> #4 0x00007f8c599bedc5 in start_thread () from /lib64/libpthread.so.0
> #5 0x00007f8c596ed76d in clone () from /lib64/libc.so.6
>
> 5) Running gdb on 192578 gives the same backtrace.
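
To be concrete about the /proc suggestion above, something like this would
dump the kernel-side stack of every thread in the qemu process (only a
sketch; it assumes a single qemu-kvm process on the host and needs to run
as root, since /proc/<pid>/task/*/stack is privileged):

  pid=$(pgrep qemu-kvm)            # assumes exactly one qemu-kvm process
  for t in /proc/$pid/task/*; do
      echo "=== thread $(basename $t) ($(cat $t/comm)) ==="
      cat $t/stack                 # kernel stack; the userspace side is
  done                             # what the gdb backtraces above show
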
> 6) Strace shows most of the time spent in ioctl...with 39K syscalls in 30
> seconds across those two PIDs:

I've got vague memories of a recent bug report about something stuck with
the host eating CPU similarly, but I can't remember if it was during boot
or any other details.

> compute-0:/home/wrsroot# timeout 30 strace -c -q -T -p 192577 -p 192578
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  99.86    0.297753           9     32325           ioctl
>   0.12    0.000354           0      6730           writev
>   0.02    0.000059           0       384        60 futex
>   0.00    0.000000           0        10           write
>   0.00    0.000000           0         2           rt_sigpending
>   0.00    0.000000           0         2         2 rt_sigtimedwait
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.298166                 39453        62 total
>
>
> 7) Running strace is enough to allow the VM to make forward progress, here
> are kernel logs from within the guest:
>
> [ 2540.964492] INFO: rcu_preempt self-detected stall on CPU { 0} (t=2202996
> jiffies g=-108 c=-109 q=1676)
> [ 2540.984656] Task dump for CPU 0:
> [ 2540.985911] systemd R running task 0 1 0 0x00000008
> [ 2540.988028] ffff88003b660000 0000000084d8f1f4 ffff88003e003d90
> ffffffff810a8676
> [ 2540.990869] 0000000000000000 ffffffff81a44d40 ffff88003e003da8
> ffffffff810abed9
> [ 2540.993975] 0000000000000001 ffff88003e003dd8 ffffffff8110e1a0
> ffff88003e00ec40
> [ 2540.996815] Call Trace:
> [ 2540.997995] <IRQ> [<ffffffff810a8676>] sched_show_task+0xb6/0x120
> [ 2540.999958] [<ffffffff810abed9>] dump_cpu_task+0x39/0x70
> [ 2541.001559] [<ffffffff8110e1a0>] rcu_dump_cpu_stacks+0x90/0xd0
> [ 2541.002425] [<ffffffff81111e34>] rcu_check_callbacks+0x434/0x7f0
> [ 2541.003154] [<ffffffff810d6590>] ? tick_sched_handle.isra.13+0x60/0x60
> [ 2541.003981] [<ffffffff8107f507>] update_process_times+0x47/0x80
> [ 2541.004755] [<ffffffff810d6555>] tick_sched_handle.isra.13+0x25/0x60
> [ 2541.005528] [<ffffffff810d65d1>] tick_sched_timer+0x41/0x70
> [ 2541.006225] [<ffffffff81099c4c>] __hrtimer_run_queues+0xfc/0x320
> [ 2541.006976] [<ffffffff8109a2b0>] hrtimer_interrupt+0xb0/0x1e0
> [ 2541.007751] [<ffffffff810421b7>] local_apic_timer_interrupt+0x37/0x60
> [ 2541.008541] [<ffffffff816d7b3f>] smp_apic_timer_interrupt+0x3f/0x60
> [ 2541.009290] [<ffffffff816d621d>] apic_timer_interrupt+0x6d/0x80
> [ 2541.009996] <EOI> [<ffffffff810ccd39>] ? get_monotonic_boottime+0xb9/0x100
> [ 2541.010976] [<ffffffff81093cd1>] posix_get_boottime+0x11/0x20
> [ 2541.011740] [<ffffffff810952e4>] SyS_clock_gettime+0x54/0xc0
> [ 2541.012437] [<ffffffff816d5649>] system_call_fastpath+0x16/0x1b
>
> [ 2602.140152] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
> [systemd-udevd:232]
> [ 2602.140438] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
> [ksoftirqd/0:3]
> [[ 2602.140455] Modules linked in: ata_piix(+) serio_raw virtio_blk(+)
> virtio_net floppy(+) libata sunrpc
> [32m OK [0m] Started udev Coldplug all Devices.
> [ 2602.140457] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted
> 3.10.0-514.16.1.el7.24.tis.x86_64 #1
> [ 2602.140458] Hardware name: Fedora Project OpenStack Nova, BIOS
> 1.9.1-5.el7_3.2.tis.1 04/01/2014

Hmm. There were a pair of fixes that went into seabios, and my reading of
our release history is that they're in the build just after that
(1.9.1-5.el7_3.3). They could cause occasional boot hangs in odd ways, but
your symptoms don't quite seem to fit what I saw when I found them; there
again, they were pretty random.

Dave

> [ 2602.140472] task: ffff88003b661620 ti: ffff88003b6a0000 task.ti:
> ffff88003b6a0000
> [ 2602.140478] RIP: 0010:[<ffffffff8110f765>] [<ffffffff8110f765>]
> rcu_process_gp_end+0x15/0x80
> [ 2602.140488] RSP: 0018:ffff88003b6a3d70 EFLAGS: 00000202
> [ 2602.140489] RAX: 0000000000000202 RBX: 00000000000120dd RCX:
> ffffffff81ae2f80
> [ 2602.140490] RDX: ffff88003e000000 RSI: ffff88003e00ec40 RDI:
> ffffffff81a44d40
> [ 2602.140490] RBP: ffff88003b6a3d90 R08: ffff88003f14f600 R09:
> 0000000180100005
> [ 2602.140491] R10: ffffffff81a44d40 R11: 0000000000000000 R12:
> ffff88003b661688
> [ 2602.140491] R13: ffffffff810c21c1 R14: ffff88003b6a3cf8 R15:
> 0000000180100005
> [ 2602.140492] FS: 0000000000000000(0000) GS:ffff88003e000000(0000)
> knlGS:0000000000000000
> [ 2602.140493] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 2602.140494] CR2: 00007fea58b37628 CR3: 000000003b068000 CR4:
> 00000000000006f0
> [ 2602.140504] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 2602.140505] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [ 2602.140505] Stack:
> [ 2602.140516] ffffffff81a44d40 0000000000000009 ffff88003cc02300
> ffffffff819f7108
> [ 2602.140517] ffff88003b6a3df0 ffffffff8111001e ffff88003aa52e30
> 0000000000000000
> [ 2602.140518] ffff88003e00ef88 ffff88003b661620 ffff88003b6a3fd8
> 0000000000000009
> [ 2602.140524] Call Trace:
> [ 2602.140534] [<ffffffff8111001e>] rcu_process_callbacks+0x9e/0x5c0
> [ 2602.140545] [<ffffffff81075ba0>] __do_softirq+0x100/0x300
> [ 2602.140546] [<ffffffff81075dd8>] run_ksoftirqd+0x38/0x50
> [ 2602.140555] [<ffffffff8109e62f>] smpboot_thread_fn+0x12f/0x240
> [ 2602.140557] [<ffffffff8109e500>] ? lg_double_unlock+0xb0/0xb0
> [ 2602.140559] [<ffffffff8109596f>] kthread+0xcf/0xe0
> [ 2602.140569] [<ffffffff810958a0>] ? kthread_create_on_node+0x140/0x140
> [ 2602.140572] [<ffffffff816d5598>] ret_from_fork+0x58/0x90
> [ 2602.140574] [<ffffffff810958a0>] ? kthread_create_on_node+0x140/0x140
> [ 2602.140595] Code: 00 31 ff e8 fe ed ff ff 48 89 df 57 9d 0f 1f 44 00 00
> 5b 5d c3 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89 f4 53 9c 58
> <0f> 1f 44 00 00 48 89 c3 fa 66 0f 1f 44 00 00 4c 8b 6e 18 49 8b
> [ 2602.146398] FDC 0 is a S82078B
> [ 2602.290420] Modules linked in: ata_piix(+) serio_raw virtio_blk(+)
> virtio_net floppy libata sunrpc
> [ 2602.290422] CPU: 2 PID: 232 Comm: systemd-udevd Tainted: G L ------------
> 3.10.0-514.16.1.el7.24.tis.x86_64 #1
> [ 2602.290422] Hardware name: Fedora Project OpenStack Nova, BIOS
> 1.9.1-5.el7_3.2.tis.1 04/01/2014
> [ 2602.290423] task: ffff88003faeeea0 ti: ffff88003fad4000 task.ti:
> ffff88003fad4000
> [ 2602.290428] RIP: 0010:[<ffffffff810dcd2a>] [<ffffffff810dcd2a>]
> smp_call_function_many+0x20a/0x270
> [ 2602.290428] RSP: 0018:ffff88003fad7740 EFLAGS: 00000202
> [ 2602.290429] RAX: 0000000000000000 RBX: ffff88003e116280 RCX:
> ffff88003e018fd8
> [ 2602.290430] RDX: 0000000000000000 RSI: 0000000000000100 RDI:
> 0000000000000000
> [ 2602.290430] RBP: ffff88003fad7778 R08: ffff88003e116288 R09:
> 0000000000000004
> [ 2602.290431] R10: ffff88003e116288 R11: 0000000000000000 R12:
> ffffffff81ae2f80
> [ 2602.290431] R13: 000c00003e341958 R14: 00000000000000fc R15:
> 0000000000000292
> [ 2602.290432] FS: 00007fb91a4648c0(0000) GS:ffff88003e100000(0000)
> knlGS:0000000000000000
> [ 2602.290433] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2602.290434] CR2: 00007febbb741000 CR3: 000000003f921000 CR4:
> 00000000000006e0
> [ 2602.290436] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 2602.290436] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [ 2602.290437] Stack:
> [ 2602.290438] 0000000100000001 0000000000016240 ffffffff81e1aa00
> 0000000000000000
> [ 2602.290439] ffffffff81154e30 0000000000000002 0000000000000286
> ffff88003fad77a8
> [ 2602.290440] ffffffff810dcfaa 0000000000000002 0000000000000100
> 0000000000000100
> [ 2602.290441] Call Trace:
> [ 2602.290444] [<ffffffff81154e30>] ? drain_pages+0xb0/0xb0
> [ 2602.290446] [<ffffffff810dcfaa>] on_each_cpu_mask+0x3a/0xa0
> [ 2602.290448] [<ffffffff811521d5>] drain_all_pages+0xb5/0xc0
> [ 2602.290450] [<ffffffff811b9d0f>] set_migratetype_isolate+0xaf/0xf0
> [ 2602.290452] [<ffffffff811b9e90>] start_isolate_page_range+0x70/0xf0
> [ 2602.290453] [<ffffffff81155632>] alloc_contig_range+0xe2/0x320
> [ 2602.290458] [<ffffffff814130af>] dma_alloc_from_contiguous+0xaf/0x200
> [ 2602.290461] [<ffffffff81021238>] dma_generic_alloc_coherent+0x88/0x1c0
> [ 2602.290463] [<ffffffff8141bd42>] dmam_alloc_coherent+0xb2/0x140
> [ 2602.290471] [<ffffffffa007d3c4>] ata_bmdma_port_start+0x44/0x60 [libata]
> [ 2602.290474] [<ffffffffa00f034b>] piix_port_start+0x1b/0x20 [ata_piix]
> [ 2602.290478] [<ffffffffa0069185>] ata_host_start.part.32+0xf5/0x1e0 [libata]
> [ 2602.290483] [<ffffffffa007e6c0>] ? ata_bmdma_port_intr+0x120/0x120 [libata]
> [ 2602.290486] [<ffffffffa0069289>] ata_host_start+0x19/0x20 [libata]
> [ 2602.290491] [<ffffffffa007cfe5>] ata_pci_sff_activate_host+0x45/0x250
> [libata]
> [ 2602.290493] [<ffffffff8134a0ac>] ? pci_write_config_byte+0x1c/0x20
> [ 2602.290496] [<ffffffff81352226>] ? pcibios_set_master+0x76/0xa0
> [ 2602.290498] [<ffffffffa00f0e9e>] piix_init_one+0x57e/0x975 [ata_piix]
> [ 2602.290500] [<ffffffff81353935>] local_pci_probe+0x45/0xa0
> [ 2602.290501] [<ffffffff81354c75>] ? pci_match_device+0xe5/0x120
> [ 2602.290503] [<ffffffff81354df1>] pci_device_probe+0x101/0x150
> [ 2602.290505] [<ffffffff8140bac5>] driver_probe_device+0x145/0x3c0
> [ 2602.290507] [<ffffffff8140be13>] __driver_attach+0x93/0xa0
> [ 2602.290509] [<ffffffff8140bd80>] ? __device_attach+0x40/0x40
> [ 2602.290510] [<ffffffff81409723>] bus_for_each_dev+0x73/0xc0
> [ 2602.290511] [<ffffffff8140b40e>] driver_attach+0x1e/0x20
> [ 2602.290513] [<ffffffff8140af60>] bus_add_driver+0x200/0x2d0
> [ 2602.290514] [<ffffffff8140c494>] driver_register+0x64/0xf0
> [ 2602.290516] [<ffffffff813548d5>] __pci_register_driver+0xa5/0xc0
> [ 2602.290517] [<ffffffffa00f8000>] ? 0xffffffffa00f7fff
> [ 2602.290519] [<ffffffffa00f801e>] piix_init+0x1e/0x2e [ata_piix]
> [ 2602.290521] [<ffffffff810003a8>] do_one_initcall+0xb8/0x230
> [ 2602.290523] [<ffffffff810e3201>] load_module+0x2171/0x2810
> [ 2602.290525] [<ffffffff81337380>] ? ddebug_proc_write+0xf0/0xf0
> [ 2602.290528] [<ffffffff810df3e3>] ? copy_module_from_fd.isra.46+0x53/0x150
> [ 2602.290529] [<ffffffff810e3a56>] SyS_finit_module+0xa6/0xd0
> [ 2602.290531] [<ffffffff816d5649>] system_call_fastpath+0x16/0x1b
>
>
> 8) Running strace in a loop doesn't reliably kick qemu-kvm to run some more
> guest code. Found another method that works better:
>
> cpu_kvm_busy=$(top -b -n1 -H -o+%CPU -p $(pgrep qemu-kvm) | awk '/KVM/ &&
> $9>50 {print $1;}'); while true; do for tid in $cpu_kvm_busy; do gdb --batch
> --quiet -ex 'bt' -ex 'quit' /usr/libexec/qemu-kvm $tid; done; done
>
> This basically just finds qemu threads that are using more than 50% CPU,
> attaches gdb to them, prints stack trace, and exits.
>
> With this running the instance can boot up to the login prompt on the vnc
> console and access the rootfs.
>
>
> Thanks for reading this far...if you have any pointers/suggestions I'd love
> to hear them.
>
> Chris

-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK