------- Comment From [email protected] 2018-05-08 16:37 EDT-------
Hit another instance of the RAM inconsistencies prior to resuming the
guest on the target side (this one is migrating from boslcp6 to boslcp5
and crashing after it resumes execution on boslcp5). The signature is
eerily similar to the ones above... the workload is blast from LTP, but
it's strange that 3 out of 3 instances so far have hit the same data
structure. Maybe there's a relationship between something the process
is doing and dirty syncing?

root@boslcp5:~/vm_logs/1525768538/dumps# xxd -s 20250624 -l 128 0-2.vm0.iteration2a
01350000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
root@boslcp5:~/vm_logs/1525768538/dumps# xxd -s 20250624 -l 128 0-2.vm0.iteration2a.boslcp6
01350000: d603 0100 0000 0000 2f62 6c61 7374 2f76  ......../blast/v
01350010: 6463 3400 0000 0000 0000 0000 0000 0000  dc4.............
01350020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
01350070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
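
For reference, rather than spot-checking with xxd, a quick way to list
every divergent page between the two dumps is to walk them in 4K
chunks. This is just my own throwaway comparison tool, not part of the
test harness:

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Walk two RAM dumps in page-sized chunks and report the offset of
 * every page whose contents differ between source and target. */
int main(int argc, char **argv)
{
    unsigned char pa[PAGE_SIZE], pb[PAGE_SIZE];
    unsigned long long off = 0;
    size_t ra, rb;
    FILE *a, *b;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <dump1> <dump2>\n", argv[0]);
        return 1;
    }
    a = fopen(argv[1], "rb");
    b = fopen(argv[2], "rb");
    if (!a || !b) {
        perror("fopen");
        return 1;
    }
    for (;;) {
        ra = fread(pa, 1, PAGE_SIZE, a);
        rb = fread(pb, 1, PAGE_SIZE, b);
        if (ra == 0 || rb == 0) {
            break;
        }
        if (ra != rb || memcmp(pa, pb, ra) != 0) {
            printf("page differs at offset 0x%llx\n", off);
        }
        off += ra;
    }
    fclose(a);
    fclose(b);
    return 0;
}

Run against 0-2.vm0.iteration2a and 0-2.vm0.iteration2a.boslcp6 it
flags the 0x1350000 page shown above (among any others).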

For this run I included traces of the various stages of memory
migration on the QEMU side relative to dirty bitmap sync (attached
above). The phases are:

"ram_save_setup": enables dirty logging and sets up data structures used
for tracking dirty pages. does the initial bitmap sync. QEMU keeps it's
own copy which gets OR'd with the one provided by KVM on each bitmap
sync. There's 2 blocks (ram-node0/ram-node1) each with their own bitmap
/ KVM memslot since guest was defined with 2 NUMA nodes. Only ram-node0
would be relevant here since it has offset 0 in guest physical memory
address space.
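
To be explicit about the OR'ing, since it matters for the reasoning
further down: each sync pulls KVM's per-memslot dirty log
(KVM_GET_DIRTY_LOG clears the log as it's read) and merges it into
QEMU's persistent bitmap. A simplified sketch of the merge, not the
actual QEMU code (names here are placeholders):

#include <stddef.h>
#include <stdint.h>

/* On each sync the snapshot fetched from KVM is OR'd into QEMU's
 * persistent migration bitmap. A bit is only cleared when the
 * migration code sends that page, so syncing can never lose dirty
 * info. Returns the count of newly-dirtied pages. */
static uint64_t merge_dirty_bitmap(unsigned long *qemu_bitmap,
                                   const unsigned long *kvm_snapshot,
                                   size_t nwords)
{
    uint64_t newly_dirty = 0;
    size_t i;

    for (i = 0; i < nwords; i++) {
        unsigned long new_bits = kvm_snapshot[i] & ~qemu_bitmap[i];
        qemu_bitmap[i] |= kvm_snapshot[i];
        newly_dirty += __builtin_popcountl(new_bits);
    }
    return newly_dirty;
}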

"ram_save_pending": called before each iteration to see if there are
pages still pending. When number of dirty pages in the QEMU bitmap drop
below a certain value it does another sync with KVM's bitmap.
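
Roughly (again placeholder names; the real threshold is derived from
the configured bandwidth/downtime limits):

#include <stdint.h>

uint64_t sync_with_kvm(void);  /* hypothetical: pulls KVM's log and
                                  OR-merges it as sketched above */

/* Only resync once the known-dirty count falls below the threshold,
 * since each sync costs an ioctl per memslot plus the merge. */
static uint64_t ram_pending(uint64_t remaining_pages,
                            uint64_t threshold_pages)
{
    if (remaining_pages < threshold_pages) {
        remaining_pages += sync_with_kvm();
    }
    return remaining_pages;
}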

"ram_save_iterate": walks the QEMU dirty bitmap and sends corresponding
pages until there's none left or some other limit (e.g. bandwidth
throttling or max-pages-per-iteration) is hit.
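
As a sketch (send_page is a placeholder for queuing the page onto the
migration stream):

#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * 8)

void send_page(size_t pfn);  /* hypothetical */

/* Walk the migration bitmap, clearing each bit as its page is sent,
 * until the bitmap is clean or the per-iteration limit is hit. */
static long save_iteration(unsigned long *bitmap, size_t npages,
                           long max_pages)
{
    long sent = 0;
    size_t pfn;

    for (pfn = 0; pfn < npages && sent < max_pages; pfn++) {
        unsigned long mask = 1UL << (pfn % BITS_PER_LONG);
        if (bitmap[pfn / BITS_PER_LONG] & mask) {
            bitmap[pfn / BITS_PER_LONG] &= ~mask;
            send_page(pfn);
            sent++;
        }
    }
    return sent;
}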

"ram_save_pending"/"ram_save_iterate" keeps repeating until no more
pages are left.

"ram_save_complete" does a final sync with KVM bitmap, sends final set
of pages, then disables dirty logging and completes the migration.
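
The ordering in the completion path, again as a sketch reusing the
placeholder helpers from above:

#include <limits.h>
#include <stddef.h>
#include <stdint.h>

uint64_t sync_with_kvm(void);       /* hypothetical, as above */
long save_iteration(unsigned long *bitmap, size_t npages,
                    long max_pages);
void stop_dirty_logging(void);      /* hypothetical teardown */

/* The guest VCPUs are already stopped when we get here, so one last
 * merge catches everything dirtied up to vm_stop; then drain the
 * bitmap with no page limit and tear down dirty logging. */
static void save_complete(unsigned long *bitmap, size_t npages)
{
    sync_with_kvm();
    save_iteration(bitmap, npages, LONG_MAX);
    stop_dirty_logging();
}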

"vm_stop" denotes with the guest VCPUs have all exited and stopped
execution.

There are 2 migrations reflected in the posted traces; the first one
can be ignored (everything between the first ram_save_setup and the
first ram_save_complete), as it's just a backup of the VM. After the
VM is backed up it resumes execution, and that's the state we're
migrating here and seeing a crash with on the other end.

The sequence of events in this run is comparable to previous
successful runs: no strange orderings or missed calls to sync with the
KVM dirty bitmap, etc. A condensed version of the trace is below. It
shows a sync prior to vm_stop and a sync afterward, and given that
these syncs are OR'd into a persistent bitmap maintained by QEMU,
there shouldn't be any loss of dirty page information with this
particular ordering of events.
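
To spell out why that ordering is safe: the OR-merge is lossless, so
any page reported dirty by either the pre-vm_stop or the post-vm_stop
sync stays set in the persistent bitmap until we explicitly send it. A
trivial demonstration:

#include <assert.h>

int main(void)
{
    unsigned long persistent = 0;
    unsigned long pre_stop_sync  = 0x0030;  /* dirtied before vm_stop */
    unsigned long post_stop_sync = 0x0101;  /* dirtied after vm_stop */

    persistent |= pre_stop_sync;   /* sync prior to vm_stop */
    persistent |= post_stop_sync;  /* sync after vm_stop */

    /* No bit from either snapshot can be missing from the merged
     * bitmap; losing one would require something clearing bits
     * between the KVM ioctl and the merge, which the traces don't
     * show. */
    assert((persistent & (pre_stop_sync | post_stop_sync)) ==
           (pre_stop_sync | post_stop_sync));
    return 0;
}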

[email protected]: >ram_save_setup
[email protected]: migration_bitmap_sync, count: 4
[email protected]: qemu_global_log_sync
[email protected]: qemu_global_log_sync, name: ram-node0, addr: 0
[email protected]: kvm_log_sync, addr: 0, size: 280000000
[email protected]: qemu_global_log_sync, name: ram-node1, addr: 280000000
[email protected]: kvm_log_sync, addr: 280000000, size: 280000000
[email protected]: qemu_global_log_sync, name: vga.vram, addr: 80000000
[email protected]: kvm_log_sync, addr: 200080000000, size: 1000000
[email protected]: qemu_global_log_sync, name: ram-node0, addr: 0
[email protected]: qemu_global_log_sync, name: ram-node1, addr: 280000000
[email protected]: qemu_global_log_sync, name: vga.vram, addr: 80000000
[email protected]: migration_bitmap_sync, id: ram-node0, 
block->mr->name: ram-node0, block->used_length: 280000000h
[email protected]: migration_bitmap_sync, id: ram-node1, 
block->mr->name: ram-node1, block->used_length: 280000000h
[email protected]: <ram_save_setup
[email protected]: >ram_save_pending, dirty pages remaining: 5247120, 
page size: 4096
[email protected]: <ram_save_pending, non_postcopiable_pending: 
21492203520 bytes
[email protected]: >ram_save_iterate, iteration: 0
[email protected]: <ram_save_iterate, pages sent: 848, done: 0
...
[email protected]: >ram_save_pending, dirty pages remaining: 4362032, 
page size: 4096
[email protected]: <ram_save_pending, non_postcopiable_pending: 
17866883072 bytes
[email protected]: >ram_save_iterate, iteration: 55318
[email protected]: <ram_save_iterate, pages sent: 976, done: 0

[email protected]: vm_stop

[email protected]: >ram_save_pending, dirty pages remaining: 4361056, 
page size: 4096
[email protected]: <ram_save_pending, non_postcopiable_pending: 
17862885376 bytes
[email protected]: >ram_save_iterate, iteration: 55379
[email protected]: <ram_save_iterate, pages sent: 832, done: 0
...
[email protected]: >ram_save_iterate, iteration: 325307
[email protected]: <ram_save_iterate, pages sent: 832, done: 0
[email protected]: >ram_save_pending, dirty pages remaining: 41376, page 
size: 4096
[email protected]: migration_bitmap_sync, count: 5
[email protected]: qemu_global_log_sync
[email protected]: qemu_global_log_sync, name: ram-node0, addr: 0
[email protected]: kvm_log_sync, addr: 0, size: 280000000
[email protected]: qemu_global_log_sync, name: ram-node1, addr: 280000000
[email protected]: kvm_log_sync, addr: 280000000, size: 280000000
[email protected]: qemu_global_log_sync, name: vga.vram, addr: 80000000
[email protected]: kvm_log_sync, addr: 200080000000, size: 1000000
[email protected]: qemu_global_log_sync, name: ram-node0, addr: 0
[email protected]: qemu_global_log_sync, name: ram-node1, addr: 280000000
[email protected]: qemu_global_log_sync, name: vga.vram, addr: 80000000
[email protected]: migration_bitmap_sync, id: ram-node0, 
block->mr->name: ram-node0, block->used_length: 280000000h
[email protected]: migration_bitmap_sync, id: ram-node1, 
block->mr->name: ram-node1, block->used_length: 280000000h
[email protected]: <ram_save_pending, non_postcopiable_pending: 
7058665472 bytes
[email protected]: >ram_save_iterate, iteration: 325359
[email protected]: <ram_save_iterate, pages sent: 1120, done: 0
[email protected]: >ram_save_pending, dirty pages remaining: 1722187, 
page size: 4096
[email protected]: <ram_save_pending, non_postcopiable_pending: 
7054077952 bytes
...
[email protected]: >ram_save_iterate, iteration: 430453
[email protected]: <ram_save_iterate, pages sent: 832, done: 0
[email protected]: >ram_save_pending, dirty pages remaining: 41136, page 
size: 4096
[email protected]: migration_bitmap_sync, count: 6
[email protected]: qemu_global_log_sync
[email protected]: qemu_global_log_sync, name: ram-node0, addr: 0
[email protected]: kvm_log_sync, addr: 0, size: 280000000
[email protected]: qemu_global_log_sync, name: ram-node1, addr: 280000000
[email protected]: kvm_log_sync, addr: 280000000, size: 280000000
[email protected]: qemu_global_log_sync, name: vga.vram, addr: 80000000
[email protected]: kvm_log_sync, addr: 200080000000, size: 1000000
[email protected]: qemu_global_log_sync, name: ram-node0, addr: 0
[email protected]: qemu_global_log_sync, name: ram-node1, addr: 280000000
[email protected]: qemu_global_log_sync, name: vga.vram, addr: 80000000
[email protected]: migration_bitmap_sync, id: ram-node0, 
block->mr->name: ram-node0, block->used_length: 280000000h
[email protected]: migration_bitmap_sync, id: ram-node1, 
block->mr->name: ram-node1, block->used_length: 280000000h
[email protected]: <ram_save_pending, non_postcopiable_pending: 
168493056 bytes
[email protected]: >ram_save_complete
[email protected]: migration_bitmap_sync, count: 7
[email protected]: qemu_global_log_sync
[email protected]: qemu_global_log_sync, name: ram-node0, addr: 0
[email protected]: kvm_log_sync, addr: 0, size: 280000000
[email protected]: qemu_global_log_sync, name: ram-node1, addr: 280000000
[email protected]: kvm_log_sync, addr: 280000000, size: 280000000
[email protected]: qemu_global_log_sync, name: vga.vram, addr: 80000000
[email protected]: kvm_log_sync, addr: 200080000000, size: 1000000
[email protected]: qemu_global_log_sync, name: ram-node0, addr: 0
[email protected]: qemu_global_log_sync, name: ram-node1, addr: 280000000
[email protected]: qemu_global_log_sync, name: vga.vram, addr: 80000000
[email protected]: migration_bitmap_sync, id: ram-node0, 
block->mr->name: ram-node0, block->used_length: 280000000h
[email protected]: migration_bitmap_sync, id: ram-node1, 
block->mr->name: ram-node1, block->used_length: 280000000h
[email protected]: migration_bitmap_sync, id: vga.vram, block->mr->name: 
vga.vram, block->used_length: 1000000h
[email protected]: migration_bitmap_sync, id: 
pci@800000020000000:01.0/virtio-net-pci.rom, block->mr->name: 
virtio-net-pci.rom, block->used_length: 40000h
[email protected]: migration_bitmap_sync, id: 
pci@800000020000000:03.0/virtio-net-pci.rom, block->mr->name: 
virtio-net-pci.rom, block->used_length: 40000h
[email protected]: migration_bitmap_sync, id: 
pci@800000020000000:00.0/vga.rom, block->mr->name: vga.rom, block->used_length: 
10000h
[email protected]: ram_save_complete, dirty pages remaining: 41136
[email protected]: <ram_save_complete, pages sent: 41136
