On 27.04.23 at 16:36, Juan Quintela wrote:
> Fiona Ebner <f.eb...@proxmox.com> wrote:
>> On 27.04.23 at 13:03, Kevin Wolf wrote:
>>> On 26.04.2023 at 16:31, Fiona Ebner wrote:
>>>> On 20.04.23 at 08:55, Paolo Bonzini wrote:
>
> Hi
>
>> Our function is a custom variant of saving a snapshot and uses
>> qemu_savevm_state_setup(), which is why the qemu_mutex_unlock_iothread()
>> is there. I looked for inspiration for how upstream does things and it
>> turns out that upstream QEMU v8.0.0 has essentially the same issue with
>> snapshot-save. When snapshot_save_job_bh runs in a vCPU thread instead
>> of the main thread, the situation is the same: after
>> qemu_mutex_unlock_iothread(), qemu_get_current_aio_context() will return
>> 0x0 and then the assertion in the AIO_WAIT_WHILE_INTERNAL macro fails
>> (this time the generated coroutine wrapper is bdrv_writev_vmstate)[0].
>>
>>
>> So all bottom halves scheduled for the main thread's AioContext can
>> potentially get to run in a vCPU thread and need to be very careful with
>> things like qemu_mutex_unlock_iothread.
>>
>> Is qemu_get_current_aio_context() returning 0x0 expected? I haven't
>> looked into why it happens yet. Does there need to be a way to drop the
>> BQL without also giving up the main thread's AioContext or would it be
>> enough to re-acquire the context?
>>
>> CC-ing Juan as the migration maintainer.
>
> This is the world backwards.
> The tradition is that migration people blame block layer people for
> breaking things and for help, not the other way around O:-)
Sorry if I didn't provide enough context/explanation. See below for my
attempt to reiterate. I CC'ed you because the issue happens as part of
snapshot-save, and in particular the qemu_mutex_unlock_iothread call in
qemu_savevm_state is one of the ingredients leading to the problem.

>
>> Best Regards,
>> Fiona
>>
>> [0]:
>>> Thread 21 "CPU 0/KVM" received signal SIGABRT, Aborted.
>>> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>>> 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>> (gdb) bt
>>> #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>>> #1 0x00007f9027b3e537 in __GI_abort () at abort.c:79
>>> #2 0x00007f9027b3e40f in __assert_fail_base (fmt=0x7f9027cb66a8
>>> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
>>> assertion=0x558ed44fcec0 "qemu_get_current_aio_context() ==
>>> qemu_get_aio_context()", file=0x558ed44fce80
>>> "/home/febner/repos/qemu/block/block-gen.h", line=43,
>>> function=<optimized out>) at assert.c:92
>>> #3 0x00007f9027b4d662 in __GI___assert_fail
>>> (assertion=0x558ed44fcec0 "qemu_get_current_aio_context() ==
>>> qemu_get_aio_context()", file=0x558ed44fce80
>>> "/home/febner/repos/qemu/block/block-gen.h", line=43,
>>> function=0x558ed44fcf80 <__PRETTY_FUNCTION__.14> "bdrv_poll_co") at
>>> assert.c:101
>>> #4 0x0000558ed412df5f in bdrv_poll_co (s=0x7f8ffcff37a0) at
>>> /home/febner/repos/qemu/block/block-gen.h:43
>>> #5 0x0000558ed412f4cd in bdrv_writev_vmstate (bs=0x558ed60536a0,
>>> qiov=0x7f8ffcff3840, pos=0) at block/block-gen.c:809
>>> #6 0x0000558ed3df36d0 in qio_channel_block_writev
>>> (ioc=0x7f8ff40ac060, iov=0x7f8ff43f6350, niov=1, fds=0x0, nfds=0,
>>> flags=0, errp=0x7f8ffcff39c0) at ../migration/channel-block.c:89
>>> #7 0x0000558ed40feedb in qio_channel_writev_full
>>> (ioc=0x7f8ff40ac060, iov=0x7f8ff43f6350, niov=1, fds=0x0, nfds=0,
>>> flags=0, errp=0x7f8ffcff39c0) at ../io/channel.c:108
>>> #8 0x0000558ed40ff3c3 in qio_channel_writev_full_all
>>> (ioc=0x7f8ff40ac060, iov=0x7f8ff4648040, niov=1, fds=0x0, nfds=0,
>>> flags=0, errp=0x7f8ffcff39c0) at ../io/channel.c:263
>>> #9 0x0000558ed40ff2e4 in qio_channel_writev_all (ioc=0x7f8ff40ac060,
>>> iov=0x7f8ff4648040, niov=1, errp=0x7f8ffcff39c0) at
>>> ../io/channel.c:242
>>> #10 0x0000558ed3dee4dc in qemu_fflush (f=0x7f8ff4640000) at
>>> ../migration/qemu-file.c:302
>>> #11 0x0000558ed4050f91 in ram_save_setup (f=0x7f8ff4640000,
>>> opaque=0x558ed4ca34c0 <ram_state>) at ../migration/ram.c:3302
>>> #12 0x0000558ed3e141c8 in qemu_savevm_state_setup (f=0x7f8ff4640000) at
>>> ../migration/savevm.c:1266
>>> #13 0x0000558ed3e14eed in qemu_savevm_state (f=0x7f8ff4640000,
>>> errp=0x558ed68c5238) at ../migration/savevm.c:1626
>>> #14 0x0000558ed3e1755e in save_snapshot (name=0x558ed72af790
>>> "snap0", overwrite=false, vmstate=0x558ed6708ce0 "scsi0",
>>> has_devices=true, devices=0x558ed66d6a60, errp=0x558ed68c5238) at
>>> ../migration/savevm.c:2954
>>> #15 0x0000558ed3e17fb1 in snapshot_save_job_bh (opaque=0x558ed68c5170) at
>>> ../migration/savevm.c:3253
>>> #16 0x0000558ed42f050a in aio_bh_call (bh=0x558ed671ae00) at
>>> ../util/async.c:155
>>> #17 0x0000558ed42f0615 in aio_bh_poll (ctx=0x558ed5c62910) at
>>> ../util/async.c:184
>>> #18 0x0000558ed42d47b8 in aio_poll (ctx=0x558ed5c62910, blocking=true) at
>>> ../util/aio-posix.c:721
>>> #19 0x0000558ed412df1c in bdrv_poll_co (s=0x7f8ffcff3eb0) at
>>> /home/febner/repos/qemu/block/block-gen.h:43
>>> #20 0x0000558ed4130c3a in blk_pwrite (blk=0x558ed5ed4f60,
>>> offset=230912, bytes=512, buf=0x7f8ffc438600, flags=0) at
>>> block/block-gen.c:1650
>>> #21 0x0000558ed3ba9078 in pflash_update (pfl=0x558ed5eb7b30, offset=230912,
>>> size=1) at ../hw/block/pflash_cfi01.c:394
>>> #22 0x0000558ed3ba9749 in pflash_write (pfl=0x558ed5eb7b30,
>>> offset=231232, value=0, width=1, be=0) at
>>> ../hw/block/pflash_cfi01.c:522
>>> #23 0x0000558ed3ba9cda in pflash_mem_write_with_attrs
>>> (opaque=0x558ed5eb7b30, addr=231232, value=0, len=1, attrs=...) at
>>> ../hw/block/pflash_cfi01.c:681
>>> #24 0x0000558ed402a36a in memory_region_write_with_attrs_accessor
>>> (mr=0x558ed5eb7ef0, addr=231232, value=0x7f8ffcff40c8, size=1,
>>> shift=0, mask=255, attrs=...) at ../softmmu/memory.c:514
>>> #25 0x0000558ed402a4a9 in access_with_adjusted_size (addr=231232,
>>> value=0x7f8ffcff40c8, size=1, access_size_min=1, access_size_max=4,
>>> access_fn=0x558ed402a270 <memory_region_write_with_attrs_accessor>,
>>> mr=0x558ed5eb7ef0, attrs=...) at ../softmmu/memory.c:555
>>> #26 0x0000558ed402d5de in memory_region_dispatch_write
>>> (mr=0x558ed5eb7ef0, addr=231232, data=0, op=MO_8, attrs=...) at
>>> ../softmmu/memory.c:1522
>>> #27 0x0000558ed403a6f4 in flatview_write_continue
>>> (fv=0x558ed66d62c0, addr=4291004224, attrs=..., ptr=0x7f9029957028,
>>> len=1, addr1=231232, l=1, mr=0x558ed5eb7ef0) at
>>> ../softmmu/physmem.c:2641
>>> #28 0x0000558ed403a857 in flatview_write (fv=0x558ed66d62c0,
>>> addr=4291004224, attrs=..., buf=0x7f9029957028, len=1) at
>>> ../softmmu/physmem.c:2683
>>> #29 0x0000558ed403ac07 in address_space_write (as=0x558ed4ca2b20
>>> <address_space_memory>, addr=4291004224, attrs=...,
>>> buf=0x7f9029957028, len=1) at ../softmmu/physmem.c:2779
>>> #30 0x0000558ed403ac74 in address_space_rw (as=0x558ed4ca2b20
>>> <address_space_memory>, addr=4291004224, attrs=...,
>>> buf=0x7f9029957028, len=1, is_write=true) at
>>> ../softmmu/physmem.c:2789
>>> #31 0x0000558ed40cea88 in kvm_cpu_exec (cpu=0x558ed622a910) at
>>> ../accel/kvm/kvm-all.c:2989
>>> #32 0x0000558ed40d179a in kvm_vcpu_thread_fn (arg=0x558ed622a910) at
>>> ../accel/kvm/kvm-accel-ops.c:51
>>> #33 0x0000558ed42d925f in qemu_thread_start (args=0x558ed5c68c80) at
>>> ../util/qemu-thread-posix.c:541
>>> #34 0x00007f9028ab7ea7 in start_thread (arg=<optimized out>) at
>>> pthread_create.c:477
>>> #35 0x00007f9027c18a2f in clone () at
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> To see that I am understanding this right:
>
> - you create a thread
> - that calls a memory_region operation
> - that calls a device write function
> - that calls the block layer
> - that creates a snapshot
> - that calls the migration code
> - that calls the block layer again
>
> Without further investigation, I have no clue what is going on here,
> sorry.
>
> Later, Juan.
>

All I'm doing is using QEMU (a build of upstream's v8.0.0) in intended
ways, I promise! In particular, I'm repeatedly doing two things at the
same time:

1. Write to a pflash drive from within the guest.
2. Issue a snapshot-save QMP command (in a way that doesn't lead to an
early error).

(I actually also used a debugger to break on pflash_update and
snapshot_save_job_bh, manually continuing until I triggered the
problematic situation. It's very racy, because it depends on the host OS
switching threads at just the right time.)

Now we need to be aware of two things:

1. As discussed earlier in the mail thread, if the host OS switches
threads at an inconvenient time, a bottom half scheduled for the main
thread's AioContext can be executed as part of a vCPU thread's aio_poll.
2. Generated coroutine wrappers for block layer functions spawn the
coroutine and use AIO_WAIT_WHILE/aio_poll to wait for it to finish.

What happens in the backtrace above is:

1. The write to the pflash drive uses blk_pwrite, which leads to an
aio_poll in the vCPU thread.
2. The snapshot_save_job_bh bottom half, which was scheduled for the main
thread's AioContext, is executed as part of the vCPU thread's aio_poll.
3. qemu_savevm_state is called.
4. qemu_mutex_unlock_iothread is called. Now qemu_get_current_aio_context
returns 0x0. Usually, snapshot_save_job_bh runs in the main thread, in
which case qemu_get_current_aio_context still returns the main thread's
AioContext at this point.
5. bdrv_writev_vmstate is executed as part of the usual savevm setup.
6. bdrv_writev_vmstate is a generated coroutine wrapper, so it uses
AIO_WAIT_WHILE.
7. The assertion in the AIO_WAIT_WHILE macro that the current AioContext
is the main thread's AioContext fails.

Best Regards,
Fiona
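P.S.: To make steps 2-7 more concrete, here is a minimal, self-contained
model of the failure. It is not QEMU code; only the assertion condition
and the function names mentioned in the comments come from the backtrace
above. Everything else is a simplified stand-in, and it assumes that a
thread without an AioContext of its own only falls back to the main
AioContext while the BQL is held (which matches the observed 0x0 after
qemu_mutex_unlock_iothread):

/*
 * Minimal model of the failing assertion -- NOT actual QEMU code.
 * Assumption (based on the observed behaviour above): in a thread that
 * has no AioContext of its own (e.g. a vCPU thread), the "current"
 * AioContext falls back to the main loop's context only while the BQL
 * is held.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct AioContext { int dummy; } AioContext;

static AioContext main_ctx_obj;
static AioContext *main_ctx = &main_ctx_obj; /* stand-in for the main AioContext */

static __thread AioContext *my_ctx;          /* set only for main loop / iothreads */
static __thread bool bql_held;               /* stand-in for the big QEMU lock */

/* Rough stand-in for qemu_get_current_aio_context(). */
static AioContext *current_aio_context(void)
{
    if (my_ctx) {
        return my_ctx;       /* main loop thread or iothread */
    }
    if (bql_held) {
        return main_ctx;     /* e.g. a vCPU thread that holds the BQL */
    }
    return NULL;             /* a vCPU thread after dropping the BQL */
}

/* Stand-in for the assertion in the generated coroutine wrappers
 * (AIO_WAIT_WHILE in block-gen.h, frames #2-#4 above). */
static void generated_wrapper(const char *name)
{
    printf("%s: current context %p\n", name, (void *)current_aio_context());
    assert(current_aio_context() == main_ctx);
}

int main(void)
{
    /* Normal case: snapshot_save_job_bh runs in the main thread. */
    my_ctx = main_ctx;
    generated_wrapper("bh in main thread");

    /* Racy case: the bottom half runs in a vCPU thread's aio_poll().
     * While the BQL is still held, the fallback keeps things working... */
    my_ctx = NULL;
    bql_held = true;
    generated_wrapper("bh in vCPU thread, BQL held");

    /* ...but after qemu_mutex_unlock_iothread() (step 4 above), the
     * current context becomes NULL, so the next wrapper call, standing
     * in for bdrv_writev_vmstate(), trips the assertion and aborts. */
    bql_held = false;
    generated_wrapper("bh in vCPU thread, BQL dropped");

    return 0;
}

Compiled with plain gcc, this prints the context for the first two calls
and then aborts on the third, mirroring the abort in frames #0-#4 of the
backtrace.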