** Description changed: + [ Impact ] + + QEMU users on Jammy may experience recurrent QMP timeouts, which can + eventually impact their ability to perform live VM migrations using + postcopy when the VM is using a qcow2 disk image attached. + + [ Test plan ] + + We can use the test case provided by the reporter, which can be checked + in the "Original Description" section below. + + [ Where problems could occur ] + + The backported patch looks big, but you can see that the changes to the + actual QMP code are small and self-contained; the rest of the patch + implements testcases for the fix. + + As the Impact section outlines, this bug is affecting a specific + scenario which involves live migration, which is always something that + needs to be considered carefully. Despite the patch being small, its + rationale is complex and involves running coroutines in asynchronous I/O + contexts. I believe this is the most "dangerous" part of it. + + Having said that, the fix has been present in upstream QEMU for 1 year, + having been part of multiple releases, without any apparent issues. + This is also a testament to its correctness. + + [ Original Description ] + Hello, Please backport the following upstream commit that fixes reccurent QMP timeouts : https://gitlab.com/qemu- project/qemu/-/commit/effd60c878176bcaf97fa7ce2b12d04bb8ead6f7 This has been fixed in Noble and newer releases. Here is a reproducer to help identify the fix. Details on its usage is in comments #!/usr/bin/env python3 ############################################################################## # Reproducer script for QEMU hang in snapshot at boot # # Requires: `qemu.qmp` python package # # Fix: effd60c878176bcaf97fa7ce2b12d04bb8ead6f7 # # Description: # # Linux appears to write _something_ to the UEFI variables at boot. If a qmp # # transaction is stated during the write operation it can deadlock qemu. # ############################################################################## ############################## # Expected backtrace # # [...] # # aio_poll # # [...] # # qmp_blockdev_snapshot_sync # # [...] # # aio_poll # # [...] # # pflash_write # # [...] # ############################## ######################## Setup ######################### # $ pip install qemu.qmp # $ cp /usr/share/OVMF/OVMF_VARS_4M.fd /tmp/vars.fd # $ wget https://github.com/cirros-dev/cirros/releases/download/0.6.3/cirros-0.6.3-x86_64-rootfs.img.gz # $ gunzip cirros-0.6.3-x86_64-rootfs.img.gz # $ qemu-img convert -f raw -O qcow2 cirros-0.6.3-x86_64-rootfs.img /tmp/disk.qcow2 # $ rm -f cirros-0.6.3-x86_64-rootfs.img ######################################################## import asyncio import logging import os import subprocess from qemu.qmp import QMPClient SOCKET = '/tmp/qmp-deadlock.sock' FW = '/usr/share/OVMF/OVMF_CODE_4M.fd' DISK = '/tmp/disk.qcow2' SNAP_FILE = '/tmp/snap.qcow2' FW_VARS = '/tmp/vars.fd' async def spawn_qemu(): blk_args = [ 'driver=qcow2', 'node-name=snap-disk', 'file.driver=file', f'file.filename={DISK}', ] cmd = [ 'qemu-system-x86_64', '-qmp', f'unix:{SOCKET}', '-blockdev', ','.join(blk_args), '-device', 'virtio-blk,drive=snap-disk', '-drive', f'if=pflash,format=raw,readonly=on,file={FW}', '-drive', f'if=pflash,format=raw,file={FW_VARS}', '-m', '1G', '-nographic', #'-enable-kvm', ] #print(' '.join(cmd)) return await asyncio.create_subprocess_exec( *cmd, stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL, ) async def snap_rollback(qmp): await qmp.execute('blockdev-snapshot-sync', { 'node-name': 'snap-disk', 'snapshot-node-name': 'snap', 'snapshot-file': SNAP_FILE, }) with qmp.listener('BLOCK_JOB_READY') as listener: await qmp.execute('block-commit', { 'device': 'snap', 'job-id': 'commit', }) async for event in listener: if event.get('data', {}).get('device') == 'commit': break with qmp.listener('BLOCK_JOB_COMPLETED') as listener: await qmp.execute('block-job-complete', { 'device': 'commit', }) async for event in listener: if event.get('data', {}).get('device') == 'commit': break async def qmp_main(qmp): while True: await asyncio.wait_for(snap_rollback(qmp), timeout=15) async def main(): #logging.basicConfig(level=logging.DEBUG) qmp = QMPClient('test-deadlock') await qmp.start_server(SOCKET) qemu, _ = await asyncio.gather(spawn_qemu(), qmp.accept()) print(f'qemu pid: {qemu.pid}') try: await qmp_main(qmp) except asyncio.TimeoutError: print("QMP timeout, exiting") finally: try: await qmp.disconnect() finally: qemu.kill() await qemu.wait() asyncio.run(main()) A debdiff with the patch will come shortly
-- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/2091013 Title: Please backport upstream fix to qmp timeouts To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2091013/+subscriptions -- ubuntu-bugs mailing list [email protected] https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
