** Description changed:
[Impact]
drive-mirror, blockdev-mirror, and active blockcommit can silently lose guest
writes issued during a short window at job startup.
The destination/base image keeps stale data and the job pivots with no error:
this is silent data corruption.
mirror_start_job() disables the block-layer dirty bitmap before the mirror
filter's own tracking is live, so writes in that window are tracked by neither
(block/mirror.c).
Regression since QEMU 8.0.0, introduced by commit 32125b14606a ('mirror: Fix
access of uninitialised fields during start').
Upstream issue, with MySQL/PostgreSQL corruption reported by production users
on the standard `virsh blockcommit --active --pivot` flow:
https://gitlab.com/qemu-project/qemu/-/issues/3273
Fixed upstream in 0f51f9c3420b, backported to the qemu-stable tree as
61f14858c159.
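
To make the window concrete, here is a toy timeline model in shell. It is not
QEMU code: the function name tracked_by and the tick values are made up for
illustration. It assumes only what the description states, namely that the
block-layer bitmap stops recording at one point, the mirror filter starts
recording at a later point, and a write landing between the two is recorded by
neither.

```shell
#!/usr/bin/env bash
# Toy model of the startup window (illustrative only, not QEMU code).
#   t          tick at which a guest write arrives
#   bitmap_off tick at which the block-layer dirty bitmap is disabled
#   filter_on  tick at which the mirror filter's own tracking goes live
# All names and tick values here are hypothetical.
tracked_by() {
    local t=$1 bitmap_off=$2 filter_on=$3
    if (( t < bitmap_off )); then
        echo bitmap
    elif (( t >= filter_on )); then
        echo filter
    else
        echo none    # the write falls into the untracked window
    fi
}

# Ordering with a gap: bitmap disabled at tick 2, filter live at tick 5.
for t in 1 3 6; do
    echo "write@$t -> $(tracked_by "$t" 2 5)"
done

# Same write with no gap between the two trackers (both at tick 2).
echo "write@3, no gap -> $(tracked_by 3 2 2)"
```

Modelling the fixed behaviour as a zero-length gap is only the simplest way to
express "no untracked interval"; it says nothing about how the upstream commit
achieves that.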
[Test Plan]
Not a guest test: QEMU runs with -machine none (no VM/OS booted).
The startup race is reproduced by injecting a controlled QMP/blkdebug/NBD
sequence, not by guest I/O.
- # 1. active-commit chain: base.qcow2 <- top.qcow2 (64G sparse)
- qemu-img create -f qcow2 base.qcow2 64G
- qemu-img create -f qcow2 -F qcow2 -b base.qcow2 top.qcow2 64G
+ 1. active-commit chain: base.qcow2 <- top.qcow2 (64G sparse)
+ qemu-img create -f qcow2 base.qcow2 64G
+ qemu-img create -f qcow2 -F qcow2 -b base.qcow2 top.qcow2 64G
- # 2. 2046 marker clusters (0x11) from 64M up, to keep mirror_dirty_init()
scanning past offset 0
- for i in $(seq 0 2045); do echo "write -P 0x11 $((64 + i*32))M 64k"; done |
qemu-io -f qcow2 top.qcow2 >/dev/null
+ 2. 2046 marker clusters (0x11) from 64M up, to keep mirror_dirty_init()
scanning past offset 0
+ for i in $(seq 0 2045); do echo "write -P 0x11 $((64 + i*32))M 64k"; done |
qemu-io -f qcow2 top.qcow2 >/dev/null
- # 3. start QEMU with QMP on stdio: write JSON commands to the FIFO, read
replies/events from qmp.out
- mkfifo qmp.in
- qemu-system-x86_64 -nodefaults -machine none -display none -monitor none
-qmp stdio -drive
if=none,id=drive0,node-name=top,format=qcow2,file=blkdebug::"$PWD"/top.qcow2
<qmp.in >qmp.out 2>qemu.err &
- qemu_pid=$!
- exec 3>qmp.in
- echo '{"execute":"qmp_capabilities"}' >&3
+ 3. start QEMU with QMP on stdio: write JSON commands to the FIFO, read
replies/events from qmp.out
+ mkfifo qmp.in
+ qemu-system-x86_64 -nodefaults -machine none -display none -monitor none
-qmp stdio -drive
if=none,id=drive0,node-name=top,format=qcow2,file=blkdebug::"$PWD"/top.qcow2
<qmp.in >qmp.out 2>qemu.err &
+ qemu_pid=$!
+ exec 3>qmp.in
+ echo '{"execute":"qmp_capabilities"}' >&3
- # 4. writable NBD export on the active node, BEFORE block-commit so the
mirror filter takes over its writes
- echo
'{"execute":"nbd-server-start","arguments":{"addr":{"type":"unix","data":{"path":"'"$PWD"'/nbd.sock"}}}}'
>&3
- echo
'{"execute":"block-export-add","arguments":{"id":"exp0","type":"nbd","node-name":"top","name":"exp0","writable":true}}'
>&3
+ 4. writable NBD export on the active node, BEFORE block-commit so the mirror
filter takes over its writes
+ echo
'{"execute":"nbd-server-start","arguments":{"addr":{"type":"unix","data":{"path":"'"$PWD"'/nbd.sock"}}}}'
>&3
+ echo
'{"execute":"block-export-add","arguments":{"id":"exp0","type":"nbd","node-name":"top","name":"exp0","writable":true}}'
>&3
- # 5. arm the blkdebug breakpoint, then start the active commit
- echo
'{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0
\"break l2_load A\""}}' >&3
- # active commit of drive0's whole backing chain (no base/top passed)
- echo
'{"execute":"block-commit","arguments":{"device":"drive0","job-id":"commit","filter-node-name":"commit-filter"}}'
>&3
+ 5. arm the blkdebug breakpoint, then start the active commit
+ echo
'{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0
\"break l2_load A\""}}' >&3
+ # active commit of drive0's whole backing chain (no base/top passed)
+ echo
'{"execute":"block-commit","arguments":{"device":"drive0","job-id":"commit","filter-node-name":"commit-filter"}}'
>&3
- # 6. wait until dirty_init hits l2_load (offset 0 already scanned); its
wait_break reply carries id "wb"
- echo
'{"execute":"human-monitor-command","id":"wb","arguments":{"command-line":"qemu-io
drive0 \"wait_break A\""}}' >&3
- until grep -q '"wb"' qmp.out; do sleep 0.2; done
+ 6. wait until dirty_init hits l2_load (offset 0 already scanned); its
wait_break reply carries id "wb"
+ echo
'{"execute":"human-monitor-command","id":"wb","arguments":{"command-line":"qemu-io
drive0 \"wait_break A\""}}' >&3
+ until grep -q '"wb"' qmp.out; do sleep 0.2; done
- # 7. write into the still-open startup window (job not installed yet); it
blocks until resume, so background it
- qemu-io -f raw -c "write -P 0x7b 0 64k"
"nbd+unix:///exp0?socket=$PWD/nbd.sock" &
+ 7. write into the still-open startup window (job not installed yet); it
blocks until resume, so background it
+ qemu-io -f raw -c "write -P 0x7b 0 64k"
"nbd+unix:///exp0?socket=$PWD/nbd.sock" &
- # 8. give the background write time to connect and block on the breakpoint,
then resume
- sleep 1
- echo
'{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0
\"resume A\""}}' >&3
- wait $!
+ 8. give the background write time to connect and block on the breakpoint,
then resume
+ sleep 1
+ echo
'{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0
\"resume A\""}}' >&3
+ wait $!
- # 9. finish the active commit, pivot, quit
+ 9. finish the active commit, pivot, quit
until grep -q BLOCK_JOB_READY qmp.out; do sleep 0.2; done
- echo '{"execute":"block-job-complete","arguments":{"device":"commit"}}' >&3
+ echo '{"execute":"block-job-complete","arguments":{"device":"commit"}}' >&3
until grep -q BLOCK_JOB_COMPLETED qmp.out; do sleep 0.2; done
- echo '{"execute":"quit"}' >&3
- wait "$qemu_pid"
+ echo '{"execute":"quit"}' >&3
+ wait "$qemu_pid"
- # 10. check the committed base directly
- qemu-io -f qcow2 -c "read -P 0x11 64M 64k" base.qcow2
- qemu-io -f qcow2 -c "read -P 0x7b 0 64k" base.qcow2
+ 10. check the committed base directly
+ qemu-io -f qcow2 -c "read -P 0x11 64M 64k" base.qcow2
+ qemu-io -f qcow2 -c "read -P 0x7b 0 64k" base.qcow2
- The 0x11 read is a control (always passes: the commit copied normal data).
- The 0x7b read at offset 0 is the verdict:
+ The 0x11 read is a control (always passes: the commit copied normal data).
+ The 0x7b read at offset 0 is the verdict:
before patch: fails -- base reads 0x00, the window write was lost
after patch: succeeds -- the window write reached base
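
When scripting the plan, two small helpers may be useful; the following is a
sketch, not part of the original reproducer. wait_for is a hypothetical bounded
variant of the "until grep ...; do sleep 0.2; done" loops, so a hung job does
not stall the run forever (the 30 second default is arbitrary). verdict is a
hypothetical classifier for the final qemu-io read; it assumes that qemu-io
reports a mismatch with a "Pattern verification failed" line and a completed
read with a "read N/N bytes" line, which should be checked against the qemu-io
build under test.

```shell
#!/usr/bin/env bash
# wait_for PATTERN FILE [TIMEOUT_SECONDS]
# Polls FILE until PATTERN appears; returns 1 once the timeout has elapsed.
wait_for() {
    local pattern=$1 file=$2 timeout=${3:-30}
    local deadline=$(( SECONDS + timeout ))
    until grep -q -- "$pattern" "$file" 2>/dev/null; do
        (( SECONDS >= deadline )) && return 1
        sleep 0.2
    done
}

# verdict: reads qemu-io output on stdin and prints PASS, FAIL or ERROR.
# The two message formats matched here are assumptions (see the text above).
verdict() {
    local out
    out=$(cat)
    if grep -q 'Pattern verification failed' <<<"$out"; then
        echo FAIL     # base does not hold the expected pattern
    elif grep -Eq '^read [0-9]+/[0-9]+ bytes' <<<"$out"; then
        echo PASS     # read completed and no mismatch was reported
    else
        echo ERROR    # neither message seen, e.g. the image failed to open
    fi
}

# Usage with the files from the steps above (left commented out here):
#   wait_for BLOCK_JOB_READY qmp.out 60 || { echo "job never became ready"; exit 1; }
#   qemu-io -f qcow2 -c "read -P 0x7b 0 64k" base.qcow2 2>&1 | verdict
```

Classifying on the output text rather than on the exit status is deliberate in
this sketch, since it does not rely on how a given qemu-io version maps a
pattern mismatch to its return code.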
[Where problems could occur]
The change is in mirror_start_job() and the per-write hot path
bdrv_mirror_top_do_write(), shared by drive-mirror, blockdev-mirror and active
block-commit.
A regression would therefore affect any such job (libvirt
blockcommit/blockcopy, live storage migration), not only the startup window
being fixed.
- Bitmap lifecycle moved: the mirror bitmap is now created right after
bdrv_append() and released on the job-start failure path, so a mistake there
could leak the bitmap or free it twice.
- Bitmap create/disable now runs inside the drained section, reordered
against bdrv_append() and job creation; wrong ordering could race with
in-flight requests.
- An out-of-tree block driver reading the bitmap during the drain interval
would see the new ordering; no in-tree caller does.
- Noble (qemu 8.2.2) needs a manual backport because the surrounding code
shifted, so the main risk there is divergence from upstream; the backport is
re-verified with the same deterministic Test Plan above.
[Other Info]
https://bugs.launchpad.net/bugs/2156307
Title:
drive-mirror/blockdev-mirror/active blockcommit silently lose guest
writes during job startup