** Description changed:

  [Impact]
  
  drive-mirror, blockdev-mirror, and active blockcommit can silently lose guest writes issued during a short window at job startup.
  The destination/base image keeps stale data and the job pivots with no error.
  It's silent data corruption.
  
  mirror_start_job() disables the block-layer dirty bitmap before the mirror filter's own tracking is live, so writes in that window are tracked by neither (block/mirror.c).
  Regression since QEMU 8.0.0, introduced by commit 32125b14606a ('mirror: Fix access of uninitialised fields during start').
  
  Upstream issue, with MySQL/PostgreSQL corruption reported by production users on the standard `virsh blockcommit --active --pivot` flow:
-   https://gitlab.com/qemu-project/qemu/-/issues/3273
+   https://gitlab.com/qemu-project/qemu/-/issues/3273
  
  Fixed upstream in 0f51f9c3420b, backported to the qemu-stable tree as
  61f14858c159.
  
  [Test Plan]
  
- Not a guest test: QEMU runs with -machine none (no VM/OS booted). 
+ Not a guest test: QEMU runs with -machine none (no VM/OS booted).
  The startup race is reproduced by injecting a controlled QMP/blkdebug/NBD sequence, not by guest I/O.
  
- # 1. active-commit chain: base.qcow2 <- top.qcow2 (64G sparse)
- qemu-img create -f qcow2 base.qcow2 64G
- qemu-img create -f qcow2 -F qcow2 -b base.qcow2 top.qcow2 64G
+   # 1. active-commit chain: base.qcow2 <- top.qcow2 (64G sparse)
+   qemu-img create -f qcow2 base.qcow2 64G
+   qemu-img create -f qcow2 -F qcow2 -b base.qcow2 top.qcow2 64G
  
- # 2. 2046 marker clusters (0x11) from 64M up, to keep mirror_dirty_init() scanning past offset 0
- for i in $(seq 0 2045); do echo "write -P 0x11 $((64 + i*32))M 64k"; done | qemu-io -f qcow2 top.qcow2 >/dev/null
+   # 2. 2046 marker clusters (0x11) from 64M up, to keep mirror_dirty_init() scanning past offset 0
+   for i in $(seq 0 2045); do echo "write -P 0x11 $((64 + i*32))M 64k"; done | qemu-io -f qcow2 top.qcow2 >/dev/null
  
- # 3. start QEMU with QMP on stdio: write JSON commands to the FIFO, read replies/events from qmp.out
- mkfifo qmp.in
- qemu-system-x86_64 -nodefaults -machine none -display none -monitor none -qmp stdio -drive if=none,id=drive0,node-name=top,format=qcow2,file=blkdebug::"$PWD"/top.qcow2 <qmp.in >qmp.out 2>qemu.err &
- qemu_pid=$!
- exec 3>qmp.in
- echo '{"execute":"qmp_capabilities"}' >&3
+   # 3. start QEMU with QMP on stdio: write JSON commands to the FIFO, read replies/events from qmp.out
+   mkfifo qmp.in
+   qemu-system-x86_64 -nodefaults -machine none -display none -monitor none -qmp stdio -drive if=none,id=drive0,node-name=top,format=qcow2,file=blkdebug::"$PWD"/top.qcow2 <qmp.in >qmp.out 2>qemu.err &
+   qemu_pid=$!
+   exec 3>qmp.in
+   echo '{"execute":"qmp_capabilities"}' >&3
  
- # 4. writable NBD export on the active node, BEFORE block-commit so the mirror filter takes over its writes
- echo '{"execute":"nbd-server-start","arguments":{"addr":{"type":"unix","data":{"path":"'"$PWD"'/nbd.sock"}}}}' >&3
- echo '{"execute":"block-export-add","arguments":{"id":"exp0","type":"nbd","node-name":"top","name":"exp0","writable":true}}' >&3
+   # 4. writable NBD export on the active node, BEFORE block-commit so the mirror filter takes over its writes
+   echo '{"execute":"nbd-server-start","arguments":{"addr":{"type":"unix","data":{"path":"'"$PWD"'/nbd.sock"}}}}' >&3
+   echo '{"execute":"block-export-add","arguments":{"id":"exp0","type":"nbd","node-name":"top","name":"exp0","writable":true}}' >&3
  
- # 5. arm the blkdebug breakpoint, then start the active commit
- echo '{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0 \"break l2_load A\""}}' >&3
- # active commit of drive0's whole backing chain (no base/top passed)
- echo '{"execute":"block-commit","arguments":{"device":"drive0","job-id":"commit","filter-node-name":"commit-filter"}}' >&3
+   # 5. arm the blkdebug breakpoint, then start the active commit
+   echo '{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0 \"break l2_load A\""}}' >&3
+   # active commit of drive0's whole backing chain (no base/top passed)
+   echo '{"execute":"block-commit","arguments":{"device":"drive0","job-id":"commit","filter-node-name":"commit-filter"}}' >&3
  
- # 6. wait until dirty_init hits l2_load (offset 0 already scanned); its wait_break reply carries id "wb"
- echo '{"execute":"human-monitor-command","id":"wb","arguments":{"command-line":"qemu-io drive0 \"wait_break A\""}}' >&3
- until grep -q '"wb"' qmp.out; do sleep 0.2; done
+   # 6. wait until dirty_init hits l2_load (offset 0 already scanned); its wait_break reply carries id "wb"
+   echo '{"execute":"human-monitor-command","id":"wb","arguments":{"command-line":"qemu-io drive0 \"wait_break A\""}}' >&3
+   until grep -q '"wb"' qmp.out; do sleep 0.2; done
  
- # 7. write into the still-open startup window (job not installed yet); it blocks until resume, so background it
- qemu-io -f raw -c "write -P 0x7b 0 64k" "nbd+unix:///exp0?socket=$PWD/nbd.sock" &
+   # 7. write into the still-open startup window (job not installed yet); it blocks until resume, so background it
+   qemu-io -f raw -c "write -P 0x7b 0 64k" "nbd+unix:///exp0?socket=$PWD/nbd.sock" &
  
- # 8. give the background write time to connect and block on the breakpoint, then resume
- sleep 1
- echo '{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0 \"resume A\""}}' >&3
- wait $!
+   # 8. give the background write time to connect and block on the breakpoint, then resume
+   sleep 1
+   echo '{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0 \"resume A\""}}' >&3
+   wait $!
  
- # 9. finish the active commit, pivot, quit
+   # 9. finish the active commit, pivot, quit
  until grep -q BLOCK_JOB_READY qmp.out; do sleep 0.2; done
- echo '{"execute":"block-job-complete","arguments":{"device":"commit"}}' >&3
+   echo '{"execute":"block-job-complete","arguments":{"device":"commit"}}' >&3
  until grep -q BLOCK_JOB_COMPLETED qmp.out; do sleep 0.2; done
- echo '{"execute":"quit"}' >&3
- wait "$qemu_pid"
+   echo '{"execute":"quit"}' >&3
+   wait "$qemu_pid"
  
- # 10. check the committed base directly
- qemu-io -f qcow2 -c "read -P 0x11 64M 64k" base.qcow2
- qemu-io -f qcow2 -c "read -P 0x7b 0 64k" base.qcow2
+   # 10. check the committed base directly
+   qemu-io -f qcow2 -c "read -P 0x11 64M 64k" base.qcow2
+   qemu-io -f qcow2 -c "read -P 0x7b 0 64k" base.qcow2
  
- The 0x11 read is a control (always passes: the commit copied normal data).
- The 0x7b read at offset 0 is the verdict:
+   The 0x11 read is a control (always passes: the commit copied normal data).
+   The 0x7b read at offset 0 is the verdict:
  
-   before patch:  fails -- base reads 0x00, the window write was lost
-   after patch:   succeeds -- the window write reached base
+   before patch:  fails -- base reads 0x00, the window write was lost
+   after patch:   succeeds -- the window write reached base
  
  [Where problems could occur]
  
  The change is in mirror_start_job() and the per-write hot path bdrv_mirror_top_do_write(), shared by drive-mirror, blockdev-mirror and active block-commit.
  A regression would therefore affect any such job (libvirt blockcommit/blockcopy, live storage migration), not only the startup window being fixed.
  - Bitmap lifecycle moved: the mirror bitmap is now created right after bdrv_append() and released on the job-start failure path, so a mistake there could leak the bitmap or free it twice.
  - Bitmap create/disable now runs inside the drained section, reordered against bdrv_append() and job creation; wrong ordering could race with in-flight requests.
  - An out-of-tree block driver reading the bitmap during the drain interval would see the new ordering; no in-tree caller does.
  - Noble (qemu 8.2.2) needs a manual backport because surrounding code shifted, so divergence from upstream is the risk; it is re-verified with the same deterministic Test Plan above.
  
  [Other Info]

** Description changed:

  [Impact]
  
  drive-mirror, blockdev-mirror, and active blockcommit can silently lose guest writes issued during a short window at job startup.
  The destination/base image keeps stale data and the job pivots with no error.
  It's silent data corruption.
  
  mirror_start_job() disables the block-layer dirty bitmap before the mirror filter's own tracking is live, so writes in that window are tracked by neither (block/mirror.c).
  Regression since QEMU 8.0.0, introduced by commit 32125b14606a ('mirror: Fix access of uninitialised fields during start').
  
  Upstream issue, with MySQL/PostgreSQL corruption reported by production users on the standard `virsh blockcommit --active --pivot` flow:
    https://gitlab.com/qemu-project/qemu/-/issues/3273
  
  Fixed upstream in 0f51f9c3420b, backported to the qemu-stable tree as
  61f14858c159.
  
  [Test Plan]
  
  Not a guest test: QEMU runs with -machine none (no VM/OS booted).
  The startup race is reproduced by injecting a controlled QMP/blkdebug/NBD sequence, not by guest I/O.
  
-   # 1. active-commit chain: base.qcow2 <- top.qcow2 (64G sparse)
-   qemu-img create -f qcow2 base.qcow2 64G
-   qemu-img create -f qcow2 -F qcow2 -b base.qcow2 top.qcow2 64G
+ 1. active-commit chain: base.qcow2 <- top.qcow2 (64G sparse)
+   qemu-img create -f qcow2 base.qcow2 64G
+   qemu-img create -f qcow2 -F qcow2 -b base.qcow2 top.qcow2 64G
  
-   # 2. 2046 marker clusters (0x11) from 64M up, to keep mirror_dirty_init() scanning past offset 0
-   for i in $(seq 0 2045); do echo "write -P 0x11 $((64 + i*32))M 64k"; done | qemu-io -f qcow2 top.qcow2 >/dev/null
+ 2. 2046 marker clusters (0x11) from 64M up, to keep mirror_dirty_init() scanning past offset 0
+   for i in $(seq 0 2045); do echo "write -P 0x11 $((64 + i*32))M 64k"; done | qemu-io -f qcow2 top.qcow2 >/dev/null
  
-   # 3. start QEMU with QMP on stdio: write JSON commands to the FIFO, read replies/events from qmp.out
-   mkfifo qmp.in
-   qemu-system-x86_64 -nodefaults -machine none -display none -monitor none -qmp stdio -drive if=none,id=drive0,node-name=top,format=qcow2,file=blkdebug::"$PWD"/top.qcow2 <qmp.in >qmp.out 2>qemu.err &
-   qemu_pid=$!
-   exec 3>qmp.in
-   echo '{"execute":"qmp_capabilities"}' >&3
+ 3. start QEMU with QMP on stdio: write JSON commands to the FIFO, read replies/events from qmp.out
+   mkfifo qmp.in
+   qemu-system-x86_64 -nodefaults -machine none -display none -monitor none -qmp stdio -drive if=none,id=drive0,node-name=top,format=qcow2,file=blkdebug::"$PWD"/top.qcow2 <qmp.in >qmp.out 2>qemu.err &
+   qemu_pid=$!
+   exec 3>qmp.in
+   echo '{"execute":"qmp_capabilities"}' >&3
  
-   # 4. writable NBD export on the active node, BEFORE block-commit so the mirror filter takes over its writes
-   echo '{"execute":"nbd-server-start","arguments":{"addr":{"type":"unix","data":{"path":"'"$PWD"'/nbd.sock"}}}}' >&3
-   echo '{"execute":"block-export-add","arguments":{"id":"exp0","type":"nbd","node-name":"top","name":"exp0","writable":true}}' >&3
+ 4. writable NBD export on the active node, BEFORE block-commit so the mirror filter takes over its writes
+   echo '{"execute":"nbd-server-start","arguments":{"addr":{"type":"unix","data":{"path":"'"$PWD"'/nbd.sock"}}}}' >&3
+   echo '{"execute":"block-export-add","arguments":{"id":"exp0","type":"nbd","node-name":"top","name":"exp0","writable":true}}' >&3
  
-   # 5. arm the blkdebug breakpoint, then start the active commit
-   echo '{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0 \"break l2_load A\""}}' >&3
-   # active commit of drive0's whole backing chain (no base/top passed)
-   echo '{"execute":"block-commit","arguments":{"device":"drive0","job-id":"commit","filter-node-name":"commit-filter"}}' >&3
+ 5. arm the blkdebug breakpoint, then start the active commit
+   echo '{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0 \"break l2_load A\""}}' >&3
+   # active commit of drive0's whole backing chain (no base/top passed)
+   echo '{"execute":"block-commit","arguments":{"device":"drive0","job-id":"commit","filter-node-name":"commit-filter"}}' >&3
  
-   # 6. wait until dirty_init hits l2_load (offset 0 already scanned); its wait_break reply carries id "wb"
-   echo '{"execute":"human-monitor-command","id":"wb","arguments":{"command-line":"qemu-io drive0 \"wait_break A\""}}' >&3
-   until grep -q '"wb"' qmp.out; do sleep 0.2; done
+ 6. wait until dirty_init hits l2_load (offset 0 already scanned); its wait_break reply carries id "wb"
+   echo '{"execute":"human-monitor-command","id":"wb","arguments":{"command-line":"qemu-io drive0 \"wait_break A\""}}' >&3
+   until grep -q '"wb"' qmp.out; do sleep 0.2; done
  
-   # 7. write into the still-open startup window (job not installed yet); it blocks until resume, so background it
-   qemu-io -f raw -c "write -P 0x7b 0 64k" "nbd+unix:///exp0?socket=$PWD/nbd.sock" &
+ 7. write into the still-open startup window (job not installed yet); it blocks until resume, so background it
+   qemu-io -f raw -c "write -P 0x7b 0 64k" "nbd+unix:///exp0?socket=$PWD/nbd.sock" &
  
-   # 8. give the background write time to connect and block on the breakpoint, then resume
-   sleep 1
-   echo '{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0 \"resume A\""}}' >&3
-   wait $!
+ 8. give the background write time to connect and block on the breakpoint, then resume
+   sleep 1
+   echo '{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io drive0 \"resume A\""}}' >&3
+   wait $!
  
-   # 9. finish the active commit, pivot, quit
+ 9. finish the active commit, pivot, quit
  until grep -q BLOCK_JOB_READY qmp.out; do sleep 0.2; done
-   echo '{"execute":"block-job-complete","arguments":{"device":"commit"}}' >&3
+   echo '{"execute":"block-job-complete","arguments":{"device":"commit"}}' >&3
  until grep -q BLOCK_JOB_COMPLETED qmp.out; do sleep 0.2; done
-   echo '{"execute":"quit"}' >&3
-   wait "$qemu_pid"
+   echo '{"execute":"quit"}' >&3
+   wait "$qemu_pid"
  
-   # 10. check the committed base directly
-   qemu-io -f qcow2 -c "read -P 0x11 64M 64k" base.qcow2
-   qemu-io -f qcow2 -c "read -P 0x7b 0 64k" base.qcow2
+ 10. check the committed base directly
+   qemu-io -f qcow2 -c "read -P 0x11 64M 64k" base.qcow2
+   qemu-io -f qcow2 -c "read -P 0x7b 0 64k" base.qcow2
  
-   The 0x11 read is a control (always passes: the commit copied normal data).
-   The 0x7b read at offset 0 is the verdict:
+   The 0x11 read is a control (always passes: the commit copied normal data).
+   The 0x7b read at offset 0 is the verdict:
  
    before patch:  fails -- base reads 0x00, the window write was lost
    after patch:   succeeds -- the window write reached base
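
The polling loops and the final verdict in the Test Plan can be wrapped in two small helpers. This is a minimal sketch, not part of the original description: the names wait_for and pattern_ok are hypothetical, and pattern_ok assumes that qemu-io reports a `read -P` mismatch with a message containing "Pattern verification failed" (adjust the string if your qemu-io build prints something else).

```shell
#!/bin/sh
# Hypothetical helpers for the Test Plan; not part of the original steps.

# wait_for PATTERN FILE [TIMEOUT_SECONDS]
# Poll FILE once per second until PATTERN appears. Returns 1 when the
# timeout (default 60s) expires, instead of looping forever as a bare
# `until grep ...` loop would if QEMU died or the event never arrived.
wait_for() {
    pattern=$1
    file=$2
    timeout=${3:-60}
    elapsed=0
    until grep -q -- "$pattern" "$file" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# pattern_ok EXIT_STATUS OUTPUT
# Succeeds only if the qemu-io run exited 0 and its captured output has
# no mismatch message. ASSUMPTION: the mismatch message contains the
# text "Pattern verification failed".
pattern_ok() {
    if [ "$1" -ne 0 ]; then
        return 1
    fi
    case $2 in
        *"Pattern verification failed"*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example use for step 10 (commented out so the sketch runs without QEMU):
#   out=$(qemu-io -f qcow2 -c "read -P 0x7b 0 64k" base.qcow2 2>&1); rc=$?
#   if pattern_ok "$rc" "$out"; then echo "window write reached base"
#   else echo "window write was lost"; fi
```

Checking both the exit status and the output text is deliberate: relying on the message alone would report success if qemu-io failed to open the image at all.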
  
  [Where problems could occur]
  
  The change is in mirror_start_job() and the per-write hot path bdrv_mirror_top_do_write(), shared by drive-mirror, blockdev-mirror and active block-commit.
  A regression would therefore affect any such job (libvirt blockcommit/blockcopy, live storage migration), not only the startup window being fixed.
  - Bitmap lifecycle moved: the mirror bitmap is now created right after bdrv_append() and released on the job-start failure path, so a mistake there could leak the bitmap or free it twice.
  - Bitmap create/disable now runs inside the drained section, reordered against bdrv_append() and job creation; wrong ordering could race with in-flight requests.
  - An out-of-tree block driver reading the bitmap during the drain interval would see the new ordering; no in-tree caller does.
  - Noble (qemu 8.2.2) needs a manual backport because surrounding code shifted, so divergence from upstream is the risk; it is re-verified with the same deterministic Test Plan above.
  
  [Other Info]

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2156307

Title:
  drive-mirror/blockdev-mirror/active blockcommit silently lose guest
  writes during job startup


