Perhaps others have already realized this, but I just discovered that if you mix internal and external snapshots, you can set yourself up for massive failures when using block-stream or block-commit to consolidate data across the external backing chain without also thinking about the internal snapshots.
Here's a quick demonstration:

$ # create the backing file, with all 1's; 1M clusters for convenience
$ qemu-img create -f qcow2 -o cluster_size=1m base.qcow2 4M
Formatting 'base.qcow2', fmt=qcow2 size=4194304 cluster_size=1048576 lazy_refcounts=off refcount_bits=16
$ qemu-io -c 'w -P 1 0 4m' -f qcow2 base.qcow2
wrote 4194304/4194304 bytes at offset 0
4 MiB, 1 ops; 0.0050 sec (791.139 MiB/sec and 197.7848 ops/sec)
$ # create the wrapper file, write 2 to the first 2 clusters
$ qemu-img create -f qcow2 -o backing_file=base.qcow2,backing_fmt=qcow2 top.qcow2
Formatting 'top.qcow2', fmt=qcow2 size=4194304 backing_file=base.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ qemu-io -c 'w -P 2 0 2m' -f qcow2 top.qcow2
wrote 2097152/2097152 bytes at offset 0
2 MiB, 1 ops; 0.0009 sec (2.144 GiB/sec and 1097.6948 ops/sec)
$ # create an internal snapshot, then write 3 to the middle 2 clusters
$ qemu-img snapshot -c snap1 top.qcow2
$ qemu-io -c 'w -P 3 1m 2m' -f qcow2 top.qcow2
wrote 2097152/2097152 bytes at offset 1048576
2 MiB, 1 ops; 0.0009 sec (2.102 GiB/sec and 1076.4263 ops/sec)
$ # we've mixed internal and external; let's shorten the chain now
$ qemu-img info top.qcow2
image: top.qcow2
file format: qcow2
virtual size: 4.0M (4194304 bytes)
disk size: 2.3M
cluster_size: 65536
backing file: base.qcow2
backing file format: qcow2
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         snap1                     0 2018-04-06 16:44:54   00:00:00.000
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
$ qemu-img rebase -f qcow2 -b '' top.qcow2
$ # create second snapshot, then revert to the first
$ qemu-img snapshot -c snap2 top.qcow2
$ qemu-img snapshot -a snap1 top.qcow2
$ # contents at the time we took snap1 were 2211, right?  OOOPS!
$ qemu-io -c 'r -P 2 0 2m' -c 'r -P 1 2m 2m' -f qcow2 top.qcow2
read 2097152/2097152 bytes at offset 0
2 MiB, 1 ops; 0.0004 sec (3.914 GiB/sec and 2004.0080 ops/sec)
Pattern verification failed at offset 2097152, 2097152 bytes
read 2097152/2097152 bytes at offset 2097152
2 MiB, 1 ops; 0.0000 sec (24.723 GiB/sec and 12658.2278 ops/sec)
$ # the last two clusters were rewritten from 1 to 0. :(
$ qemu-io -c 'r -P 0 2m 2m' -f qcow2 top.qcow2
read 2097152/2097152 bytes at offset 2097152
2 MiB, 1 ops; 0.0001 sec (13.754 GiB/sec and 7042.2535 ops/sec)
$ # repair the damage, for now, and write 4 into the last cluster...
$ qemu-img rebase -u -f qcow2 -b base.qcow2 -F qcow2 top.qcow2
$ qemu-io -c 'w -P 4 3m 1m' -f qcow2 top.qcow2
wrote 1048576/1048576 bytes at offset 3145728
1 MiB, 1 ops; 0.0005 sec (1.713 GiB/sec and 1754.3860 ops/sec)
$ # now let's try committing instead
$ qemu-img commit -f qcow2 -d top.qcow2
Image committed.
$ # revert back to snap2, which had contents 2331, right? OOPS!
$ qemu-img snapshot -a snap2 top.qcow2
$ qemu-io -c 'r -P 2 0 1m' -c 'r -P 3 1m 1m' -c 'r -P 3 2m 1m' -c 'r -P 1 3m 1m' -f qcow2 top.qcow2
read 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0002 sec (3.860 GiB/sec and 3952.5692 ops/sec)
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 0.0002 sec (3.577 GiB/sec and 3663.0037 ops/sec)
read 1048576/1048576 bytes at offset 2097152
1 MiB, 1 ops; 0.0002 sec (4.628 GiB/sec and 4739.3365 ops/sec)
Pattern verification failed at offset 3145728, 1048576 bytes
read 1048576/1048576 bytes at offset 3145728
1 MiB, 1 ops; 0.0007 sec (1.345 GiB/sec and 1377.4105 ops/sec)
$ # the last cluster was rewritten from 1 to 4. :(
$ qemu-io -c 'r -P 4 3m 1m' -f qcow2 top.qcow2
read 1048576/1048576 bytes at offset 3145728
1 MiB, 1 ops; 0.0011 sec (878.735 MiB/sec and 878.7346 ops/sec)

The root cause of all of this is that right now, ALL internal snapshots share the same backing file information in the file header, but block-stream operations only modify the active layer. Changing the backing file, or rewriting the clusters in the backing file, doesn't break the active layer, but DOES bleed through to the internal snapshots, for any cluster where the internal snapshot was relying on the backing file.

Does this mean we should make it harder to perform external block operations on a qcow2 file that has internal snapshots (either refuse outright, or at least require a 'force' flag so the user acknowledges the risk)? Similarly, should it be harder to create an internal snapshot when an image already has an external backing file? And/or should we improve the qcow2 specification of internal snapshot descriptors to record a per-snapshot backing file, rather than the current approach where all snapshots share the same backing file? Whether or not we track a per-snapshot backing file, should the presence of internal snapshots be used to request op-blockers for read consistency on backing files?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
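[Follow-up sketch.] Until qemu itself refuses (or demands a force flag for) these operations, a management script can at least detect the dangerous mix before rebasing or committing. A minimal POSIX sh sketch; the `count_internal_snapshots` helper is hypothetical (not anything qemu ships), and it assumes only that `qemu-img snapshot -l` prints two header lines before one row per snapshot, as in the `qemu-img info` output above:

```shell
#!/bin/sh
# Guard: refuse external chain operations when an image has internal
# snapshots.  Parses the tabular listing that 'qemu-img snapshot -l'
# produces: a "Snapshot list:" line, a column-header line, then one
# row per internal snapshot.

count_internal_snapshots() {
    # $1: snapshot listing text; prints the number of snapshot rows
    printf '%s\n' "$1" | tail -n +3 | grep -c '[^[:space:]]' || true
}

# Sample listing, matching the snap1 entry in the demo above.
# In real use you would capture it: listing=$(qemu-img snapshot -l "$img")
listing='Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         snap1                     0 2018-04-06 16:44:54   00:00:00.000'

n=$(count_internal_snapshots "$listing")
if [ "$n" -gt 0 ]; then
    echo "refusing rebase/commit: $n internal snapshot(s) present"
fi
# prints: refusing rebase/commit: 1 internal snapshot(s) present
```

This is only a stopgap in the management layer, of course; it does nothing for users who run qemu-img by hand, which is why an in-qemu safety valve still seems worth discussing.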