Based-on: <[email protected]> (“mirror: Do not dereference invalid pointers”)
Hi, We have this function bdrv_is_first_non_filter(), and I don’t know what we have it for exactly. It’s used in three places: 1. To allow snapshots only on such nodes, 2. To allow resizing only on such nodes, 3. For check_to_replace_node(). In none of these places it’s clear why we’d want it. For (1), snapshots should just work for every node that supports backing nodes. We have such checks in place, so we don’t need to call bdrv_is_first_non_filter(); and it should be fine to snapshot nodes somewhere down a backing chain, too. For (2), bdrv_truncate() should work on any node. There is no reason why we’d prevent the user from invoking block_resize only on the first non-filter in a chain. We now have the RESIZE permission, and as long as that can be taken, any node can be resized. For (3), it is just wrong. The main reason it was introduced here was to replace broken Quorum children. But Quorum isn’t actually a filter, and that is evidenced precisely here: The user wants to replace a child that *does not* match the overall quorum. We need something else entirely here, namely a special bdrv_recurse_can_replace() function. That the current approach doesn’t actually work can be seen by attaching some other parent to a Quorum child, and then trying to replace that child; Quorum will happily agree, but nobody asked that other parent what they think of suddenly changing the data on their child. It isn’t wrong to let a node be replaced when there are only filters between it and the source node (because then they must have the same data). What’s wrong is that Quorum calls itself a filter. In the new .bdrv_recurse_can_replace(), Quorum can be more aware of all of this. So it verifies that there is noone else who might care about sudden data change by unsharing CONSISTENT_READ and taking the WRITE permission. The second problem is that mirror never checked whether the combination of mirror command (drive/blockdev), sync mode, and @replaces asks for a loop. While bdrv_replace_node() was fixed in commit ec9f10fe064f2abb9dc60a9fa580d8d0933f2acf to never create loops, mirror should still reject such a configuration because it will probably not achieve what the user expects. Other things this series does: - Fix Quorum’s permissions: It cannot share WRITE or RESIZE permissions because that would break the Quorum - Drop .is_filter = true from Quorum - Add some tests (In case you’re wondering, yes, this 22-patch series does replace a single patch from my 42-patch series “Deal with filters”.) Max Reitz (22): blockdev: Allow external snapshots everywhere blockdev: Allow resizing everywhere block: Drop bdrv_is_first_non_filter() iotests: Let 041 use -blockdev for quorum children quorum: Fix child permissions block: Add bdrv_recurse_can_replace() blkverify: Implement .bdrv_recurse_can_replace() quorum: Store children in own structure quorum: Add QuorumChild.to_be_replaced quorum: Implement .bdrv_recurse_can_replace() block: Use bdrv_recurse_can_replace() block: Remove bdrv_recurse_is_first_non_filter() mirror: Double-check immediately before replacing quorum: Stop marking it as a filter mirror: Prevent loops iotests: Use complete_and_wait() in 155 iotests: Add VM.assert_block_path() iotests: Resolve TODOs in 041 iotests: Use self.image_len in TestRepairQuorum iotests: Add tests for invalid Quorum @replaces iotests: Check that @replaces can replace filters iotests: Mirror must not attempt to create loops include/block/block.h | 5 - include/block/block_int.h | 19 ++- block.c | 115 +++++++++------ block/blkverify.c | 20 +-- block/copy-on-read.c | 9 -- block/mirror.c | 31 +++- block/quorum.c | 155 ++++++++++++++++---- block/replication.c | 7 - block/throttle.c | 8 - blockdev.c | 58 ++++++-- tests/qemu-iotests/041 | 268 +++++++++++++++++++++++++++++++++- tests/qemu-iotests/041.out | 4 +- tests/qemu-iotests/155 | 7 +- tests/qemu-iotests/iotests.py | 48 ++++++ 14 files changed, 602 insertions(+), 152 deletions(-) -- 2.21.0
