Hi, The fix (patch 1) is pretty straightforward; patch 2 (which I need for the test) may not be.
The biggest problem with patch 2 is that you can use it to uncover where our permission handling is broken. For example, devising the test case (patch 4) was very difficult because I kept running into the &error_abort that mirror_exit_common() passes when dropping the mirror_top_bs. The problem is that mirror_top_bs does not take the same permissions that its parent takes. Ergo using &error_abort when dropping it is wrong: The parent may require more permissions that mirror_top_bs did, and so dropping mirror_top_bs may fail. Now what’s really bad is that this cannot be fixed with our current permission system. mirror_top_bs was introduced precisely so it does not take CONSISTENT_READ, but can still allow parents to take it (for active commits). But what if there is actually something besides the mirror job that unshares CONSISTENT_READ? Imagine this: mirror target BB mirror source BB | | v v mirror_top_bs -> top -> mid -> base ^ | other_parent The source BB unshares CONSISTENT_READ on the base. mirror_top_bs ensures that its parents can read from top even though top itself cannot allow CONSISTENT_READ to be taken. So far so good. But what if other_parent also unshares CONSISTENT_READ? Then, mirror_top_bs has no business allowing its parents to take it. No idea how to fix that. (I suppose mirror_top_bs would need some way to verify that there is no other party that has unshared CONSISTENT_READ but its associated source BB. In the future, we want the source BB to go away and instead have the source be an immediate BdrvChild of mirror_top_bs. Maybe we can then build something into the block layer so that a node can only restore CONSISTENT_READ when it was that node that broke it?) Anyway. You can see something arising from this problem simply by unsharing CONSISTENT_READ on the target node. (Just drop the src-perm node from the test I add in patch 4.) Replacing the source with the target will then work fine (because mirror_top_bs doesn’t care about CONSISTENT_READ being removed), but then you cannot drop mirror_top_bs – because its parent does want CONSISTENT_READ. Thus, the &error_abort aborts. While this is a more special case, I have no idea how to fix this one either. Soo... This series just fixes one thing, and leaves another unfixed because I have no idea how to fix it. Worse, it adds parameters to blkdebug to actually see the problem. Do we want to let blkdebug be able to crash qemu (because of a bug in qemu)? Max Reitz (4): mirror: Do not dereference invalid pointers blkdebug: Allow taking/unsharing permissions iotests: Add @error to wait_until_completed iotests: Add test for failing mirror complete qapi/block-core.json | 29 +++++++++- block/blkdebug.c | 106 +++++++++++++++++++++++++++++++++- block/mirror.c | 13 +++-- tests/qemu-iotests/041 | 44 ++++++++++++++ tests/qemu-iotests/041.out | 4 +- tests/qemu-iotests/iotests.py | 18 ++++-- 6 files changed, 200 insertions(+), 14 deletions(-) -- 2.21.0