Hi!

bdrv_replace_node_common() keeps the old node's parents in a list and calls bdrv_replace_child_noperm() in a loop.
But bdrv_replace_child_noperm() may do aio_poll, which may trigger any graph change, up to freeing a child that we still keep in the list we are looping over.

Actually, I've hit something similar with heavily modified code, and I'm not sure that it can be reproduced on master. Still, here is a backtrace to illustrate what I mean:

#0  bdrv_detach_child (child=0x5555557e50e0) at ../block.c:3073
#1  0x0000555555609d53 in bdrv_root_unref_child (child=0x5555557e50e0) at ../block.c:3084
#2  0x0000555555609e57 in bdrv_unref_child (parent=0x55555582de10, child=0x5555557e50e0) at ../block.c:3124
#3  0x000055555560db2a in bdrv_close (bs=0x55555582de10) at ../block.c:4728
#4  0x000055555560e7eb in bdrv_delete (bs=0x55555582de10) at ../block.c:5056
#5  0x0000555555610ea6 in bdrv_unref (bs=0x55555582de10) at ../block.c:6409
#6  0x0000555555609d5f in bdrv_root_unref_child (child=0x5555557e00d0) at ../block.c:3085
#7  0x0000555555588122 in blk_remove_bs (blk=0x555555838df0) at ../block/block-backend.c:831
#8  0x00005555555875c0 in blk_delete (blk=0x555555838df0) at ../block/block-backend.c:447
#9  0x0000555555587864 in blk_unref (blk=0x555555838df0) at ../block/block-backend.c:502
#10 0x00005555555aeb84 in block_job_free (job=0x555555839150) at ../blockjob.c:89
#11 0x00005555555caad3 in job_unref (job=0x555555839150) at ../job.c:380
#12 0x00005555555cbc7f in job_exit (opaque=0x555555839150) at ../job.c:894
#13 0x000055555569a375 in aio_bh_call (bh=0x5555558215f0) at ../util/async.c:136
#14 0x000055555569a47f in aio_bh_poll (ctx=0x555555810e90) at ../util/async.c:164
#15 0x00005555556ac65d in aio_poll (ctx=0x555555810e90, blocking=true) at ../util/aio-posix.c:659
#16 0x0000555555639c2b in bdrv_unapply_subtree_drain (child=0x5555557ef080, old_parent=0x555555815050) at ../block/io.c:530
#17 0x00005555556062e1 in bdrv_child_cb_detach (child=0x5555557ef080) at ../block.c:1326
#18 0x000055555560918e in bdrv_replace_child_noperm (child=0x5555557ef080, new_bs=0x0) at ../block.c:2779
#19 0x0000555555607f11 in bdrv_replace_child_safe (child=0x5555557ef080, new_bs=0x0, tran=0x7fffffffda08) at ../block.c:2189
#20 0x000055555560dfce in bdrv_remove_backing (bs=0x555555815050, tran=0x7fffffffda08) at ../block.c:4884
#21 0x000055555560e3fc in bdrv_replace_node_common (from=0x555555815050, to=0x55555581c1e0, auto_skip=false, detach_subchain=true, errp=0x7fffffffda80) at ../block.c:4972
#22 0x000055555560ee57 in bdrv_drop_intermediate (top=0x555555815050, base=0x55555581c1e0, backing_file_str=0x55555581c211 "json:{\"driver\": \"test\"}") at ../block.c:5318
#23 0x0000555555583939 in test_drop_intermediate_poll () at ../tests/test-bdrv-drain.c:1822

Here the child being detached (frame #0) is one that is kept in the updated_children list in bdrv_drop_intermediate(), so we'll soon crash with a use-after-free.

I'll try to find a similar reproducer on master, but in any case it seems to be a wrong design to loop through children when an aio_poll may happen in the middle of the loop.

This problem currently blocks my work on moving child replacement to the prepare phase of transactional graph updates (to correctly update permissions on the new graph). In short: do several updates with bdrv_replace_child_noperm(), then do the permission update. If the permission update fails, roll back the bdrv_replace_child_noperm() calls in reverse order. But what to do about an unexpected aio_poll?

It seems we need a way to do several child replacement operations without triggering any drain-poll, and to do all the needed drain-poll work at the end. But I have no idea how to achieve that.

Any thoughts?

-- 
Best regards,
Vladimir
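
P.S. To make the "replace several children first, poll once at the end" idea a bit more concrete, here is a toy model. It is only a sketch under loud assumptions: none of the names below are QEMU APIs, toy_poll() merely stands in for aio_poll() running a callback that frees a child, and the model says nothing about whether real drain sections can safely postpone their polling.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not a QEMU API. */

#define MAX_CHILDREN 4

typedef struct ToyChild {
    bool alive;          /* false once a callback has "freed" the child */
    int node;            /* id of the node this child points to */
} ToyChild;

typedef struct ToyGraph {
    ToyChild children[MAX_CHILDREN];
    int defer_depth;     /* > 0 means polling is postponed */
    bool poll_pending;   /* a poll was requested while postponed */
    int victim;          /* child index a pending callback frees, or -1 */
} ToyGraph;

/* Stand-in for aio_poll(): runs one pending callback that frees a child. */
static void toy_poll(ToyGraph *g)
{
    if (g->victim >= 0) {
        g->children[g->victim].alive = false;
        g->victim = -1;
    }
}

/* Poll immediately, or only record the request inside a deferred section. */
static void toy_request_poll(ToyGraph *g)
{
    if (g->defer_depth > 0) {
        g->poll_pending = true;
    } else {
        toy_poll(g);
    }
}

static void toy_defer_begin(ToyGraph *g)
{
    g->defer_depth++;
}

/* Leaving the outermost deferred section runs the postponed poll once. */
static void toy_defer_end(ToyGraph *g)
{
    assert(g->defer_depth > 0);
    if (--g->defer_depth == 0 && g->poll_pending) {
        g->poll_pending = false;
        toy_poll(g);
    }
}

/* Stand-in for a single child replacement, which may want to poll. */
static bool toy_replace_child(ToyGraph *g, int idx, int new_node)
{
    if (!g->children[idx].alive) {
        return false;    /* in real code this would be a use-after-free */
    }
    g->children[idx].node = new_node;
    toy_request_poll(g);
    return true;
}

/* Replace every child; returns how many replacements hit a dead child. */
static int toy_replace_all(ToyGraph *g, int new_node, bool defer)
{
    int stale = 0;

    if (defer) {
        toy_defer_begin(g);
    }
    for (int i = 0; i < MAX_CHILDREN; i++) {
        if (!toy_replace_child(g, i, new_node)) {
            stale++;
        }
    }
    if (defer) {
        toy_defer_end(g);
    }
    return stale;
}

/* Build a graph whose first poll will free children[victim]. */
static ToyGraph toy_graph_new(int victim)
{
    ToyGraph g = { .defer_depth = 0, .poll_pending = false, .victim = victim };

    for (int i = 0; i < MAX_CHILDREN; i++) {
        g.children[i].alive = true;
        g.children[i].node = 0;
    }
    return g;
}
```

With toy_graph_new(2), calling toy_replace_all(&g, 7, false) reaches the already-freed child 2 in the middle of the loop, while toy_replace_all(&g, 7, true) finishes all four replacements and only then lets the poll free it. The counter is nestable on purpose, so that a rollback in reverse order could run inside the same deferred section.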