On 14.05.25 at 19:26, Kevin Wolf wrote:
> On 08.05.2025 at 16:09, Fiona Ebner wrote:
>> Signed-off-by: Fiona Ebner <f.eb...@proxmox.com>
>> ---
>>
>> Could the bs associated with the device change because of polling
>> when draining? If yes, does that mean we need to drain all in the
>> beginning and not temporarily unlock?
>
> I'm starting to hate this pattern. :-)
>
> Maybe it would eventually be a good idea to have some helpers to deal
> with this common problem of "I want to drain all nodes that will be
> affected by the operation, but draining could change which nodes are
> part of it".
Yes, that would be nice :)

> For this specific case, though, I think drain all is right simply
> because we may already rely on it: I suppose qmp_transaction() calls
> bdrv_drain_all() for a reason, but it should actually be a drain
> section. I'm pretty sure we're not just waiting for some requests to
> complete, but want to keep other users away. Maybe the graph lock
> actually covers whatever this drain was initially supposed to solve,
> but bdrv_drain_all() doesn't look right.
>
> So how about making it bdrv_drain_all_begin/end() in qmp_transaction()
> just because that's the safe thing to do, and then we can just drop
> any individual drains in action implementations?

Unfortunately, this does not work on its own. Among others, iotest 096
for qcow2 will hang, because bdrv_img_create() is called by
external_snapshot_action() while drained, and the qcow2_co_create()
coroutine will then hang in
blk_co_do_pwritev_part()/blk_wait_while_drained().

So it seems having the more targeted drain here is important. Would it
be sensible to go with the "temporarily unlock" solution, while adding
an extra check that the root bs of the device is still the one we
drained after re-locking, and returning an error if not? At least for
now, because the series is already getting quite big (in particular
with the split of 09/11).

Best Regards,
Fiona
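
(For illustration only, a minimal sketch of the kind of re-check
described above, not the actual patch: blk_bs(), bdrv_drained_begin()/
bdrv_drained_end(), bdrv_graph_rdlock_main_loop()/
bdrv_graph_rdunlock_main_loop() and error_setg() are existing QEMU
APIs, while the action function itself and its exact locking/draining
sequence are assumptions made up for the example. It assumes the usual
QEMU block-layer headers from the source tree.)

    /*
     * Sketch: drain the device's current root node, then verify after
     * re-taking the graph lock that the root is still the node we
     * drained, since polling inside the drain may have reconfigured
     * the graph.
     */
    static int example_action_prepare(BlockBackend *blk, Error **errp)
    {
        BlockDriverState *bs;

        bdrv_graph_rdlock_main_loop();
        bs = blk_bs(blk);
        bdrv_graph_rdunlock_main_loop();

        /* Must not hold the graph lock while draining; this polls. */
        bdrv_drained_begin(bs);

        bdrv_graph_rdlock_main_loop();
        if (blk_bs(blk) != bs) {
            /* The device's root changed under us; refuse rather than
             * operate on a node the device no longer uses. */
            bdrv_graph_rdunlock_main_loop();
            bdrv_drained_end(bs);
            error_setg(errp, "Device was reconfigured during drain");
            return -EBUSY;
        }

        /* ... perform the actual action on bs while drained ... */

        bdrv_graph_rdunlock_main_loop();
        bdrv_drained_end(bs);
        return 0;
    }

Returning an error here keeps the operation fail-safe: if the graph
changed while polling, the caller is told instead of the action
silently touching a node that is no longer attached to the device.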