On 14.05.25 at 19:26, Kevin Wolf wrote:
> On 08.05.2025 at 16:09, Fiona Ebner wrote:
>> Signed-off-by: Fiona Ebner <f.eb...@proxmox.com>
>> ---
>>
>> Could the bs associated with the device change because of polling
>> when draining? If yes, does that mean we need to drain all in the
>> beginning and not temporarily unlock?
> 
> I'm starting to hate this pattern. :-)
> 
> Maybe it would eventually be a good idea to have some helpers to deal
> with this common problem of "I want to drain all nodes that will be
> affected by the operation, but draining could change which nodes are
> part of it".

Yes, that would be nice :)

> For this specific case, though, I think drain all is right simply
> because we may already rely on it: I suppose qmp_transaction() calls
> bdrv_drain_all() for a reason, but it should actually be a drain
> section. I'm pretty sure we're not just waiting for some requests to
> complete, but want to keep other users away. Maybe the graph lock
> actually covers whatever this drain was initially supposed to solve, but
> bdrv_drain_all() doesn't look right.
>
> So how about making it bdrv_drain_all_begin/end() in qmp_transaction()
> just because that's the safe thing to do, and then we can just drop any
> individual drains in action implementations?

Unfortunately, this does not work on its own. Among others, iotest 096
for qcow2 will hang: bdrv_img_create() is called by
external_snapshot_action() while drained, and then the qcow2_co_create()
coroutine hangs in blk_co_do_pwritev_part()/blk_wait_while_drained().
So it seems having the more targeted drain here is important.
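
For reference, the request queuing that causes the hang looks roughly
like this (simplified sketch from memory, not the exact code in
block/block-backend.c):

    /* Called from blk_co_do_pwritev_part() and friends. */
    static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
    {
        if (blk->quiesce_counter && !blk->disable_request_queuing) {
            qemu_mutex_lock(&blk->queued_requests_lock);
            blk_dec_in_flight(blk);
            /* yields; only woken up again when the drained section ends */
            qemu_co_queue_wait(&blk->queued_requests,
                               &blk->queued_requests_lock);
            blk_inc_in_flight(blk);
            qemu_mutex_unlock(&blk->queued_requests_lock);
        }
    }

The wakeup only comes when the drained section ends, but with
qmp_transaction() itself holding the drain_all section around the
action, that never happens, so qcow2_co_create() is stuck.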

Would it be sensible to go with the "temporarily unlock" solution, but
add an extra check after re-locking that the root bs of the device is
still the one we drained, and return an error if not? At least for now,
because the series is already getting quite big (in particular with the
split of 09/11).
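
Roughly something like this at the call site (illustrative sketch only;
blk/errp and the exact graph lock calls depend on the actual context in
the patch):

    bs = blk_bs(blk);
    bdrv_ref(bs);

    bdrv_graph_rdunlock_main_loop();
    bdrv_drained_begin(bs); /* may poll and change the graph */
    bdrv_graph_rdlock_main_loop();

    if (blk_bs(blk) != bs) {
        error_setg(errp, "block node of device changed during drain");
        goto out; /* with bdrv_drained_end(bs)/bdrv_unref(bs) on the way out */
    }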

Best Regards,
Fiona

