19.12.2019 13:02, Kevin Wolf wrote:
> On 18.12.2019 at 11:28, Vladimir Sementsov-Ogievskiy wrote:
>> Hi!
>>
>> Some time ago, we faced and fixed the fact that the qcow2 bitmap API
>> doesn't call qcow2_co_mutex_lock before accessing qcow2 metadata. This was
>> solved by moving qcow2_co_remove_persistent_dirty_bitmap and
>> qcow2_co_can_store_new_dirty_bitmap to coroutines that call
>> qcow2_co_mutex_lock.
>>
>> Now I decided to look at the big picture (it is attached).
>>
>> Boxes are qcow2 driver API functions; a green border means that the
>> function calls qcow2_co_mutex_lock (this doesn't guarantee that exactly
>> the call to the child node is locked, but it is something).
>>
>> The picture only contains functions that call qcow2_cache_get/put, not all
>> the functions that need locking, but again, it is something.
>>
>> So, according to the picture, it seems that the following functions lack
>> locking:
>>
>> qcow2_co_create
>
> This should be easy to fix. It's also relatively harmless because it's
> unlikely that the image that is being created is accessed by someone
> else (the user would have to query the auto-generated node name and
> start something on it - at which point they deserve what they get).
>
>> qcow2_snapshot_*
>> (but it is both drained and aio context locked, so should be safe, yes?)
>
> If you checked that these conditions are true, it should be safe.
>
>> qcow2_reopen_bitmaps_rw
>> qcow2_store_persistent_dirty_bitmaps
>
> Reopen drains the image, so I think this is safe in practice.
>
> If we want to do something about it anyway (e.g. move it to a coroutine
> so it can take a lock) the question is where to do that. Maybe even for
> .bdrv_reopen_* in general?
>
>> qcow2_amend_options
>
> Only qemu-img so far, so no concurrency. We're about to add
> blockdev-amend in QMP, though, so this looks like something that should
> take the lock.
>
> In fact, is taking the lock enough or should it actually drain the node,
> too?
>
>> qcow2_make_empty
>
> This one should certainly drain. It is used not only in qemu-img, but
> also in HMP commit and apparently also in replication.
>
> This one might be a bug that could become visible in practice. Unlikely
> for HMP commit (because it takes a while and is holding the BQL, so no
> new guest requests will be processed), except maybe for cases where
> there is nothing to commit.
>
>> ===
>>
>> Checking green nodes:
>>
>> qcow2_co_invalidate_cache actually calls qcow2_close unlocked; that's
>> another reason to fix qcow2_store_persistent_dirty_bitmaps
>
> Might be. Do we want a .bdrv_co_close?
>
>> qcow2_write_snapshots is actually called unlocked from
>> qcow2_check_fix_snapshot_table. It seems unsafe.
>
> This is curious, I'm not sure why you would drop the lock there. Max?
>
> bdrv_flush() calls would have to be replaced with qcow2_write_caches() to
> avoid a deadlock, but otherwise I don't see why we would want to drop
> the lock.
>
> Of course, this should only be called from qemu-img check, so in
> practice it's probably not a bug.
>

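To make the deadlock remark above concrete, here is a minimal sketch of why a
flush that takes the lock itself cannot be called with the lock already held.
It is a toy model, not QEMU code: DemoState and all demo_* names are
hypothetical, the lock is a plain flag standing in for a non-recursive lock,
and I am assuming that the flush path acquires the same lock that the caller
holds. Where a real lock would hang, the model returns -EDEADLK.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real qcow2 struct. */
typedef struct DemoState {
    bool lock_held;    /* models a non-recursive lock */
    int dirty_tables;  /* models dirty metadata waiting to be written */
} DemoState;

static int demo_lock(DemoState *s)
{
    if (s->lock_held) {
        return -EDEADLK;  /* a real non-recursive lock would hang here */
    }
    s->lock_held = true;
    return 0;
}

static void demo_unlock(DemoState *s)
{
    assert(s->lock_held);
    s->lock_held = false;
}

/* Models a qcow2_write_caches()-style helper: caller must hold the lock. */
static int demo_write_caches(DemoState *s)
{
    assert(s->lock_held);
    s->dirty_tables = 0;
    return 0;
}

/* Models a flush entry point that takes the lock on its own. */
static int demo_flush(DemoState *s)
{
    int ret = demo_lock(s);
    if (ret < 0) {
        return ret;
    }
    ret = demo_write_caches(s);
    demo_unlock(s);
    return ret;
}

/* Keeps the lock for the whole update and uses the locked helper. */
static int demo_fix_table_locked(DemoState *s)
{
    int ret = demo_lock(s);
    if (ret < 0) {
        return ret;
    }
    s->dirty_tables++;
    ret = demo_write_caches(s);
    demo_unlock(s);
    return ret;
}

/* Keeps the lock but calls the self-locking flush: this is the deadlock. */
static int demo_fix_table_nested_flush(DemoState *s)
{
    int ret = demo_lock(s);
    if (ret < 0) {
        return ret;
    }
    s->dirty_tables++;
    ret = demo_flush(s);
    demo_unlock(s);
    return ret;
}
```

In this model demo_fix_table_locked() succeeds and leaves nothing dirty, while
demo_fix_table_nested_flush() fails with -EDEADLK and leaves the table dirty.
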
Thanks for the analysis! I'll keep thinking about this and come back with
patches (or new questions).

-- 
Best regards,
Vladimir
