Re: qcow2 api not secured by mutex lock

Vladimir Sementsov-Ogievskiy Thu, 19 Dec 2019 02:26:01 -0800

19.12.2019 13:02, Kevin Wolf wrote:
> Am 18.12.2019 um 11:28 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Hi!
>>
>> Some time ago, we found and fixed the fact that the qcow2 bitmap API didn't
>> call qcow2_co_mutex_lock before accessing qcow2 metadata. This was solved by
>> moving qcow2_co_remove_persistent_dirty_bitmap and
>> qcow2_co_can_store_new_dirty_bitmap into coroutine context, where they call
>> qcow2_co_mutex_lock.
>>
>> Now I decided to look at the big picture (it is attached).
>>
>> Boxes are qcow2 driver API functions; a green border means that the function
>> calls qcow2_co_mutex_lock (this doesn't guarantee that the exact call to the
>> child node is made under the lock, but it is something).
>>
>> The picture contains only the functions that call qcow2_cache_get/put.
>> That is not every function that needs locking, but again, it is something.
>>
>> So, according to the picture, it seems that the following functions lack
>> locking:
>>
>> qcow2_co_create
> 
> This should be easy to fix. It's also relatively harmless because it's
> unlikely that the image that is being created is accessed by someone
> else (the user would have to query the auto-generated node name and
> start something on it - at which point they deserve what they get).
> 
>> qcow2_snapshot_*
>>     (but it is both drained and aio context locked, so should be safe, yes?)
> 
> If you checked that these conditions are true, it should be safe.
> 
>> qcow2_reopen_bitmaps_rw
>> qcow2_store_persistent_dirty_bitmaps
> 
> Reopen drains the image, so I think this is safe in practice.
> 
> If we want to do something about it anyway (e.g. move it to a coroutine
> so it can take a lock) the question is where to do that. Maybe even for
> .bdrv_reopen_* in general?
> 
>> qcow2_amend_options
> 
> Only qemu-img so far, so no concurrency. We're about to add
> blockdev-amend in QMP, though, so this looks like something that should
> take the lock.
> 
> In fact, is taking the lock enough or should it actually drain the node,
> too?
> 
>> qcow2_make_empty
> 
> This one should certainly drain. It is used not only in qemu-img, but
> also in HMP commit and apparently also in replication.
> 
> This one might be a bug that could become visible in practice. Unlikely
> for HMP commit (because it takes a while and is holding the BQL, so no
> new guest requests will be processed), except maybe for cases where
> there is nothing to commit.
> 
>> ===
>>
>> Checking green nodes:
>>
>> qcow2_co_invalidate_cache actually calls qcow2_close unlocked, it's
>> another reason to fix qcow2_store_persistent_dirty_bitmaps
> 
> Might be. Do we want a .bdrv_co_close?
> 
>> qcow2_write_snapshots is actually called unlocked from
>> qcow2_check_fix_snapshot_table. It seems unsafe.
> 
> This is curious, I'm not sure why you would drop the lock there. Max?
> 
> bdrv_flush() calls would have to be replaced with qcow2_write_caches() to
> avoid a deadlock, but otherwise I don't see why we would want to drop
> the lock.
> 
> Of course, this should only be called from qemu-img check, so in
> practice it's probably not a bug.
>


Thanks for the analysis! I'll keep thinking about this and come back with
patches (or new questions).
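
To make the bdrv_flush() point quoted above concrete, here is a minimal sketch of the pattern. It is not QEMU code: Image, image_flush(), image_write_caches_locked() and image_rewrite_table() are hypothetical names, and an error-checking pthread mutex stands in for the driver's coroutine lock so that relocking reports EDEADLK instead of hanging (the assumption being that the real lock would simply deadlock). The public flush takes the lock itself, so a caller that already holds the lock has to use the inner helper instead.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; not the real qcow2 structures. */
typedef struct Image {
    pthread_mutex_t lock;  /* plays the role of the metadata lock */
    int dirty_tables;      /* cached metadata tables not yet written back */
    int flushed_tables;    /* tables written back so far */
} Image;

/* Use an error-checking mutex: relocking by the owner returns EDEADLK
 * instead of blocking forever, which keeps the sketch observable. */
static int image_init(Image *img)
{
    pthread_mutexattr_t attr;
    int ret;

    assert(img != NULL);
    ret = pthread_mutexattr_init(&attr);
    if (ret != 0) {
        return -ret;
    }
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (ret == 0) {
        ret = pthread_mutex_init(&img->lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    img->dirty_tables = 0;
    img->flushed_tables = 0;
    return -ret;
}

/* Inner helper: the caller must already hold img->lock. */
static int image_write_caches_locked(Image *img)
{
    assert(img != NULL);
    img->flushed_tables += img->dirty_tables;
    img->dirty_tables = 0;
    return 0;
}

/* Public flush: takes the lock itself, then calls the inner helper. */
static int image_flush(Image *img)
{
    int ret = pthread_mutex_lock(&img->lock);
    if (ret != 0) {
        return -ret;  /* -EDEADLK if this thread already holds the lock */
    }
    ret = image_write_caches_locked(img);
    pthread_mutex_unlock(&img->lock);
    return ret;
}

/* Metadata update done entirely under the lock. With use_public_flush set,
 * it makes the mistake of calling the locking flush while holding the lock. */
static int image_rewrite_table(Image *img, int use_public_flush)
{
    int ret = pthread_mutex_lock(&img->lock);
    if (ret != 0) {
        return -ret;
    }
    img->dirty_tables++;
    ret = use_public_flush ? image_flush(img)
                           : image_write_caches_locked(img);
    pthread_mutex_unlock(&img->lock);
    return ret;
}
```

After image_init(&img), calling image_rewrite_table(&img, 1) returns -EDEADLK and leaves the table dirty, while image_rewrite_table(&img, 0) succeeds and writes everything back without ever dropping the lock.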


-- 
Best regards,
Vladimir
