08.11.2018 13:33, Kevin Wolf wrote: > Am 08.11.2018 um 11:02 hat Vladimir Sementsov-Ogievskiy geschrieben: >> 07.11.2018 21:16, Kevin Wolf wrote: >>> (Broken quoting in text/plain again) >>> >>> Am 01.11.2018 um 13:17 hat Vladimir Sementsov-Ogievskiy geschrieben: >>>> 27.09.2018 20:35, Max Reitz wrote: >>>> >>>> On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote: >>>> >>>> Memory allocation may become less efficient for encrypted case. >>>> It's a >>>> payment for further asynchronous scheme. >>>> >>>> Signed-off-by: Vladimir Sementsov-Ogievskiy >>>> <[email protected]> >>>> --- >>>> block/qcow2.c | 114 >>>> ++++++++++++++++++++++++++++++++-------------------------- >>>> 1 file changed, 64 insertions(+), 50 deletions(-) >>>> >>>> diff --git a/block/qcow2.c b/block/qcow2.c >>>> index 65e3ca00e2..5e7f2ee318 100644 >>>> --- a/block/qcow2.c >>>> +++ b/block/qcow2.c >>>> @@ -1808,6 +1808,67 @@ out: >>>> return ret; >>>> } >>>> >>>> +static coroutine_fn int qcow2_co_preadv_normal(BlockDriverState >>>> *bs, >>>> + uint64_t >>>> file_cluster_offset, >>>> + uint64_t offset, >>>> + uint64_t bytes, >>>> + QEMUIOVector >>>> *qiov, >>>> + uint64_t >>>> qiov_offset) >>>> +{ >>>> + int ret; >>>> + BDRVQcow2State *s = bs->opaque; >>>> + void *crypt_buf = NULL; >>>> + QEMUIOVector hd_qiov; >>>> + int offset_in_cluster = offset_into_cluster(s, offset); >>>> + >>>> + if ((file_cluster_offset & 511) != 0) { >>>> + return -EIO; >>>> + } >>>> + >>>> + qemu_iovec_init(&hd_qiov, qiov->niov); >>>> >>>> So you're not just re-allocating the bounce buffer for every single >>>> call, but also this I/O vector. Hm. >>> And this one is actually at least a little more serious, I think. >>> >>> Decryption and decompression (including copying between the original >>> qiov and the bounce buffer) are already very slow, so allocating a >>> bounce buffer or not shouldn't make any noticable difference. >>> >>> But I'd like to keep allocations out of the fast path as much as we can. >>> >>>> + if (bs->encrypted) { >>>> + assert(s->crypto); >>>> + assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * >>>> s->cluster_size); >>>> + >>>> + crypt_buf = qemu_try_blockalign(bs->file->bs, bytes); >>>> >>>> 1. Why did you remove the comment? >>>> >>>> 2. The check for whether crypt_buf was successfully allocated is >>>> missing. >>>> >>>> 3. Do you have any benchmarks for encrypted images? Having to >>>> allocate >>>> the bounce buffer for potentially every single cluster sounds rather >>>> bad >>>> to me. >>> Actually, benchmarks for normal, fully allocated images would be a >>> little more interesting. Though I'm not sure if qcow2 actually performs >>> well enough that we'll see any difference even there (when we measured >>> memory allocation overhead etc. for raw images in the context of >>> comparing coroutines to AIO callback styes, the difference was already >>> hard to see). >> >> Ok, I'll benchmark it and/or improve fast path (you mean code path, >> where we skip coroutines creation, to handle the whole request at once, >> yes?). > > I had less the specific code path in mind, but the case of reading > unallocated clusters or reading/writing an already allocated area of > normal or zero clusters (no compression, no encrpytion, no other fancy > stuff). These are the cases that qcow2 images will hit the most in > practice.
There are already some benchmarks in the thread started from the cover letter of the series. > >> Hmm, all this staff with hd_qiov's in many places is due to we don't >> have qiov_offset parameter in .bdrv_co_preadv.. Is it a bad idea to >> add it? > > I actually thought the same yesterday, though I'm not completely sure > yet about it. It makes the interfaces a bit uglier, but it could indeed > save us some work in the I/O path, so unless we find bigger reasons > against it, maybe we should do that. > >> Anyway, I now think, it's better to start with decompress-threads >> series, as compress-threads are already merged, then bring encryption to >> threads too, and then return to these series. This way will remove some >> questions about allocation, and may be it would be simpler and more >> consistent to bring more things to coroutines, not only normal clusters. > > Okay, we can do that. > > Kevin > -- Best regards, Vladimir
