On 12.03.21 19:43, Vladimir Sementsov-Ogievskiy wrote:
12.03.2021 21:15, Max Reitz wrote:
On 05.03.21 18:35, Vladimir Sementsov-Ogievskiy wrote:
Compressed writes are not aligned to 512 bytes, which is very slow in
O_DIRECT mode. Let's use the cache.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
---
block/coroutines.h | 3 +
block/qcow2.h | 4 ++
block/qcow2-refcount.c | 10 +++
block/qcow2.c | 158 ++++++++++++++++++++++++++++++++++++++---
4 files changed, 164 insertions(+), 11 deletions(-)
[...]
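As background for the commit message: O_DIRECT requires the file offset,
request length and buffer address to be aligned, typically to the 512-byte
logical block size, so sub-512 compressed writes have to be emulated with
read-modify-write of the surrounding sectors. A minimal standalone
illustration (plain C, independent of qemu):

#define _GNU_SOURCE        /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    void *buf;
    int fd = open("test.img", O_WRONLY | O_CREAT | O_DIRECT, 0644);

    if (fd < 0 || posix_memalign(&buf, 512, 4096) != 0) {
        perror("setup");
        return 1;
    }
    memset(buf, 0, 4096);

    /* 512-aligned offset and length: accepted (assuming a 512-byte
     * logical block size) */
    if (pwrite(fd, buf, 512, 0) < 0) {
        perror("aligned pwrite");
    }

    /* unaligned length: rejected with EINVAL under O_DIRECT */
    if (pwrite(fd, buf, 300, 512) < 0) {
        perror("unaligned pwrite");
    }

    free(buf);
    close(fd);
    return 0;
}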
@@ -2699,6 +2796,12 @@ static void qcow2_close(BlockDriverState *bs)
qcow2_inactivate(bs);
}
+ /*
+ * Cache should be flushed in qcow2_inactivate() and should be empty in
+ * inactive mode. So we are safe to free it.
+ */
+ seqcache_free(s->compressed_cache);
+
cache_clean_timer_del(bs);
qcow2_cache_destroy(s->l2_table_cache);
qcow2_cache_destroy(s->refcount_block_cache);
@@ -4558,18 +4661,42 @@ qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
goto fail;
}
- qcow2_inflight_writes_inc(bs, cluster_offset, out_len);
+ if (s->compressed_cache) {
Why is this conditional?
We don't have a compressed_cache for non-O_DIRECT images.
Oh right.
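A hypothetical sketch of the open-side counterpart of that conditional;
the field names, the flag check and the seqcache_new() signature are
assumptions, not quoted from the patch:

/* In qcow2_do_open() or similar: only create the cache when the data
 * file was opened with cache.direct=on (O_DIRECT), i.e. BDRV_O_NOCACHE;
 * otherwise s->compressed_cache stays NULL and the write path keeps
 * using plain bdrv_co_pwrite(). */
if (s->data_file->bs->open_flags & BDRV_O_NOCACHE) {
    s->compressed_cache = seqcache_new(s->cluster_size);
} else {
    s->compressed_cache = NULL;
}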
+ /*
+ * It's important to do seqcache_write() in the same critical section
+ * (by s->lock) as qcow2_alloc_compressed_cluster_offset(), so that the
+ * cache is filled sequentially.
+ */
Yes.
+ seqcache_write(s->compressed_cache, cluster_offset, out_len, out_buf);
- qemu_co_mutex_unlock(&s->lock);
+ qemu_co_mutex_unlock(&s->lock);
- BLKDBG_EVENT(s->data_file, BLKDBG_WRITE_COMPRESSED);
- ret = bdrv_co_pwrite(s->data_file, cluster_offset, out_len, out_buf, 0);
+ ret = qcow2_co_compressed_flush_one(bs, false);
The qcow2 doc says a compressed cluster can span multiple host
clusters. I don’t know whether that can happen with this driver, but
if it does, wouldn’t that mean we’d need to flush two clusters here?
Oh, no, never mind. Only the first one would be finished and thus
flushed, not the second one.
I could have now removed the above paragraph, but it made me think, so
I kept it:
Hm. Actually, if we unconditionally flush here, doesn’t that mean
that we’ll never have a finished cluster in the cache for longer than
the span between the seqcache_write() and this
qcow2_co_compressed_flush_one()? I.e., the
qcow2_co_flush_compressed_cache() is supposed to never flush any
finished cluster, but only the currently active unfinished cluster (if
there is one), right?
Hmm. If we have parallel write and flush requests, it's a kind of race
condition: maybe the flush will flush both the finished and the
unfinished cluster, or maybe the write will flush the finished cluster
and the flush will only flush the unfinished one. Moreover, we may have
several parallel requests, so they produce several finished clusters,
and a sudden flush will flush them all.
OK. I was mostly asking because I was wondering how much you expect the
cache to be filled, i.e., how much you expect the read cache to help.
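To spell out the timing being discussed, a condensed sketch of the write
path from the hunk above (error handling omitted; the annotations are an
interpretation of the discussion, not part of the patch):

qemu_co_mutex_lock(&s->lock);

/* allocate the host offset and append to the cache in the same order,
 * still under s->lock, so the cache is filled sequentially */
ret = qcow2_alloc_compressed_cluster_offset(bs, offset, out_len,
                                            &cluster_offset);
seqcache_write(s->compressed_cache, cluster_offset, out_len, out_buf);

qemu_co_mutex_unlock(&s->lock);

/*
 * Write out at most one finished cluster.  With a single writer the
 * only finished cluster is the one this request has just completed, so
 * a finished cluster lives in the cache only between the
 * seqcache_write() above and this call.  With several parallel writers,
 * or a concurrent bdrv_co_flush(), more than one finished cluster can
 * accumulate, and which path ends up flushing which cluster is then a
 * matter of timing -- the race described above.
 */
ret = qcow2_co_compressed_flush_one(bs, false);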
[...]
@@ -4681,10 +4808,19 @@ qcow2_co_preadv_compressed(BlockDriverState *bs,
out_buf = qemu_blockalign(bs, s->cluster_size);
- BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
- ret = bdrv_co_pread(bs->file, coffset, csize, buf, 0);
- if (ret < 0) {
- goto fail;
+ /*
+ * seqcache_read may return less bytes than csize, as csize may exceed
+ * actual compressed data size. So we are OK if seqcache_read returns
+ * something > 0.
I was about to ask what happens when a compressed cluster spans two
host clusters (I could have imagined that in theory the second one
could have been discarded, but not the first one, so reading from the
cache would really be short -- we would have needed to check that we
only fell short in the range of 512 bytes, not more).
But then I realized that in this version of the series, all finished
clusters are immediately discarded and only the current unfinished one
is kept. Does it even make sense to try seqcache_read() here, then?
Hmm. Not immediately, but after a flush. And the flush is not under the
mutex, so in theory we may have several finished clusters "in flight" at
some moment, and your question makes sense. The cache supports reading
from consecutive clusters, but to be safe we should also support reading
one part of the data from disk and the other part from the cache here.
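A hedged sketch of what such a mixed read could look like; the
seqcache_read() return convention (number of bytes found, possibly short
because csize overestimates the compressed size) is assumed from the
comment in the hunk above, so this is an illustration rather than the
patch's code:

int64_t n = 0;

if (s->compressed_cache) {
    /* read whatever prefix of [coffset, coffset + csize) the cache has */
    n = seqcache_read(s->compressed_cache, coffset, csize, buf);
    if (n < 0) {
        n = 0;
    }
}
if (n < csize) {
    /*
     * The remainder is either already on disk (a finished cluster that
     * was flushed and dropped from the cache) or just padding beyond
     * the real compressed data; reading it from the file is correct in
     * both cases.
     */
    BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
    ret = bdrv_co_pread(bs->file, coffset + n, csize - n, buf + n, 0);
    if (ret < 0) {
        goto fail;
    }
}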
The question is whether it really makes sense to even have a
seqcache_read() path when in reality it’s probably never accessed. I
mean, besides the fact that it seems based purely on chance whether a
read might fetch something from the cache even while we’re writing, in
practice I don’t know any case where we’d write to and read from a
compressed qcow2 image at the same time. (I don’t know what you’re
doing with the 'compress' filter, though.)
Max