2011/9/14 Kevin Wolf <kw...@redhat.com>:
> On 13.09.2011 15:36, Frediano Ziglio wrote:
>> 2011/9/13 Kevin Wolf <kw...@redhat.com>:
>>> On 13.09.2011 09:53, Frediano Ziglio wrote:
>>>> These patches try to trade off leaks against speed for cluster
>>>> refcounts.
>>>>
>>>> Refcount increments (REF+ or refp) are handled differently from
>>>> decrements (REF- or refm). The reason is that postponing or not
>>>> flushing a REF- causes "just" a leak, while postponing a REF+ causes
>>>> corruption.
>>>>
>>>> To optimize REF- I just use an array to store offsets; when a flush
>>>> is requested or the array reaches a limit (currently 1022 entries),
>>>> the array is sorted and written to disk. I use an array of offsets
>>>> instead of ranges to support compression (an offset can appear
>>>> multiple times in the array).
>>>> I consider this patch quite ready.
>>>
>>> Ok, first of all let's clarify what this optimises. I don't think it
>>> changes anything at all for the writeback cache modes, because these
>>> already do most operations in memory only. So this must be about
>>> optimising some operations with cache=writethrough. REF- isn't about
>>> normal cluster allocation, it is about COW with internal snapshots or
>>> bdrv_discard. Do you have benchmarks for any of them?
>>>
>>> I strongly disagree with your approach for REF-. We already have a
>>> cache, and introducing a second one sounds like a bad idea. I think we
>>> could get a very similar effect if we introduced a
>>> qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as
>>> dirty, but at the same time tells the cache that even in write-through
>>> mode it can still treat this block as write-back. This should require
>>> far fewer code changes.
>>>
>> Yes, this mainly optimizes writethrough. I did not test with writeback,
>> but it should improve that case too (I think some flushes are still
>> needed there to keep consistency).
>> I'll try to write a qcow2_cache_entry_mark_dirty_wb patch and test it.
>
> Great, thanks!
>
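In the meantime, to make the REF- mechanism from the cover letter a bit
more concrete: the deferred-decrement idea boils down to something like
the sketch below. This is a simplified, self-contained illustration, not
the actual patch code; the names (RefmCache, refm_cache_add,
refm_cache_flush, REFM_CACHE_SIZE) are made up for the example, and
update_refcount() is passed in as a stand-in for the real qcow2 refcount
update routine.

#include <stdint.h>
#include <stdlib.h>

#define REFM_CACHE_SIZE 1022

typedef struct RefmCache {
    uint64_t offsets[REFM_CACHE_SIZE];   /* queued REF- cluster offsets */
    int      count;
} RefmCache;

static int cmp_offset(const void *a, const void *b)
{
    uint64_t oa = *(const uint64_t *)a, ob = *(const uint64_t *)b;
    return (oa > ob) - (oa < ob);
}

/* Apply all queued decrements in offset order, so that updates hitting
 * the same refcount block end up adjacent. */
static int refm_cache_flush(RefmCache *c,
                            int (*update_refcount)(uint64_t offset, int addend))
{
    qsort(c->offsets, c->count, sizeof(c->offsets[0]), cmp_offset);
    for (int i = 0; i < c->count; i++) {
        int ret = update_refcount(c->offsets[i], -1);
        if (ret < 0) {
            return ret;
        }
    }
    c->count = 0;
    return 0;
}

/* Queue a single REF-; flush first if the array is already full. */
static int refm_cache_add(RefmCache *c, uint64_t cluster_offset,
                          int (*update_refcount)(uint64_t offset, int addend))
{
    if (c->count == REFM_CACHE_SIZE) {
        int ret = refm_cache_flush(c, update_refcount);
        if (ret < 0) {
            return ret;
        }
    }
    c->offsets[c->count++] = cluster_offset;
    return 0;
}

The point is that many REF- hitting the same refcount block collapse into
a single write at flush time instead of one write-through update per
decrement, and keeping plain offsets (duplicates allowed) rather than
ranges is what keeps compressed clusters working.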
Don't expect the qcow2_cache_entry_mark_dirty_wb patch too soon, however;
I'm quite busy these days.

>>> But let's measure the effects first, I suspect that for cluster
>>> allocation it doesn't help much because every REF- comes with a REF+.
>>
>> That's 50% of the effort if the REF- clusters are far from the REF+
>> ones :)
>
> I would expect that the next REF+ allocates exactly the REF- cluster.
> But you still have a point, we save the write on REF- and combine it
> with the REF+ write.
>

This is still a TODO for the REF+ patch. Oh... some time ago, looking at
the refcount code, I realized that a freed cluster can in some cases be
reused only after a QEMU restart. For instance:
- we get a single-cluster REF- which takes its refcount to 0
- free_cluster_index is decreased to this index
- we get a new request for 2 clusters
- free_cluster_index is increased past the freed cluster
We skip the freed cluster, and if we never free another cluster with a
lower index, that cluster is not reused (see the small model further down
in this mail). I didn't test this behaviour; there is no leak and no
corruption, the image could just end up larger than expected.

>>>> To optimize REF+ I mark a range as allocated and use this range to
>>>> serve new allocations (avoiding writing refcounts to disk). When a
>>>> flush is requested, or in some situations (like snapshots), this
>>>> cache is disabled and flushed (written as REF-).
>>>> I do not consider this patch ready; it works and passes all io-tests,
>>>> but for instance I would avoid allocating new clusters for refcounts
>>>> during preallocation.
>>>
>>> The only question here is if improving cache=writethrough cluster
>>> allocation performance is worth the additional complexity in the
>>> already complex refcounting code.
>>>
>> I didn't see this optimization as a second-level cache, but yes, for
>> REF- it is a second cache.
>>
>>> The alternative that was discussed before is the dirty bit approach
>>> that is used in QED and would allow us to use writeback for all
>>> refcount blocks, regardless of REF- or REF+. It would be an easier
>>> approach requiring fewer code changes, but it comes with the cost of
>>> requiring an fsck after a qemu crash.
>>>
>> I was thinking about changing the header magic the first time we touch
>> a refcount, in order to mark the image as dirty, so that a newer QEMU
>> recognizes the flag while an older one refuses to open the image.
>> Obviously the magic would be reverted on image close.
>
> We've discussed this idea before and I think it wasn't considered a
> great idea to automagically change the header in an incompatible way.
> But we can always say that for improved performance you need to upgrade
> your image to qcow2 v3.
>

I don't understand why there is no wiki page detailing the qcow3 changes.
I saw your post from May. I have only followed this ML since August, so I
think I missed a lot of the discussion on qcow improvements.

>>>> The end speedup is quite visible when allocating clusters (more than
>>>> 20%).
>>>
>>> What benchmark do you use for testing this?
>>>
>>> Kevin
>>>
>> Currently I'm using bonnie++, but I noted similar improvements with
>> iozone. The test script formats an image, then launches a Linux machine
>> which runs a script and saves the result to a file.
>> The test image is seen by this virtual machine as a separate disk.
>> The file on the host resides in a separate LV.
>> I got quite consistent results (of course I was not working on the
>> machine while testing; it is not actually dedicated to this job).
>>
>> Actually I'm running the test right now (I added a test working on a
>> snapshot image).
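Going back to the free_cluster_index behaviour I described above, here is
the small model I mentioned. It is illustrative only: it just mimics,
roughly, what alloc_clusters_noref() and update_refcount() do with
s->free_cluster_index; the names (free_cluster, alloc_clusters,
NB_CLUSTERS, the refcount[] array) are made up for the example and it is
not the real qcow2 code.

#include <stdio.h>

#define NB_CLUSTERS 16

static int refcount[NB_CLUSTERS];
static int free_cluster_index;

/* Freeing a cluster rewinds free_cluster_index, roughly like
 * update_refcount() does when a refcount drops to zero. */
static void free_cluster(int idx)
{
    refcount[idx] = 0;
    if (idx < free_cluster_index) {
        free_cluster_index = idx;
    }
}

/* Allocation scans forward from free_cluster_index for a contiguous run
 * of free clusters, then moves free_cluster_index past it, roughly like
 * alloc_clusters_noref(). */
static int alloc_clusters(int nb)
{
    int i, start = free_cluster_index;

    while (start + nb <= NB_CLUSTERS) {
        for (i = 0; i < nb; i++) {
            if (refcount[start + i] != 0) {
                break;
            }
        }
        if (i == nb) {
            for (i = 0; i < nb; i++) {
                refcount[start + i] = 1;
            }
            free_cluster_index = start + nb;
            return start;
        }
        start += i + 1;   /* skip past the used cluster */
    }
    return -1;
}

int main(void)
{
    /* Clusters 0..4 already in use. */
    for (int i = 0; i < 5; i++) {
        refcount[i] = 1;
    }
    free_cluster_index = 5;

    free_cluster(3);             /* single REF-, index rewinds to 3 */
    int c = alloc_clusters(2);   /* next request needs 2 contiguous clusters */
    printf("allocated at %d, free_cluster_index now %d, "
           "cluster 3 refcount %d\n", c, free_cluster_index, refcount[3]);
    return 0;
}

With clusters 0-4 in use, freeing cluster 3 and then asking for 2 clusters
allocates at index 5 and moves free_cluster_index to 7, so cluster 3 stays
unused until some cluster below it is freed or QEMU is restarted.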
> Okay. Let me guess the remaining variables: The image is on an ext4 host
> filesystem, you use cache=writethrough and virtio-blk. You don't use
> backing files, compression or encryption. For your tests with internal
> snapshots you have exactly one internal snapshot that is taken
> immediately before the benchmark. Oh, and not to forget, KVM is enabled.
>
> Are these assumptions correct?
>

Replace ext4 with xfs and the assumptions are ok. Yes, I use internal
snapshots (REF- is useful only in this case). To produce the "qcow2s"
image I use these commands:

$QEMU_IMG create -f qcow2 -o preallocation=metadata $FILE 15g
$QEMU_IMG snapshot -c test $FILE

> Kevin
>

I ran the tests again (yesterday's results were not that consistent;
today I stopped all unneeded services and avoided the X session tunneled
through ssh).

without patches
run   raw     qcow2   qcow2s
1     22758    4206     1054
2     28401   21745    17997

with patches (ref-, ref+)
run   raw     qcow2   qcow2s
1     22563    4965     4382
2     23823   21043    20904

Besides, I still don't understand the difference in the second runs for
raw (about 20%!!!), but this confirms the huge improvement with snapshots.

Frediano