On Wed, Feb 28, 2018 at 09:11:32PM +0100, Max Reitz wrote:
> On 2018-02-28 19:08, Max Reitz wrote:
> > On 2018-02-27 17:17, Stefan Hajnoczi wrote:
> >> On Mon, Feb 26, 2018 at 06:03:13PM +0100, Max Reitz wrote:
> >>> There are filesystems (among which is tmpfs) that have a hard time
> >>> reporting allocation status. That is definitely a bug in them.
> >>>
> >>> However, there is no good reason why qemu-img convert should query
> >>> the allocation status in the first place. It does zero detection by
> >>> itself anyway, so we can detect unallocated areas ourselves.
> >>>
> >>> Furthermore, if a filesystem driver has any sense, reading
> >>> unallocated data should take just as much time as lseek(SEEK_DATA) +
> >>> memset(). So the only overhead we introduce by dropping the manual
> >>> lseek() call is a memset() in the driver and a buffer_is_zero() in
> >>> qemu-img, both of which should be relatively quick.
> >>
> >> This makes sense. Which file systems did you test this patch on?
> >
> > On tmpfs and xfs, so far.
> >
> >> XFS, ext4, and tmpfs would be a good minimal test set to prove the
> >> patch. Perhaps with two input files:
> >> 1. A file that is mostly filled with data.
> >> 2. A file that is only sparsely populated with data.
> >
> > And probably with vmdk, which (by default) forbids querying any areas
> > larger than 64 kB.
> >
> >> The time taken should be comparable with the time before this patch.
> >
> > Yep, I'll do some benchmarks.
>
> And the results are in. I've created 2 GB images on various filesystems
> in various formats, then I've either written 64 kB every 32 MB to them
> ("sparse"), or left out 64 kB every 32 MB ("full"). Then I've converted
> them to null-co:// and took the (real) time through "time". (Script is
> attached.)
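[A rough reconstruction of the methodology just described, since the
attached script is not part of this excerpt; the image paths, loop
bounds, and exact qemu-io/qemu-img invocations below are guesses, not
the actual script:

    # Create a 2 GB test image (swap "-f qcow2" for raw or vmdk).
    qemu-img create -f qcow2 /mnt/test/test.qcow2 2G

    # "sparse": write 64 kB at the start of every 32 MB chunk.
    for ((off = 0; off < 2048; off += 32)); do
        qemu-io -f qcow2 -c "write ${off}M 64k" /mnt/test/test.qcow2
    done

    # "full": write each 32 MB chunk except for its first 64 kB
    # (32 MB - 64 kB = 32704 kB per write).
    for ((off = 0; off < 2048; off += 32)); do
        qemu-io -f qcow2 -c "write $((off * 1024 + 64))k 32704k" \
            /mnt/test/test.qcow2
    done

    # Convert to the data-discarding null-co driver and take the real
    # time. A null-co target cannot be created, only opened, so convert
    # needs -n plus --target-image-opts to get an appropriately sized
    # target.
    time qemu-img convert -n --target-image-opts -f qcow2 \
        /mnt/test/test.qcow2 driver=null-co,size=2G
]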
> I've attached the raw results before and after this patch. Usually, I
> did six runs for each case and dropped the most extreme outlier --
> except for full vmdk images, where I've only done one run for each case
> because creating these images can take a very long time.
>
> Here are the differences from before to after:
>
> sparse raw on tmpfs:   + 19 % (436 ms to 520 ms)
> sparse qcow2 on tmpfs: - 31 % (435 ms to 301 ms)
> sparse vmdk on tmpfs:  + 37 % (214 ms to 294 ms)
>
> sparse raw on xfs:     + 69 % (452 ms to 762 ms)
> sparse qcow2 on xfs:   - 34 % (462 ms to 304 ms)
> sparse vmdk on xfs:    + 42 % (210 ms to 298 ms)
>
> sparse raw on ext4:    +360 % (144 ms to 655 ms)
> sparse qcow2 on ext4:  +120 % (147 ms to 330 ms)
> sparse vmdk on ext4:   + 16 % (253 ms to 293 ms)
>
> full raw on tmpfs:     -  9 % (437 ms to 398 ms)
> full qcow2 on tmpfs:   - 75 % (1.63 s to 403 ms)
> full vmdk on tmpfs:    -100 % (10 min to 767 ms)
>
> full raw on xfs:       -  1 % (407 ms to 404 ms, insignificant)
> full qcow2 on xfs:     -  1 % (410 ms to 404 ms, insignificant)
> full vmdk on xfs:      - 33 % (1.05 s to 695 ms)
>
> full raw on ext4:      -  2 % (308 ms to 301 ms, insignificant)
> full qcow2 on ext4:    +  2 % (307 ms to 312 ms, insignificant)
> full vmdk on ext4:     - 74 % (3.53 s to 839 ms)
>
> So... It's more extreme than I had hoped, that's for sure. What I
> conclude from this is:
>
> (1) This patch is generally good for nearly fully allocated images. In
> the worst case (on well-behaving filesystems with well-optimized image
> formats) it changes nothing. In the best case, conversion time is
> reduced drastically.
>
> (2) For sparse raw images, this is absolutely devastating. Reading them
> now takes more than (ext4) or nearly (xfs) twice as much time as
> reading a fully allocated image. So much for "if a filesystem driver
> has any sense".
>
> (2a) It might be worth noting that on xfs, reading the sparse file took
> longer even before this patch...
>
> (3) qcow2 is different: It benefits from this patch on tmpfs and xfs
> (note that reading a sparse qcow2 file took longer than reading a full
> qcow2 file before this patch!), but it gets pretty much destroyed on
> ext4, too.
>
> (4) As for sparse vmdk images... Reading them takes longer, but it's
> still faster than reading full vmdk images, so that's not horrible.
>
> So there we are. I was wrong about filesystem drivers having any sense,
> so this patch can indeed have a hugely negative impact.
>
> I would argue that conversion time for full images is more important,
> because that's probably the main use case; but the thing is that here
> this patch only helps for tmpfs and vmdk. We don't care too much about
> vmdk, and the fact that tmpfs takes so long simply is a bug.
>
> I guess the xfs/tmpfs results would still be in a range where they are
> barely acceptable (because it's mainly a qcow2 vs. raw tradeoff), but
> the ext4 horrors probably make this patch a no-go in its current form.
>
> In any case it's interesting to see that even the current qemu-img
> convert takes longer to read sparsely allocated qcow2/raw files from
> xfs than fully allocated images...
>
> So I guess I'll re-send this patch where the change is done only for
> -S 0.
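[For context on that last point: per the qemu-img documentation, "-S 0"
tells convert not to scan for zeroed or unallocated areas at all and to
fully allocate the destination, so a block-status query on the source
presumably has nothing to gain there anyway. A hypothetical invocation:

    # With -S 0, every byte is copied as data and the destination is
    # fully allocated; no sparseness/zero detection takes place.
    qemu-img convert -S 0 -f qcow2 -O raw test.qcow2 test.raw
]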
Wow, unexpected. Thanks for doing the benchmarks!

Stefan