On Mon, Feb 13, 2017 at 11:04:30AM +0100, Kevin Wolf wrote:
> Am 12.02.2017 um 01:58 hat Nir Soffer geschrieben:
> > On Sat, Feb 11, 2017 at 12:23 AM, Nir Soffer <nir...@gmail.com> wrote:
> > > Hi all,
> > >
> > > I'm trying to convert images (mostly qcow2) to raw format on a thin lv,
> > > hoping to write only the allocated blocks on the thin lv, but
> > > it seems that qemu-img cannot write a sparse image to a block
> > > device.
> > >
> > > (...)
> >
> > So it seems that qemu-img is trying to write a sparse image.
> >
> > I tested again with an empty file:
> >
> >     truncate -s 20m empty
> >
> > Using strace, qemu-img checks the device's discard_zeroes_data:
> >
> >     ioctl(11, BLKDISCARDZEROES, 0) = 0
> >
> > Then it finds that the source is empty:
> >
> >     lseek(10, 0, SEEK_DATA) = -1 ENXIO (No such device or address)
> >
> > Then it issues one call:
> >
> >     [pid 10041] ioctl(11, BLKZEROOUT, 0x7f6049c82ba0) = 0
> >
> > And fsyncs and closes the destination.
> >
> >     # grep -s "" /sys/block/dm-57/queue/discard_*
> >     /sys/block/dm-57/queue/discard_granularity:65536
> >     /sys/block/dm-57/queue/discard_max_bytes:17179869184
> >     /sys/block/dm-57/queue/discard_zeroes_data:0
> >
> > I wonder why discard_zeroes_data is 0, while discarding
> > blocks seems to zero them.
> >
> > Seems that this is the bug:
> > https://bugzilla.redhat.com/835622
> >
> > A thin lv does promise (by default) to zero newly allocated blocks,
> > and it does return zeros when reading unallocated data, like
> > a sparse file.
> >
> > Since qemu does not know that the thin lv is not allocated, it cannot
> > skip empty blocks safely.
> >
> > It would be useful if it had a flag to force sparseness when the
> > user knows that this operation is safe, or maybe we need a thin lvm
> > driver?
>
> Yes, I think your analysis is correct; I seem to remember that I've seen
> this happen before.
> The Right Thing (TM) to do, however, seems to be fixing the kernel so
> that BLKDISCARDZEROES correctly returns that discard does in fact zero
> out blocks on this device. As soon as this ioctl works correctly,
> qemu-img should just automatically do what you want.
>
> Now if it turns out it is important to support older kernels without the
> fix, we can think about a driver-specific option for the 'file' driver
> that overrides the kernel's value. But I really want to make sure that
> we use such workarounds only in addition to, not instead of, doing the
> proper root cause fix in the kernel.
>
> So can you please bring it up with the LVM people?
I'm not sure it's that easy. The discard granularity of LVM thin volumes
is not equal to their reported block/sector sizes, but to the size of the
chunks they allocate:

    # blockdev --getss /dev/dm-9
    512
    # blockdev --getbsz /dev/dm-9
    4096
    # blockdev --getpbsz /dev/dm-9
    4096
    # cat /sys/block/dm-9/queue/discard_granularity
    131072
    #

I currently don't see qemu using the discard_granularity property for this
purpose. IIRC the code for write_zeroes(), e.g., simply checks the
discard_zeroes flag but not what size it is trying to zero out/discard.

We have an experimental, semi-complete, "can-do-footshooting" 'zeroinit'
filter for this purpose: it explicitly sets the "has_zero_init" flag and
drops write_zeroes() calls for blocks at an address greater than the
highest one written so far. It should use a dirty bitmap instead, and is
somewhat dangerous this way, which is why it's not on the qemu-devel list.
But if this approach is at all acceptable (despite being a hack), I could
improve it and send it to the list?

https://github.com/Blub/qemu/commit/6f6f38d2ef8f22a12f72e4d60f8a1fa978ac569a

(You'd just prefix the destination with `zeroinit:` in the qemu-img
command.)

Additionally, I'm currently still playing with the details and quirks of
various storages (lvm/dm thin, rbd, zvols) in an attempt to create a tool
to convert between them. (I did some successful tests converting disk
images between these storages and qcow2, together with their snapshots, in
a COW-aware way.) I'm planning on releasing some experimental code
soon-ish, though there's still some polishing to do on the documentation,
the library's API and the format - and the qcow2 support is a patch for
qemu-img to use the library.

My adventures into dm-thin metadata allow me to answer this one, though:

> > or maybe we need a thin lvm driver?

Probably not. It does not support SEEK_DATA/SEEK_HOLE and to my knowledge
also has no other sane metadata querying method.
You'd have to read the metadata device instead. To do this properly you
have to reserve a metadata snapshot, and there can only ever be one of
those per pool, which means you could only have one such disk in total
running on a system, and no other dm-thin-metadata-aware tool could be
used during that time (otherwise the reserve operations would fail with an
error and qemu would have to wait and retry a lot...).