Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations

Paolo Bonzini Mon, 12 Mar 2012 05:28:57 -0700

On 10/03/2012 19:02, Richard Laager wrote:
> I propose adding the following behaviors in any event:
>       * If a QEMU block device reports a discard_granularity > 0, it
>         must be equal to 2^n (n >= 0), or QEMU's block core will change
>         it to 0. (Non-power-of-two granularities are not likely to exist
>         in the real world, and this assumption greatly simplifies
>         ensuring correctness.)


Yeah, I was considering this to be simply a bug in the block device.
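
A minimal sketch of the proposed rule, for illustration only. The helper names are hypothetical and do not come from the series; the only behavior assumed is the one quoted above (a non-power-of-two granularity is changed to 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A power of two has exactly one bit set, so v & (v - 1) clears it to 0. */
static bool is_power_of_two(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical helper: keep a sane granularity, otherwise report 0
 * (meaning "no usable discard granularity"). */
static uint32_t sanitize_discard_granularity(uint32_t granularity)
{
    uint32_t out = is_power_of_two(granularity) ? granularity : 0;

    /* Invariant callers may rely on: the result is 0 or 2^n. */
    assert(out == 0 || is_power_of_two(out));
    return out;
}
```

With this in place, later alignment code can use masks instead of division.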

>       * For SCSI, report an unmap_granularity to the guest as follows:
>       max(logical_block_size, discard_granularity) / logical_block_size

This is more or less already in place later in the series.
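
For reference, the quoted formula worked out as code. This is a sketch with a hypothetical function name, not the code from the series, and it assumes both values are in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unmap granularity in logical blocks, i.e.
 * max(logical_block_size, discard_granularity) / logical_block_size. */
static uint32_t unmap_granularity_blocks(uint32_t logical_block_size,
                                         uint32_t discard_granularity)
{
    uint32_t bytes;

    assert(logical_block_size > 0);
    bytes = discard_granularity > logical_block_size
          ? discard_granularity : logical_block_size;
    return bytes / logical_block_size;
}
```

For example, 512-byte logical blocks with a 4096-byte discard granularity give 8 blocks, while a granularity of 1 byte (or anything below the block size) gives 1 block.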

> As a design concept, instead of guaranteeing that 512B zero'ing discards
> are supported, I think the QEMU block layer should instead guarantee
> aligned discards to QEMU block devices, emulating any misaligned
> discards (or portions thereof) by writing zeroes if (and only if)
> discard_zeros_data is set.

Yes, this can be done of course.  This series does not include it yet.
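
One way the splitting could look, as a rough sketch only. The struct and function names are hypothetical, the granularity is assumed to be a power of two, and offset + len is assumed not to overflow. The head and tail would be written as zeroes if (and only if) discard_zeros_data is set; only the middle would be passed to the driver as a discard.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t head_len;    /* misaligned bytes starting at the request offset */
    uint64_t middle_off;  /* first aligned byte of the discardable region */
    uint64_t middle_len;  /* aligned bytes that can really be discarded */
    uint64_t tail_len;    /* misaligned bytes after the aligned region */
} DiscardSplit;

/* Hypothetical helper: split [offset, offset + len) on granularity
 * boundaries. */
static DiscardSplit split_discard(uint64_t offset, uint64_t len,
                                  uint64_t granularity)
{
    DiscardSplit s = {0};
    uint64_t end = offset + len;
    uint64_t first, last;

    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    first = (offset + granularity - 1) & ~(granularity - 1); /* round up */
    last = end & ~(granularity - 1);                         /* round down */

    if (first >= last) {
        /* No whole granule inside the request: nothing to discard. */
        s.head_len = len;
        s.middle_off = end;
        return s;
    }
    s.head_len = first - offset;
    s.middle_off = first;
    s.middle_len = last - first;
    s.tail_len = end - last;
    return s;
}
```

The three lengths always add up to len, so no byte of the guest's request is silently dropped.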

> This leaves one remaining issue: In raw-posix.c, for files (i.e. not
> devices), I assume you're going to advertise discard_granularity=1 and
> discard_zeros_data=1 when compiled with support for
> fallocate(FALLOC_FL_PUNCH_HOLE). Note, I'm assuming fallocate() actually
> guarantees that it zeros the data when punching holes.

It does, that's pretty much the definition of a hole.

> If the guest does a big discard (think mkfs) and fallocate() returns
> EOPNOTSUPP, you'll have to zero essentially the whole virtual disk,
> which, as you noted, will also allocate it (unless you explicitly check
> for holes). This is bad. It can be avoided by not advertising
> discard_zeros_data, but as you noted, that's unfortunate.

If you have a new kernel that supports SEEK_HOLE/SEEK_DATA, allocating
the whole disk can also be avoided by skipping the zero write on known
holes.

This could even be done at the block layer level using bdrv_is_allocated.
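
A sketch of the file-level variant, assuming Linux and a regular file. The function name is hypothetical and this is not code from the series; it only shows how lseek() with SEEK_DATA/SEEK_HOLE lets the zeroing loop touch data extents and leave holes alone.

```c
#define _GNU_SOURCE           /* for SEEK_DATA / SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA             /* Linux values, used only if the headers */
#define SEEK_DATA 3           /* did not expose them */
#define SEEK_HOLE 4
#endif

/* Hypothetical helper: zero [start, end) of fd, writing only over data
 * extents. Returns the number of bytes written, or -1 on error. */
static int64_t zero_skipping_holes(int fd, off_t start, off_t end)
{
    static const char zeroes[4096];
    int64_t written = 0;
    off_t pos = start;

    while (pos < end) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            /* ENXIO: no data at or after pos, the rest is a hole. */
            return errno == ENXIO ? written : -1;
        }
        if (data >= end) {
            break;
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0) {
            return -1;
        }
        if (hole > end) {
            hole = end;
        }
        for (pos = data; pos < hole; ) {
            size_t left = (size_t)(hole - pos);
            size_t n = left < sizeof(zeroes) ? left : sizeof(zeroes);
            ssize_t r = pwrite(fd, zeroes, n, pos);
            if (r <= 0) {
                return -1;
            }
            pos += r;
            written += r;
        }
    }
    return written;
}
```

On a filesystem that does not track holes, lseek() reports the whole file as data, so the loop degrades to zeroing everything rather than failing.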

> If we could probe for FALLOC_FL_PUNCH_HOLE support, then we could avoid
> advertising discard support based on FALLOC_FL_PUNCH_HOLE when it is not
> going to work. This would side step these problems. 

... and introduce others when migrating if your datacenter doesn't have
homogeneous kernel versions and/or filesystems. :(

> You said it wasn't
> possible to probe for FALLOC_FL_PUNCH_HOLE. Have you considered probing
> by extending the file by one byte and then punching that:
>         char buf = 0;
>         fstat(s->fd, &st);
>         pwrite(s->fd, &buf, 1, st.st_size + 1);
>         has_discard = !fallocate(s->fd,
>                                  FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>                                  st.st_size + 1, 1);
>         ftruncate(s->fd, st.st_size);

Nice trick. :)   Yes, that could work.
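
The same probe with error checking, as a sketch only. The function name is hypothetical, Linux with glibc is assumed, and unlike the quoted snippet it writes at st_size rather than st_size + 1, so the file grows by exactly one byte before being truncated back.

```c
#define _GNU_SOURCE           /* for fallocate() and FALLOC_FL_* on glibc */
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if punching a hole works on fd,
 * 0 if fallocate() refuses it, -1 if the probe itself failed. */
static int probe_punch_hole(int fd)
{
    struct stat st;
    char byte = 0;
    int ok;

    if (fstat(fd, &st) < 0) {
        return -1;
    }
    /* Extend the file by one byte so the punch touches no existing data. */
    if (pwrite(fd, &byte, 1, st.st_size) != 1) {
        return -1;
    }
    ok = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                   st.st_size, 1) == 0;
    /* Restore the original size whether or not the punch worked. */
    if (ftruncate(fd, st.st_size) < 0) {
        return -1;
    }
    return ok;
}
```

The answer only holds for the host kernel and filesystem the image lives on at that moment, which is exactly the migration concern raised above.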

Do you know if non-Linux operating systems have something similar to
BLKDISCARDZEROES?

Paolo
