Re: [Qemu-block] [Libguestfs] v2v: -o rhv-upload: Long time spent zeroing the disk

Eric Blake Tue, 10 Apr 2018 08:01:04 -0700

On 04/10/2018 09:40 AM, Richard W.M. Jones wrote:
>> When the destination is a block device we cannot avoid zeroing since a block
>> device may contain junk data (we usually get dirty empty images from our
>> local xtremio server).
> 
> (Off topic for qemu-block but ...)  We don't have enough information
> at our end to know about any of this.


Yep, see my other email about a possible NBD protocol extension to
actually let the client learn up-front if the exported device is known
to start in an all-zero state.

> 
>>> The problem is that the NBD block driver has max_pwrite_zeroes = 32 MB,
>>> so it's not that efficient after all. I'm not sure if there is a real
>>> reason for this, but Eric should know.
>>>
>>
>> We support zero with unlimited size without sending any payload to oVirt,
>> so there is no reason to limit zero requests by max_pwrite_zeroes. This
>> limit may make sense when zero is emulated using pwrite.
> 
> Yes, this seems wrong, but I'd want Eric to comment.

The 32M cap is currently the fault of qemu-img, not nbdkit (nbdkit is
not further reducing the size of the zero requests it passes on to
oVirt). I explained in the other email how qemu 2.13 will fix things to
send larger zero requests. Hmm, that means nbdkit really needs to start
supporting NBD_OPT_GO, as that is what qemu will be relying on to learn
the larger limits.
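
To illustrate the effect of the cap, here is a minimal sketch (not
qemu's actual code) of how a client that honours a maximum zero-request
size ends up splitting one large zero range. The name
split_zero_requests and the 1 GiB example size are hypothetical; only
the 32 MiB figure comes from this thread.

```python
MiB = 1024 * 1024

def split_zero_requests(offset, length, max_len):
    """Yield (offset, length) pairs, each no longer than max_len bytes."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    end = offset + length
    while offset < end:
        n = min(max_len, end - offset)
        yield offset, n
        offset += n

# Hypothetical example: zero a 1 GiB range under a 32 MiB cap.
requests = list(split_zero_requests(0, 1024 * MiB, 32 * MiB))
print(len(requests))  # prints 32
```

With a larger advertised limit, the same range would need
correspondingly fewer round trips.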

> 
>>>> However, since you suggest that we could use "trim" request for these
>>>> requests, it means that these requests are advisory (since trim is), and
>>>> we can just ignore them if the server does not support trim.
>>>
>>> What qemu-img sends shouldn't be a NBD_CMD_TRIM request (which is indeed
>>> advisory), but a NBD_CMD_WRITE_ZEROES request. qemu-img relies on the
>>> image actually being zeroed after this.
>>>
>>
>> So it seems that may_trim=1 is wrong, since trim cannot replace zero.
> 
> Note that the current plugin ignores may_trim.  It is not used at all,
> so it's not relevant to this problem.
> 
> However this flag actually corresponds to the inverse of
> NBD_CMD_FLAG_NO_HOLE which is defined by the NBD spec as:
> 
>     bit 1, NBD_CMD_FLAG_NO_HOLE; valid during
>     NBD_CMD_WRITE_ZEROES. SHOULD be set to 1 if the client wants to
>     ensure that the server does not create a hole. The client MAY send
>     NBD_CMD_FLAG_NO_HOLE even if NBD_FLAG_SEND_TRIM was not set in the
>     transmission flags field. The server MUST support the use of this
>     flag if it advertises NBD_FLAG_SEND_WRITE_ZEROES. *
> 
> qemu-img convert uses NBD_CMD_WRITE_ZEROES and does NOT set this flag
> (hence in the plugin we see may_trim=1), and I believe that qemu-img
> is correct because it doesn't want to force preallocation.

Yes, the flag usage is correct, and you are also correct that the
'may_trim' flag of nbdkit is the inverse bit sense of the
NBD_CMD_FLAG_NO_HOLE of the NBD protocol; it's all a documentation game
in deciding whether having a bit be 0 or 1 in the default state made
more sense.
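
As a rough sketch of that inversion (an illustration only, not nbdkit's
actual C code), the mapping from the wire flag to the plugin-side value
could look like this. The helper name may_trim_from_flags is
hypothetical; the bit position is taken from the spec text quoted above.

```python
# Bit 1 of the command flags, per the NBD spec excerpt quoted above.
NBD_CMD_FLAG_NO_HOLE = 1 << 1

def may_trim_from_flags(flags):
    """Return True when the client did NOT set NBD_CMD_FLAG_NO_HOLE.

    may_trim is the inverse bit sense: a clear NO_HOLE bit means the
    server is allowed to create a hole while zeroing.
    """
    return (flags & NBD_CMD_FLAG_NO_HOLE) == 0

# qemu-img convert leaves the flag clear, so the plugin sees may_trim=1.
print(may_trim_from_flags(0))                     # prints True
print(may_trim_from_flags(NBD_CMD_FLAG_NO_HOLE))  # prints False
```

Either way the range must read back as zeroes afterwards; the flag only
controls whether a hole is permitted.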

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
