On 04/10/2018 09:40 AM, Richard W.M. Jones wrote:

>> When the destination is a block device we cannot avoid zeroing since a block
>> device may contain junk data (we usually get dirty empty images from our
>> local xtremio server).
>
> (Off topic for qemu-block but ...)  We don't have enough information
> at our end to know about any of this.

Yep, see my other email about a possible NBD protocol extension to
actually let the client learn up-front if the exported device is known
to start in an all-zero state.

>
>>> The problem is that the NBD block driver has max_pwrite_zeroes = 32 MB,
>>> so it's not that efficient after all. I'm not sure if there is a real
>>> reason for this, but Eric should know.
>>>
>>
>> We support zero with unlimited size without sending any payload to oVirt,
>> so there is no reason to limit zero request by max_pwrite_zeros. This limit
>> may make sense when zero is emulated using pwrite.
>
> Yes, this seems wrong, but I'd want Eric to comment.

The 32M cap is currently the fault of qemu-img, not nbdkit (nbdkit is
not further reducing the size of the zero requests it passes on to
oVirt); and I explained in the other email about how qemu 2.13 will fix
things to send larger zero requests (hmm, that means nbdkit really needs
to start supporting NBD_OPT_GO, as that is what qemu will be relying on
to learn the larger limits).

>
>>>> However, since you suggest that we could use "trim" request for these
>>>> requests, it means that these requests are advisory (since trim is), and
>>>> we can just ignore them if the server does not support trim.
>>>
>>> What qemu-img sends shouldn't be a NBD_CMD_TRIM request (which is indeed
>>> advisory), but a NBD_CMD_WRITE_ZEROES request. qemu-img relies on the
>>> image actually being zeroed after this.
>>>
>>
>> So it seems that may_trim=1 is wrong, since trim cannot replace zero.
>
> Note that the current plugin ignores may_trim.  It is not used at all,
> so it's not relevant to this problem.
>
> However this flag actually corresponds to the inverse of
> NBD_CMD_FLAG_NO_HOLE which is defined by the NBD spec as:
>
>     bit 1, NBD_CMD_FLAG_NO_HOLE; valid during
>     NBD_CMD_WRITE_ZEROES. SHOULD be set to 1 if the client wants to
>     ensure that the server does not create a hole. The client MAY send
>     NBD_CMD_FLAG_NO_HOLE even if NBD_FLAG_SEND_TRIM was not set in the
>     transmission flags field. The server MUST support the use of this
>     flag if it advertises NBD_FLAG_SEND_WRITE_ZEROES. *
>
> qemu-img convert uses NBD_CMD_WRITE_ZEROES and does NOT set this flag
> (hence in the plugin we see may_trim=1), and I believe that qemu-img
> is correct because it doesn't want to force preallocation.

Yes, the flag usage is correct, and you are also correct that the
'may_trim' flag of nbdkit is the inverse bit sense of the
NBD_CMD_FLAG_NO_HOLE of the NBD protocol; it's all a documentation game
in deciding whether having a bit be 0 or 1 in the default state made
more sense.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
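
For readers following the may_trim discussion above, the inverse bit
sense can be sketched in C roughly as follows.  This is only an
illustration, not nbdkit's or qemu's actual code: the two flag values
follow the bit positions given in the NBD spec (bit 0 is
NBD_CMD_FLAG_FUA, bit 1 is NBD_CMD_FLAG_NO_HOLE), while the helper names
may_trim_from_flags() and flags_from_may_trim() are hypothetical and
made up for this example.

```c
#include <stdint.h>

/* Command flag bits, using the bit positions from the NBD spec. */
#define NBD_CMD_FLAG_FUA      (1u << 0)
#define NBD_CMD_FLAG_NO_HOLE  (1u << 1)

/*
 * Hypothetical helper: derive a may_trim-style value from the command
 * flags of an NBD_CMD_WRITE_ZEROES request.  may_trim is 1 exactly when
 * the client did NOT set NBD_CMD_FLAG_NO_HOLE, i.e. when the server is
 * allowed (but not required) to punch a hole.
 */
int may_trim_from_flags(uint16_t flags)
{
    return (flags & NBD_CMD_FLAG_NO_HOLE) == 0;
}

/*
 * Hypothetical helper for the opposite direction: a client that wants
 * to forbid holes (may_trim == 0) sets NBD_CMD_FLAG_NO_HOLE; otherwise
 * it leaves the bit clear, which is the default.
 */
uint16_t flags_from_may_trim(int may_trim)
{
    return may_trim ? 0 : NBD_CMD_FLAG_NO_HOLE;
}
```

With these definitions, a request whose flags field is 0 (what qemu-img
convert is described as sending) maps to may_trim=1, and the data must
still read back as zeroes either way; only the allocation behaviour is
left to the server.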