On Tue, Apr 10, 2018 at 5:50 PM Richard W.M. Jones <rjo...@redhat.com>
> On Tue, Apr 10, 2018 at 02:07:33PM +0000, Nir Soffer wrote:
> > This makes sense if the device is backed by a block device on oVirt side,
> > and the NBD support efficient zeroing. But in this case the device is
> > by an empty sparse file on NFS, and oVirt does not support yet efficient
> > zeroing, we just write zeros manually.
> > I think should be handled on virt-v2v plugin side. When zeroing a file
> > image,
> > you can ignore zero requests after the highest write offset, since the
> > plugin
> > created a new image, and we know that the image is empty.
> > When the destination is a block device we cannot avoid zeroing since a
> > device may contain junk data (we usually get dirty empty images from our
> > local
> > xtremio server).
> (Off topic for qemu-block but ...) We don't have enough information
> at our end to know about any of this.
Can't use use this logic in the oVirt plugin?
file based storage -> skip initial zeroing
block based storage -> use initial zeroing
Do you think that publishing disk capabilities in the sdk will solve this?
> > > The problem is that the NBD block driver has max_pwrite_zeroes = 32 MB,
> > > so it's not that efficient after all. I'm not sure if there is a real
> > > reason for this, but Eric should know.
> > >
> > We support zero with unlimited size without sending any payload to oVirt,
> > so
> > there is no reason to limit zero request by max_pwrite_zeros. This limit
> > make sense when zero is emulated using pwrite.
> Yes, this seems wrong, but I'd want Eric to comment.
> > > > However, since you suggest that we could use "trim" request for these
> > > > requests, it means that these requests are advisory (since trim is),
> > > > we can just ignore them if the server does not support trim.
> > >
> > > What qemu-img sends shouldn't be a NBD_CMD_TRIM request (which is
> > > advisory), but a NBD_CMD_WRITE_ZEROES request. qemu-img relies on the
> > > image actually being zeroed after this.
> > >
> > So it seems that may_trim=1 is wrong, since trim cannot replace zero.
> Note that the current plugin ignores may_trim. It is not used at all,
> so it's not relevant to this problem.
> However this flag actually corresponds to the inverse of
> NBD_CMD_FLAG_NO_HOLE which is defined by the NBD spec as:
> bit 1, NBD_CMD_FLAG_NO_HOLE; valid during
> NBD_CMD_WRITE_ZEROES. SHOULD be set to 1 if the client wants to
> ensure that the server does not create a hole. The client MAY send
> NBD_CMD_FLAG_NO_HOLE even if NBD_FLAG_SEND_TRIM was not set in the
> transmission flags field. The server MUST support the use of this
> flag if it advertises NBD_FLAG_SEND_WRITE_ZEROES. *
> qemu-img convert uses NBD_CMD_WRITE_ZEROES and does NOT set this flag
> (hence in the plugin we see may_trim=1), and I believe that qemu-img
> is correct because it doesn't want to force preallocation.
So once oVirt will support efficient zeroing, this flag may be translated to
(for file based storage):
may_trim=1 -> fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
may_trim=0 -> fallocate(FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE)
We planed to choose this by default on oVirt side, based on disk type. For
disk we never want to use FALLOC_FL_PUNCH_HOLE, and for sparse disk we
want to use FALLOC_FL_PUNCH_HOLE unless it is not supported.
Seems that we need to add a "trim" or "punch_hole" flag to the PATCH/zero
so you can hint oVirt how do you want to zero. oVirt will choose what to do
on storage type (file/block), user request(trim/notrim), and disk type
I think we can start the use this flag when we publish the "trim" feature.