Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-07 Thread Kevin Wolf
Am 07.12.2016 um 16:50 hat Eric Blake geschrieben:
> On 12/07/2016 04:44 AM, Kevin Wolf wrote:
> >> Just because the NBD spec describes the bit does NOT require that
> >> servers HAVE to set the bit on all images that are all zeroes.  It is
> >> perfectly compliant if the server never advertises the bit.
> > 
> > True, but if no server exists that would actually make use of the
> > feature, it's kind of useless to include it in the spec.
> 
> No server, or no client?

Well, you need both to make use of the feature.

> I think qemu can be a client fairly easily: if the server advertises
> the bit, then the client can set .bdrv_has_zero_init() and avoid
> rewriting any sector of a file that is already known to be zero.

Yes, the client part makes sense to me and should be easy. Are you going
to write the patches yourself?

> > The interesting case is probably where the image is
> > created externally with qemu-img before it's exported either with
> > qemu-nbd or the builtin server, and then we use it as a mirror target.
> > 
> > Even in the rare cases where both image creation and the NBD server are
> > in the same process, bdrv_create() doesn't work on a BlockDriverState,
> > but just on a filename. So even then you would have to do hacks like
> > remembering file names between create and the first open or something
> > like that.
> 
> Or add in a driver-specific callback that checks if a file is provably
> all-zeroes; for the raw file driver, check if lseek(SEEK_DATA) finds
> nothing, for the qcow2 driver, check for no backing files, and no L1
> table entries.

In theory, there should be a common efficient abstraction for these two:

bool bs_is_zeroed(BlockDriverState *bs)
{
int pum;
ret = bdrv_get_block_status(bs, 0, bs->total_sectors, *pnum, NULL);
return (ret == 0 && pnum == bs->total_sectors);
}

For raw this does the lseek() you mentioned, and for qcow2 it will look
at the L1 table and one L2 table (the first one that exists). This is a
bit more expensive than just looking at the L1 table, but so minimally
that it's far from justifying a new command or flag in a protocol.

Now, in practice, this doesn't work because bdrv_get_block_status() can
return early even if the contiguous area is longer that what it
returns. This is something that we should possibly fix sometime in qemu,
but it's also independent from NBD (we can loop in the NBD server to
give the desired semantics).

So we are already going to export a block status querying interface to
NBD. If we require there that the whole contiguous area of the same
status is described and the server can't just shorten it, then we
get the very same thing without a new flag.

> >> Another option on the NBD server side is to create a server option -
> >> when firing up a server to serve a particular file as an export, the
> >> user can explicitly tell the server to advertise the bit because the
> >> user has side knowledge that the file was just created (and then the
> >> burden of misbehavior is on the user if they mistakenly request the
> >> advertisement when it is not true).
> > 
> > Maybe that's the only practical approach.
> 
> But it's still a viable approach, and therefore this bit definition in
> the NBD protocol is worthwhile.

If it adds something that we can't easily get otherwise, and if someone
volunteers to write the patches, then yes. I'm not completely convinced
yet on that.

Kevin


pgpnDz7CGO0mQ.pgp
Description: PGP signature


Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-07 Thread Eric Blake
On 12/07/2016 04:44 AM, Kevin Wolf wrote:
>> Just because the NBD spec describes the bit does NOT require that
>> servers HAVE to set the bit on all images that are all zeroes.  It is
>> perfectly compliant if the server never advertises the bit.
> 
> True, but if no server exists that would actually make use of the
> feature, it's kind of useless to include it in the spec.

No server, or no client?  I think qemu can be a client fairly easily: if
the server advertises the bit, then the client can set
.bdrv_has_zero_init() and avoid rewriting any sector of a file that is
already known to be zero.

> 
> I think we should have concrete use cases in mind when extending the
> spec, and explain them in the commit message. Just "maybe this could be
> useful for someone sometime" isn't a good enough justification if you
> ask me.

But I think we DO have a concrete case already in mind, both of a client
being able to take advantage of the information if the bit is set, and
of servers that are able to set the bit because they can either
efficiently determine that the file is all-zeroes, or can give the user
a flag to advertise the bit.

> 
> qemu doesn't really know whether an image has been written to since it
> has been created.

It's not so much whether the image has been written to, as much as
easily proving that the image reads as all zeroes.

> The interesting case is probably where the image is
> created externally with qemu-img before it's exported either with
> qemu-nbd or the builtin server, and then we use it as a mirror target.
> 
> Even in the rare cases where both image creation and the NBD server are
> in the same process, bdrv_create() doesn't work on a BlockDriverState,
> but just on a filename. So even then you would have to do hacks like
> remembering file names between create and the first open or something
> like that.

Or add in a driver-specific callback that checks if a file is provably
all-zeroes; for the raw file driver, check if lseek(SEEK_DATA) finds
nothing, for the qcow2 driver, check for no backing files, and no L1
table entries.

>> Another option on the NBD server side is to create a server option -
>> when firing up a server to serve a particular file as an export, the
>> user can explicitly tell the server to advertise the bit because the
>> user has side knowledge that the file was just created (and then the
>> burden of misbehavior is on the user if they mistakenly request the
>> advertisement when it is not true).
> 
> Maybe that's the only practical approach.

But it's still a viable approach, and therefore this bit definition in
the NBD protocol is worthwhile.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-07 Thread Kevin Wolf
Am 06.12.2016 um 16:21 hat Eric Blake geschrieben:
> On 12/06/2016 03:25 AM, Kevin Wolf wrote:
> > Am 06.12.2016 um 00:42 hat Eric Blake geschrieben:
> >> While not directly related to NBD_CMD_WRITE_ZEROES, the qemu
> >> team discovered that it is useful if a server can advertise
> >> whether an export is in a known-all-zeroes state at the time
> >> the client connects.
> > 
> > Does a server usually have the information to set this flag, other than
> > querying the block status of all blocks at startup? If so, the client
> > could just query this by itself.
> 
> Well, only if the client can query information at all (we don't have the
> documentation finished for extent queries, let alone a reference
> implementation).

Right, but I think we all agree that this is something that is necessary
and will come sooner or later.

> > The patch that was originally sent to qemu-devel just forwarded qemu's
> > .bdrv_has_zero_init() call to the server. However, what this function
> > returns is not a known-all-zeroes state on open, but just a
> > known-all-zeroes state immediately after bdrv_create(), i.e. creating a
> > new image. Then it becomes information that is easy to get and doesn't
> > involve querying all blocks (e.g. true for COW image formats, true for
> > raw on regular files, false for raw on block devices).
> 
> Just because the NBD spec describes the bit does NOT require that
> servers HAVE to set the bit on all images that are all zeroes.  It is
> perfectly compliant if the server never advertises the bit.

True, but if no server exists that would actually make use of the
feature, it's kind of useless to include it in the spec.

I think we should have concrete use cases in mind when extending the
spec, and explain them in the commit message. Just "maybe this could be
useful for someone sometime" isn't a good enough justification if you
ask me.

> That said, I think there are cases where qemu can easily advertise the
> bit.
> 
> I _do_ agree that it is NOT as trivial as the qemu server just
> forwarding the value of .bdrv_has_zero_init() - the server HAS to prove
> that no data has been written to the image.  But for a qcow2 image just
> created with qemu-img, it is a fairly easy proof: If the L1 table has
> all-zero entries, then the image has not been written to yet.  Reading
> the L1 table for all-zeroes is only a single cluster read, which is MUCH
> faster than crawling the entire image for extent status.  And for
> regular files, a single lseek(SEEK_DATA) is sufficient to see if the
> entire image is currently sparse.
> 
> Note that I only proposed the NBD implementation - it still remains to
> be coded into the qemu code for the client to make use of the bit
> (fairly easy: if the bit is set, the client can make its own
> .bdrv_has_zero_init() return true), as well as for the server to set the
> bit (harder: the server has to check .bdrv_has_zero_init() of the
> wrapped image, but also has to prove the image is still unwritten).
> Maybe this means that qemu's block layer wants to add a new
> .bdrv_has_been_written() [or whatever name] to better abstract the proof
> across drivers.  But those patches would be qemu 2.9 material, and do
> not need to further cc the NBD list.

qemu doesn't really know whether an image has been written to since it
has been created. The interesting case is probably where the image is
created externally with qemu-img before it's exported either with
qemu-nbd or the builtin server, and then we use it as a mirror target.

Even in the rare cases where both image creation and the NBD server are
in the same process, bdrv_create() doesn't work on a BlockDriverState,
but just on a filename. So even then you would have to do hacks like
remembering file names between create and the first open or something
like that.

> > This is useful for 'qemu-img convert', which creates an image and then
> > writes the whole contents, but I'm not sure if this property is
> > applicable for NBD, which I think doesn't even have a create operation.
> 
> Another option on the NBD server side is to create a server option -
> when firing up a server to serve a particular file as an export, the
> user can explicitly tell the server to advertise the bit because the
> user has side knowledge that the file was just created (and then the
> burden of misbehavior is on the user if they mistakenly request the
> advertisement when it is not true).

Maybe that's the only practical approach.

Kevin


pgpBOnfKJBLEj.pgp
Description: PGP signature


Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-06 Thread Eric Blake
On 12/06/2016 03:25 AM, Kevin Wolf wrote:
> Am 06.12.2016 um 00:42 hat Eric Blake geschrieben:
>> While not directly related to NBD_CMD_WRITE_ZEROES, the qemu
>> team discovered that it is useful if a server can advertise
>> whether an export is in a known-all-zeroes state at the time
>> the client connects.
> 
> Does a server usually have the information to set this flag, other than
> querying the block status of all blocks at startup? If so, the client
> could just query this by itself.

Well, only if the client can query information at all (we don't have the
documentation finished for extent queries, let alone a reference
implementation).

> 
> The patch that was originally sent to qemu-devel just forwarded qemu's
> .bdrv_has_zero_init() call to the server. However, what this function
> returns is not a known-all-zeroes state on open, but just a
> known-all-zeroes state immediately after bdrv_create(), i.e. creating a
> new image. Then it becomes information that is easy to get and doesn't
> involve querying all blocks (e.g. true for COW image formats, true for
> raw on regular files, false for raw on block devices).

Just because the NBD spec describes the bit does NOT require that
servers HAVE to set the bit on all images that are all zeroes.  It is
perfectly compliant if the server never advertises the bit.  That said,
I think there are cases where qemu can easily advertise the bit.

I _do_ agree that it is NOT as trivial as the qemu server just
forwarding the value of .bdrv_has_zero_init() - the server HAS to prove
that no data has been written to the image.  But for a qcow2 image just
created with qemu-img, it is a fairly easy proof: If the L1 table has
all-zero entries, then the image has not been written to yet.  Reading
the L1 table for all-zeroes is only a single cluster read, which is MUCH
faster than crawling the entire image for extent status.  And for
regular files, a single lseek(SEEK_DATA) is sufficient to see if the
entire image is currently sparse.

Note that I only proposed the NBD implementation - it still remains to
be coded into the qemu code for the client to make use of the bit
(fairly easy: if the bit is set, the client can make its own
.bdrv_has_zero_init() return true), as well as for the server to set the
bit (harder: the server has to check .bdrv_has_zero_init() of the
wrapped image, but also has to prove the image is still unwritten).
Maybe this means that qemu's block layer wants to add a new
.bdrv_has_been_written() [or whatever name] to better abstract the proof
across drivers.  But those patches would be qemu 2.9 material, and do
not need to further cc the NBD list.

> 
> This is useful for 'qemu-img convert', which creates an image and then
> writes the whole contents, but I'm not sure if this property is
> applicable for NBD, which I think doesn't even have a create operation.

Another option on the NBD server side is to create a server option -
when firing up a server to serve a particular file as an export, the
user can explicitly tell the server to advertise the bit because the
user has side knowledge that the file was just created (and then the
burden of misbehavior is on the user if they mistakenly request the
advertisement when it is not true).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-06 Thread Kevin Wolf
Am 06.12.2016 um 00:42 hat Eric Blake geschrieben:
> While not directly related to NBD_CMD_WRITE_ZEROES, the qemu
> team discovered that it is useful if a server can advertise
> whether an export is in a known-all-zeroes state at the time
> the client connects.

Does a server usually have the information to set this flag, other than
querying the block status of all blocks at startup? If so, the client
could just query this by itself.

The patch that was originally sent to qemu-devel just forwarded qemu's
.bdrv_has_zero_init() call to the server. However, what this function
returns is not a known-all-zeroes state on open, but just a
known-all-zeroes state immediately after bdrv_create(), i.e. creating a
new image. Then it becomes information that is easy to get and doesn't
involve querying all blocks (e.g. true for COW image formats, true for
raw on regular files, false for raw on block devices).

This is useful for 'qemu-img convert', which creates an image and then
writes the whole contents, but I'm not sure if this property is
applicable for NBD, which I think doesn't even have a create operation.

Kevin