Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension
Am 07.12.2016 um 16:50 hat Eric Blake geschrieben: > On 12/07/2016 04:44 AM, Kevin Wolf wrote: > >> Just because the NBD spec describes the bit does NOT require that > >> servers HAVE to set the bit on all images that are all zeroes. It is > >> perfectly compliant if the server never advertises the bit. > > > > True, but if no server exists that would actually make use of the > > feature, it's kind of useless to include it in the spec. > > No server, or no client? Well, you need both to make use of the feature. > I think qemu can be a client fairly easily: if the server advertises > the bit, then the client can set .bdrv_has_zero_init() and avoid > rewriting any sector of a file that is already known to be zero. Yes, the client part makes sense to me and should be easy. Are you going to write the patches yourself? > > The interesting case is probably where the image is > > created externally with qemu-img before it's exported either with > > qemu-nbd or the builtin server, and then we use it as a mirror target. > > > > Even in the rare cases where both image creation and the NBD server are > > in the same process, bdrv_create() doesn't work on a BlockDriverState, > > but just on a filename. So even then you would have to do hacks like > > remembering file names between create and the first open or something > > like that. > > Or add in a driver-specific callback that checks if a file is provably > all-zeroes; for the raw file driver, check if lseek(SEEK_DATA) finds > nothing, for the qcow2 driver, check for no backing files, and no L1 > table entries. In theory, there should be a common efficient abstraction for these two: bool bs_is_zeroed(BlockDriverState *bs) { int pum; ret = bdrv_get_block_status(bs, 0, bs->total_sectors, *pnum, NULL); return (ret == 0 && pnum == bs->total_sectors); } For raw this does the lseek() you mentioned, and for qcow2 it will look at the L1 table and one L2 table (the first one that exists). This is a bit more expensive than just looking at the L1 table, but so minimally that it's far from justifying a new command or flag in a protocol. Now, in practice, this doesn't work because bdrv_get_block_status() can return early even if the contiguous area is longer that what it returns. This is something that we should possibly fix sometime in qemu, but it's also independent from NBD (we can loop in the NBD server to give the desired semantics). So we are already going to export a block status querying interface to NBD. If we require there that the whole contiguous area of the same status is described and the server can't just shorten it, then we get the very same thing without a new flag. > >> Another option on the NBD server side is to create a server option - > >> when firing up a server to serve a particular file as an export, the > >> user can explicitly tell the server to advertise the bit because the > >> user has side knowledge that the file was just created (and then the > >> burden of misbehavior is on the user if they mistakenly request the > >> advertisement when it is not true). > > > > Maybe that's the only practical approach. > > But it's still a viable approach, and therefore this bit definition in > the NBD protocol is worthwhile. If it adds something that we can't easily get otherwise, and if someone volunteers to write the patches, then yes. I'm not completely convinced yet on that. Kevin pgpnDz7CGO0mQ.pgp Description: PGP signature
Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension
On 12/07/2016 04:44 AM, Kevin Wolf wrote: >> Just because the NBD spec describes the bit does NOT require that >> servers HAVE to set the bit on all images that are all zeroes. It is >> perfectly compliant if the server never advertises the bit. > > True, but if no server exists that would actually make use of the > feature, it's kind of useless to include it in the spec. No server, or no client? I think qemu can be a client fairly easily: if the server advertises the bit, then the client can set .bdrv_has_zero_init() and avoid rewriting any sector of a file that is already known to be zero. > > I think we should have concrete use cases in mind when extending the > spec, and explain them in the commit message. Just "maybe this could be > useful for someone sometime" isn't a good enough justification if you > ask me. But I think we DO have a concrete case already in mind, both of a client being able to take advantage of the information if the bit is set, and of servers that are able to set the bit because they can either efficiently determine that the file is all-zeroes, or can give the user a flag to advertise the bit. > > qemu doesn't really know whether an image has been written to since it > has been created. It's not so much whether the image has been written to, as much as easily proving that the image reads as all zeroes. > The interesting case is probably where the image is > created externally with qemu-img before it's exported either with > qemu-nbd or the builtin server, and then we use it as a mirror target. > > Even in the rare cases where both image creation and the NBD server are > in the same process, bdrv_create() doesn't work on a BlockDriverState, > but just on a filename. So even then you would have to do hacks like > remembering file names between create and the first open or something > like that. Or add in a driver-specific callback that checks if a file is provably all-zeroes; for the raw file driver, check if lseek(SEEK_DATA) finds nothing, for the qcow2 driver, check for no backing files, and no L1 table entries. >> Another option on the NBD server side is to create a server option - >> when firing up a server to serve a particular file as an export, the >> user can explicitly tell the server to advertise the bit because the >> user has side knowledge that the file was just created (and then the >> burden of misbehavior is on the user if they mistakenly request the >> advertisement when it is not true). > > Maybe that's the only practical approach. But it's still a viable approach, and therefore this bit definition in the NBD protocol is worthwhile. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension
Am 06.12.2016 um 16:21 hat Eric Blake geschrieben: > On 12/06/2016 03:25 AM, Kevin Wolf wrote: > > Am 06.12.2016 um 00:42 hat Eric Blake geschrieben: > >> While not directly related to NBD_CMD_WRITE_ZEROES, the qemu > >> team discovered that it is useful if a server can advertise > >> whether an export is in a known-all-zeroes state at the time > >> the client connects. > > > > Does a server usually have the information to set this flag, other than > > querying the block status of all blocks at startup? If so, the client > > could just query this by itself. > > Well, only if the client can query information at all (we don't have the > documentation finished for extent queries, let alone a reference > implementation). Right, but I think we all agree that this is something that is necessary and will come sooner or later. > > The patch that was originally sent to qemu-devel just forwarded qemu's > > .bdrv_has_zero_init() call to the server. However, what this function > > returns is not a known-all-zeroes state on open, but just a > > known-all-zeroes state immediately after bdrv_create(), i.e. creating a > > new image. Then it becomes information that is easy to get and doesn't > > involve querying all blocks (e.g. true for COW image formats, true for > > raw on regular files, false for raw on block devices). > > Just because the NBD spec describes the bit does NOT require that > servers HAVE to set the bit on all images that are all zeroes. It is > perfectly compliant if the server never advertises the bit. True, but if no server exists that would actually make use of the feature, it's kind of useless to include it in the spec. I think we should have concrete use cases in mind when extending the spec, and explain them in the commit message. Just "maybe this could be useful for someone sometime" isn't a good enough justification if you ask me. > That said, I think there are cases where qemu can easily advertise the > bit. > > I _do_ agree that it is NOT as trivial as the qemu server just > forwarding the value of .bdrv_has_zero_init() - the server HAS to prove > that no data has been written to the image. But for a qcow2 image just > created with qemu-img, it is a fairly easy proof: If the L1 table has > all-zero entries, then the image has not been written to yet. Reading > the L1 table for all-zeroes is only a single cluster read, which is MUCH > faster than crawling the entire image for extent status. And for > regular files, a single lseek(SEEK_DATA) is sufficient to see if the > entire image is currently sparse. > > Note that I only proposed the NBD implementation - it still remains to > be coded into the qemu code for the client to make use of the bit > (fairly easy: if the bit is set, the client can make its own > .bdrv_has_zero_init() return true), as well as for the server to set the > bit (harder: the server has to check .bdrv_has_zero_init() of the > wrapped image, but also has to prove the image is still unwritten). > Maybe this means that qemu's block layer wants to add a new > .bdrv_has_been_written() [or whatever name] to better abstract the proof > across drivers. But those patches would be qemu 2.9 material, and do > not need to further cc the NBD list. qemu doesn't really know whether an image has been written to since it has been created. The interesting case is probably where the image is created externally with qemu-img before it's exported either with qemu-nbd or the builtin server, and then we use it as a mirror target. Even in the rare cases where both image creation and the NBD server are in the same process, bdrv_create() doesn't work on a BlockDriverState, but just on a filename. So even then you would have to do hacks like remembering file names between create and the first open or something like that. > > This is useful for 'qemu-img convert', which creates an image and then > > writes the whole contents, but I'm not sure if this property is > > applicable for NBD, which I think doesn't even have a create operation. > > Another option on the NBD server side is to create a server option - > when firing up a server to serve a particular file as an export, the > user can explicitly tell the server to advertise the bit because the > user has side knowledge that the file was just created (and then the > burden of misbehavior is on the user if they mistakenly request the > advertisement when it is not true). Maybe that's the only practical approach. Kevin pgpBOnfKJBLEj.pgp Description: PGP signature
Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension
On 12/06/2016 03:25 AM, Kevin Wolf wrote: > Am 06.12.2016 um 00:42 hat Eric Blake geschrieben: >> While not directly related to NBD_CMD_WRITE_ZEROES, the qemu >> team discovered that it is useful if a server can advertise >> whether an export is in a known-all-zeroes state at the time >> the client connects. > > Does a server usually have the information to set this flag, other than > querying the block status of all blocks at startup? If so, the client > could just query this by itself. Well, only if the client can query information at all (we don't have the documentation finished for extent queries, let alone a reference implementation). > > The patch that was originally sent to qemu-devel just forwarded qemu's > .bdrv_has_zero_init() call to the server. However, what this function > returns is not a known-all-zeroes state on open, but just a > known-all-zeroes state immediately after bdrv_create(), i.e. creating a > new image. Then it becomes information that is easy to get and doesn't > involve querying all blocks (e.g. true for COW image formats, true for > raw on regular files, false for raw on block devices). Just because the NBD spec describes the bit does NOT require that servers HAVE to set the bit on all images that are all zeroes. It is perfectly compliant if the server never advertises the bit. That said, I think there are cases where qemu can easily advertise the bit. I _do_ agree that it is NOT as trivial as the qemu server just forwarding the value of .bdrv_has_zero_init() - the server HAS to prove that no data has been written to the image. But for a qcow2 image just created with qemu-img, it is a fairly easy proof: If the L1 table has all-zero entries, then the image has not been written to yet. Reading the L1 table for all-zeroes is only a single cluster read, which is MUCH faster than crawling the entire image for extent status. And for regular files, a single lseek(SEEK_DATA) is sufficient to see if the entire image is currently sparse. Note that I only proposed the NBD implementation - it still remains to be coded into the qemu code for the client to make use of the bit (fairly easy: if the bit is set, the client can make its own .bdrv_has_zero_init() return true), as well as for the server to set the bit (harder: the server has to check .bdrv_has_zero_init() of the wrapped image, but also has to prove the image is still unwritten). Maybe this means that qemu's block layer wants to add a new .bdrv_has_been_written() [or whatever name] to better abstract the proof across drivers. But those patches would be qemu 2.9 material, and do not need to further cc the NBD list. > > This is useful for 'qemu-img convert', which creates an image and then > writes the whole contents, but I'm not sure if this property is > applicable for NBD, which I think doesn't even have a create operation. Another option on the NBD server side is to create a server option - when firing up a server to serve a particular file as an export, the user can explicitly tell the server to advertise the bit because the user has side knowledge that the file was just created (and then the burden of misbehavior is on the user if they mistakenly request the advertisement when it is not true). -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension
Am 06.12.2016 um 00:42 hat Eric Blake geschrieben: > While not directly related to NBD_CMD_WRITE_ZEROES, the qemu > team discovered that it is useful if a server can advertise > whether an export is in a known-all-zeroes state at the time > the client connects. Does a server usually have the information to set this flag, other than querying the block status of all blocks at startup? If so, the client could just query this by itself. The patch that was originally sent to qemu-devel just forwarded qemu's .bdrv_has_zero_init() call to the server. However, what this function returns is not a known-all-zeroes state on open, but just a known-all-zeroes state immediately after bdrv_create(), i.e. creating a new image. Then it becomes information that is easy to get and doesn't involve querying all blocks (e.g. true for COW image formats, true for raw on regular files, false for raw on block devices). This is useful for 'qemu-img convert', which creates an image and then writes the whole contents, but I'm not sure if this property is applicable for NBD, which I think doesn't even have a create operation. Kevin