Re: [Qemu-devel] block: Review of .has_zero_init use
Am 26.06.2013 um 05:14 schrieb Bharata B Rao bhar...@linux.vnet.ibm.com: On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote: Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? Gluster supports both file and block backends. While the above is true for file backend (which uses ftruncate), the same is not true for block backend (which uses lvcreate lvresize). So overall it is not safe to assume that an image on GlusterFS volume reads back as zeroes after create. Okay, so for safety we have to return has_zero_init = 0. Erroneously assuming a device is zero initialized can bring severe filesystem corruption. Do you see a way to query the information of the underlaying backend from the storage and return 1 or 0 conditionally? Peter
Re: [Qemu-devel] block: Review of .has_zero_init use
On Wed, Jun 26, 2013 at 07:59:27AM +0200, Peter Lieven wrote: Am 26.06.2013 um 05:14 schrieb Bharata B Rao bhar...@linux.vnet.ibm.com: On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote: Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? Gluster supports both file and block backends. While the above is true for file backend (which uses ftruncate), the same is not true for block backend (which uses lvcreate lvresize). So overall it is not safe to assume that an image on GlusterFS volume reads back as zeroes after create. Okay, so for safety we have to return has_zero_init = 0. Erroneously assuming a device is zero initialized can bring severe filesystem corruption. Do you see a way to query the information of the underlaying backend from the storage and return 1 or 0 conditionally? Right now no. There have been efforts to get the capabilities (file, block etc) of gluster volume from gluster cmdline, but there haven't been discussions about providing such capabilities from libgfapi which is the library QEMU uses to talk with gluster. I see has_zero_init being used from qemu-img convert path. Is there any other use of this in QEMU ? Regards, Bharata.
Re: [Qemu-devel] block: Review of .has_zero_init use
Am 26.06.2013 um 08:46 hat Bharata B Rao geschrieben: On Wed, Jun 26, 2013 at 07:59:27AM +0200, Peter Lieven wrote: Am 26.06.2013 um 05:14 schrieb Bharata B Rao bhar...@linux.vnet.ibm.com: On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote: Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? Gluster supports both file and block backends. While the above is true for file backend (which uses ftruncate), the same is not true for block backend (which uses lvcreate lvresize). So overall it is not safe to assume that an image on GlusterFS volume reads back as zeroes after create. Okay, so for safety we have to return has_zero_init = 0. Erroneously assuming a device is zero initialized can bring severe filesystem corruption. Do you see a way to query the information of the underlaying backend from the storage and return 1 or 0 conditionally? Right now no. There have been efforts to get the capabilities (file, block etc) of gluster volume from gluster cmdline, but there haven't been discussions about providing such capabilities from libgfapi which is the library QEMU uses to talk with gluster. I see has_zero_init being used from qemu-img convert path. Is there any other use of this in QEMU ? No, it's the only user. So what this means is that using 'qemu-img convert' with a glusterfs target corrupts the image if the volume is backed by a block device. Kevin
[Qemu-devel] block: Review of .has_zero_init use
Hi all, while discussing some iscsi patches with Peter, we came to have a look at which block drivers implement has_zero_init() to return 0, and which don't (returning 1 is the default). The meaning of this value is that if has_zero_init != 0, after bdrv_create() one can assume that the whole image would read back as all zero. For example, this is true for the traditional image files, but not for host_device, where the block device isn't really created during bdrv_create() but only checked for size. The full list of protocol level block drivers is: * blkdebug - doesn't have bdrv_create * blkverify - doesn't have bdrv_create * curl - doesn't have bdrv_create * gluster - currently has_zero_init = 1 (is this correct?) * iscsi - has_zero_init = 0 * nbd - doesn't have bdrv_create * file - has_zero_init = 1 * host_*- has_zero_init = 0 * rbd - currently has_zero_init = 1 (is this correct?) * sheepdog - currently has_zero_init = 1 (is this correct?) * ssh - currently has_zero_init = 1 (is this correct?) * vvfat - doesn't have bdrv_create Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? It might be possible that the correct value depends on the backend on the server side for some protocols - for example, I think for SSH it depends on whether you access a regular file or a block device on the other host (if accessing a block device is even possible). In such cases, has_zero_init = 0 is the safe default. We're probably going to change the meaning of unimplemented has_zero_init to return 0, but it would be good to check if the current code was actually meant to do what it does today. Kevin
Re: [Qemu-devel] block: Review of .has_zero_init use
On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote: * ssh - currently has_zero_init = 1 (is this correct?) [...] It might be possible that the correct value depends on the backend on the server side for some protocols - for example, I think for SSH it depends on whether you access a regular file or a block device on the other host (if accessing a block device is even possible). This seems to depend on the behaviour of O_TRUNC on block devices. The man page says it's unspecified, but I tested it on Linux, and Linux ignores it (for logical volumes anyway). When the ssh driver is asked to do bdrv_create it does on the remote end: - open (filename, O_RDWR|O_CREAT|O_TRUNC, 0644) [1] - lseek (fd, size-1, SEEK_SET) [2] - write (fd, '\0', 1) In other words for regular files, it creates a sparse file. For block devices, I tested the sequence above, and it doesn't fail. So .. I guess that has_zero_init = 0 would be correct? Unless we fstat the fd after opening it and return some conditional value from bdrv_has_zero_init eg if it's a block device. Is that possible? Rich. [1] Mode 0644 is hard-coded :-( [2] I realize now it's actually possible to use ftruncate, although it's not obvious. There is no explicit truncate operation in sftp. But there is a setattr operation, which if you try to set the size attr in fact does a truncate at the remote end. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Re: [Qemu-devel] block: Review of .has_zero_init use
Am 25.06.2013 um 14:05 hat Richard W.M. Jones geschrieben: On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote: * ssh - currently has_zero_init = 1 (is this correct?) [...] It might be possible that the correct value depends on the backend on the server side for some protocols - for example, I think for SSH it depends on whether you access a regular file or a block device on the other host (if accessing a block device is even possible). This seems to depend on the behaviour of O_TRUNC on block devices. The man page says it's unspecified, but I tested it on Linux, and Linux ignores it (for logical volumes anyway). When the ssh driver is asked to do bdrv_create it does on the remote end: - open (filename, O_RDWR|O_CREAT|O_TRUNC, 0644) [1] - lseek (fd, size-1, SEEK_SET) [2] - write (fd, '\0', 1) In other words for regular files, it creates a sparse file. For block devices, I tested the sequence above, and it doesn't fail. So .. I guess that has_zero_init = 0 would be correct? Yes, that's my understanding. Unless we fstat the fd after opening it and return some conditional value from bdrv_has_zero_init eg if it's a block device. Is that possible? .bdrv_has_zero_init is a callback that gets a specific BlockDriverState, so it's possible to do some checks in there. If you can get the necessary information using SSH, doing it dynamically is certainly an option. Kevin
Re: [Qemu-devel] block: Review of .has_zero_init use
At Tue, 25 Jun 2013 13:39:11 +0200, Kevin Wolf wrote: Hi all, while discussing some iscsi patches with Peter, we came to have a look at which block drivers implement has_zero_init() to return 0, and which don't (returning 1 is the default). The meaning of this value is that if has_zero_init != 0, after bdrv_create() one can assume that the whole image would read back as all zero. For example, this is true for the traditional image files, but not for host_device, where the block device isn't really created during bdrv_create() but only checked for size. The full list of protocol level block drivers is: * blkdebug - doesn't have bdrv_create * blkverify - doesn't have bdrv_create * curl - doesn't have bdrv_create * gluster - currently has_zero_init = 1 (is this correct?) * iscsi - has_zero_init = 0 * nbd - doesn't have bdrv_create * file - has_zero_init = 1 * host_*- has_zero_init = 0 * rbd - currently has_zero_init = 1 (is this correct?) * sheepdog - currently has_zero_init = 1 (is this correct?) * ssh - currently has_zero_init = 1 (is this correct?) * vvfat - doesn't have bdrv_create Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? It's safe for Sheepdog. Sheepdog uses ftruncate or fallocate to create data blocks and it is guaranteed that the space will be initialized to zero. Thanks, Kazutaka
Re: [Qemu-devel] block: Review of .has_zero_init use
Am 25.06.2013 um 15:11 hat MORITA Kazutaka geschrieben: At Tue, 25 Jun 2013 13:39:11 +0200, Kevin Wolf wrote: Hi all, while discussing some iscsi patches with Peter, we came to have a look at which block drivers implement has_zero_init() to return 0, and which don't (returning 1 is the default). The meaning of this value is that if has_zero_init != 0, after bdrv_create() one can assume that the whole image would read back as all zero. For example, this is true for the traditional image files, but not for host_device, where the block device isn't really created during bdrv_create() but only checked for size. The full list of protocol level block drivers is: * blkdebug - doesn't have bdrv_create * blkverify - doesn't have bdrv_create * curl - doesn't have bdrv_create * gluster - currently has_zero_init = 1 (is this correct?) * iscsi - has_zero_init = 0 * nbd - doesn't have bdrv_create * file - has_zero_init = 1 * host_*- has_zero_init = 0 * rbd - currently has_zero_init = 1 (is this correct?) * sheepdog - currently has_zero_init = 1 (is this correct?) * ssh - currently has_zero_init = 1 (is this correct?) * vvfat - doesn't have bdrv_create Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? It's safe for Sheepdog. Sheepdog uses ftruncate or fallocate to create data blocks and it is guaranteed that the space will be initialized to zero. Note that ftruncate/fallocate don't zero-initialise block devices, only regular files. Not sure if you can use block devices to back Sheepdog images? Kevin
Re: [Qemu-devel] block: Review of .has_zero_init use
At Tue, 25 Jun 2013 15:20:18 +0200, Kevin Wolf wrote: Am 25.06.2013 um 15:11 hat MORITA Kazutaka geschrieben: At Tue, 25 Jun 2013 13:39:11 +0200, Kevin Wolf wrote: Hi all, while discussing some iscsi patches with Peter, we came to have a look at which block drivers implement has_zero_init() to return 0, and which don't (returning 1 is the default). The meaning of this value is that if has_zero_init != 0, after bdrv_create() one can assume that the whole image would read back as all zero. For example, this is true for the traditional image files, but not for host_device, where the block device isn't really created during bdrv_create() but only checked for size. The full list of protocol level block drivers is: * blkdebug - doesn't have bdrv_create * blkverify - doesn't have bdrv_create * curl - doesn't have bdrv_create * gluster - currently has_zero_init = 1 (is this correct?) * iscsi - has_zero_init = 0 * nbd - doesn't have bdrv_create * file - has_zero_init = 1 * host_*- has_zero_init = 0 * rbd - currently has_zero_init = 1 (is this correct?) * sheepdog - currently has_zero_init = 1 (is this correct?) * ssh - currently has_zero_init = 1 (is this correct?) * vvfat - doesn't have bdrv_create Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? It's safe for Sheepdog. Sheepdog uses ftruncate or fallocate to create data blocks and it is guaranteed that the space will be initialized to zero. Note that ftruncate/fallocate don't zero-initialise block devices, only regular files. Not sure if you can use block devices to back Sheepdog images? We cannot do that. Sheepdog heavily relies on filesystem features and we will not support block devices for the backend of Sheepdog. Thanks, Kazutaka
Re: [Qemu-devel] block: Review of .has_zero_init use
Am 25.06.2013 um 15:42 hat MORITA Kazutaka geschrieben: At Tue, 25 Jun 2013 15:20:18 +0200, Kevin Wolf wrote: Am 25.06.2013 um 15:11 hat MORITA Kazutaka geschrieben: At Tue, 25 Jun 2013 13:39:11 +0200, Kevin Wolf wrote: Hi all, while discussing some iscsi patches with Peter, we came to have a look at which block drivers implement has_zero_init() to return 0, and which don't (returning 1 is the default). The meaning of this value is that if has_zero_init != 0, after bdrv_create() one can assume that the whole image would read back as all zero. For example, this is true for the traditional image files, but not for host_device, where the block device isn't really created during bdrv_create() but only checked for size. The full list of protocol level block drivers is: * blkdebug - doesn't have bdrv_create * blkverify - doesn't have bdrv_create * curl - doesn't have bdrv_create * gluster - currently has_zero_init = 1 (is this correct?) * iscsi - has_zero_init = 0 * nbd - doesn't have bdrv_create * file - has_zero_init = 1 * host_*- has_zero_init = 0 * rbd - currently has_zero_init = 1 (is this correct?) * sheepdog - currently has_zero_init = 1 (is this correct?) * ssh - currently has_zero_init = 1 (is this correct?) * vvfat - doesn't have bdrv_create Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? It's safe for Sheepdog. Sheepdog uses ftruncate or fallocate to create data blocks and it is guaranteed that the space will be initialized to zero. Note that ftruncate/fallocate don't zero-initialise block devices, only regular files. Not sure if you can use block devices to back Sheepdog images? We cannot do that. Sheepdog heavily relies on filesystem features and we will not support block devices for the backend of Sheepdog. Okay, so Sheepdog is indeed safe. Thanks, Kazutaka! Kevin
Re: [Qemu-devel] block: Review of .has_zero_init use
On 06/25/2013 04:39 AM, Kevin Wolf wrote: Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? This is always safe for rbd.
Re: [Qemu-devel] block: Review of .has_zero_init use
On Tue, Jun 25, 2013 at 01:39:11PM +0200, Kevin Wolf wrote: Can you please review for the gluster, rbd, sheepdog and ssh driver whether it's safe to assume that the image reads back as zeros after bdrv_create? Gluster supports both file and block backends. While the above is true for file backend (which uses ftruncate), the same is not true for block backend (which uses lvcreate lvresize). So overall it is not safe to assume that an image on GlusterFS volume reads back as zeroes after create. Regards, Bharata.