Hi, Stefan & Kevin, Could you help to have a look at this version? We've discussed about this last November and now switch it to QemuOpts.
Thanks, Chunyan >>> On 6/23/2014 at 05:17 PM, in message <1403515022-24802-1-git-send-email-cy...@suse.com>, Chunyan Liu <cy...@suse.com> wrote: > Add 'nocow' option so that users could have a chance to set NOCOW flag to > newly created files. It's useful on btrfs file system to enhance > performance. > > Btrfs has low performance when hosting VM images, even more when the guest > in those VM are also using btrfs as file system. One way to mitigate this > bad > performance is to turn off COW attributes on VM files. Generally, there are > two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then > all newly created files will be NOCOW. b) per file. Add the NOCOW file > attribute. It could only be done to empty or new files. > > This patch tries the second way, according to the option, it could add NOCOW > per file. > > For most block drivers, since the create file step is in raw-posix.c, so we > can do setting NOCOW flag ioctl in raw-posix.c only. > > But there are some exceptions, like block/vpc.c and block/vdi.c, they are > creating file by calling qemu_open directly. For them, do the same setting > NOCOW flag ioctl work in them separately. > > Signed-off-by: Chunyan Liu <cy...@suse.com> > --- > Changes to v2: > * based on QemuOpts instead of old QEMUOptionParameters > * add nocow description in man page and html doc > > Old v2 is here: > http://lists.gnu.org/archive/html/qemu-devel/2013-11/msg02429.html > > --- > block/cow.c | 5 +++++ > block/qcow.c | 5 +++++ > block/qcow2.c | 5 +++++ > block/qed.c | 11 ++++++++--- > block/raw-posix.c | 25 +++++++++++++++++++++++++ > block/vdi.c | 29 +++++++++++++++++++++++++++++ > block/vhdx.c | 5 +++++ > block/vmdk.c | 11 ++++++++--- > block/vpc.c | 29 +++++++++++++++++++++++++++++ > include/block/block_int.h | 1 + > qemu-doc.texi | 16 ++++++++++++++++ > qemu-img.texi | 16 ++++++++++++++++ > 12 files changed, 152 insertions(+), 6 deletions(-) > > diff --git a/block/cow.c b/block/cow.c > index a05a92c..43b537c 100644 > --- a/block/cow.c > +++ b/block/cow.c > @@ -401,6 +401,11 @@ static QemuOptsList cow_create_opts = { > .type = QEMU_OPT_STRING, > .help = "File name of a base image" > }, > + { > + .name = BLOCK_OPT_NOCOW, > + .type = QEMU_OPT_BOOL, > + .help = "Turn off copy-on-write (valid only on btrfs)" > + }, > { /* end of list */ } > } > }; > diff --git a/block/qcow.c b/block/qcow.c > index 1f2bac8..5b23540 100644 > --- a/block/qcow.c > +++ b/block/qcow.c > @@ -928,6 +928,11 @@ static QemuOptsList qcow_create_opts = { > .help = "Encrypt the image", > .def_value_str = "off" > }, > + { > + .name = BLOCK_OPT_NOCOW, > + .type = QEMU_OPT_BOOL, > + .help = "Turn off copy-on-write (valid only on btrfs)" > + }, > { /* end of list */ } > } > }; > diff --git a/block/qcow2.c b/block/qcow2.c > index b9d2fa6..3a4cc8a 100644 > --- a/block/qcow2.c > +++ b/block/qcow2.c > @@ -2382,6 +2382,11 @@ static QemuOptsList qcow2_create_opts = { > .help = "Postpone refcount updates", > .def_value_str = "off" > }, > + { > + .name = BLOCK_OPT_NOCOW, > + .type = QEMU_OPT_BOOL, > + .help = "Turn off copy-on-write (valid only on btrfs)" > + }, > { /* end of list */ } > } > }; > diff --git a/block/qed.c b/block/qed.c > index 092e6fb..460ac92 100644 > --- a/block/qed.c > +++ b/block/qed.c > @@ -567,7 +567,7 @@ static void bdrv_qed_close(BlockDriverState *bs) > static int qed_create(const char *filename, uint32_t cluster_size, > uint64_t image_size, uint32_t table_size, > const char *backing_file, const char *backing_fmt, > - Error **errp) > + QemuOpts *opts, Error **errp) > { > QEDHeader header = { > .magic = QED_MAGIC, > @@ -586,7 +586,7 @@ static int qed_create(const char *filename, uint32_t > cluster_size, > int ret = 0; > BlockDriverState *bs; > > - ret = bdrv_create_file(filename, NULL, &local_err); > + ret = bdrv_create_file(filename, opts, &local_err); > if (ret < 0) { > error_propagate(errp, local_err); > return ret; > @@ -682,7 +682,7 @@ static int bdrv_qed_create(const char *filename, QemuOpts > > *opts, Error **errp) > } > > ret = qed_create(filename, cluster_size, image_size, table_size, > - backing_file, backing_fmt, errp); > + backing_file, backing_fmt, opts, errp); > > finish: > g_free(backing_file); > @@ -1644,6 +1644,11 @@ static QemuOptsList qed_create_opts = { > .type = QEMU_OPT_SIZE, > .help = "L1/L2 table size (in clusters)" > }, > + { > + .name = BLOCK_OPT_NOCOW, > + .type = QEMU_OPT_BOOL, > + .help = "Turn off copy-on-write (valid only on btrfs)" > + }, > { /* end of list */ } > } > }; > diff --git a/block/raw-posix.c b/block/raw-posix.c > index dacf4fb..825a0c8 100644 > --- a/block/raw-posix.c > +++ b/block/raw-posix.c > @@ -55,6 +55,9 @@ > #include <linux/cdrom.h> > #include <linux/fd.h> > #include <linux/fs.h> > +#ifndef FS_NOCOW_FL > +#define FS_NOCOW_FL 0x00800000 /* Do not cow file */ > +#endif > #endif > #ifdef CONFIG_FIEMAP > #include <linux/fiemap.h> > @@ -1278,12 +1281,14 @@ static int raw_create(const char *filename, QemuOpts > *opts, Error **errp) > int fd; > int result = 0; > int64_t total_size = 0; > + bool nocow = false; > > strstart(filename, "file:", &filename); > > /* Read out options */ > total_size = > qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / BDRV_SECTOR_SIZE; > + nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false); > > fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, > 0644); > @@ -1291,6 +1296,21 @@ static int raw_create(const char *filename, QemuOpts > *opts, Error **errp) > result = -errno; > error_setg_errno(errp, -result, "Could not create file"); > } else { > + if (nocow) { > +#ifdef __linux__ > + /* Set NOCOW flag to solve performance issue on fs like btrfs. > + * This is an optimisation. The FS_IOC_SETFLAGS ioctl return > value > + * will be ignored since any failure of this operation should > not > + * block the left work. > + */ > + int attr; > + if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) { > + attr |= FS_NOCOW_FL; > + ioctl(fd, FS_IOC_SETFLAGS, &attr); > + } > +#endif > + } > + > if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) { > result = -errno; > error_setg_errno(errp, -result, "Could not resize file"); > @@ -1477,6 +1497,11 @@ static QemuOptsList raw_create_opts = { > .type = QEMU_OPT_SIZE, > .help = "Virtual disk size" > }, > + { > + .name = BLOCK_OPT_NOCOW, > + .type = QEMU_OPT_BOOL, > + .help = "Turn off copy-on-write (valid only on btrfs)" > + }, > { /* end of list */ } > } > }; > diff --git a/block/vdi.c b/block/vdi.c > index 01fe22e..197bd77 100644 > --- a/block/vdi.c > +++ b/block/vdi.c > @@ -53,6 +53,13 @@ > #include "block/block_int.h" > #include "qemu/module.h" > #include "migration/migration.h" > +#ifdef __linux__ > +#include <linux/fs.h> > +#include <sys/ioctl.h> > +#ifndef FS_NOCOW_FL > +#define FS_NOCOW_FL 0x00800000 /* Do not cow file */ > +#endif > +#endif > > #if defined(CONFIG_UUID) > #include <uuid/uuid.h> > @@ -683,6 +690,7 @@ static int vdi_create(const char *filename, QemuOpts > *opts, Error **errp) > VdiHeader header; > size_t i; > size_t bmap_size; > + bool nocow = false; > > logout("\n"); > > @@ -699,6 +707,7 @@ static int vdi_create(const char *filename, QemuOpts > *opts, Error **errp) > image_type = VDI_TYPE_STATIC; > } > #endif > + nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false); > > if (bytes > VDI_DISK_SIZE_MAX) { > result = -ENOTSUP; > @@ -716,6 +725,21 @@ static int vdi_create(const char *filename, QemuOpts > *opts, Error **errp) > goto exit; > } > > + if (nocow) { > +#ifdef __linux__ > + /* Set NOCOW flag to solve performance issue on fs like btrfs. > + * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value > will > + * be ignored since any failure of this operation should not block > the > + * left work. > + */ > + int attr; > + if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) { > + attr |= FS_NOCOW_FL; > + ioctl(fd, FS_IOC_SETFLAGS, &attr); > + } > +#endif > + } > + > /* We need enough blocks to store the given disk size, > so always round up. */ > blocks = (bytes + block_size - 1) / block_size; > @@ -818,6 +842,11 @@ static QemuOptsList vdi_create_opts = { > .def_value_str = "off" > }, > #endif > + { > + .name = BLOCK_OPT_NOCOW, > + .type = QEMU_OPT_BOOL, > + .help = "Turn off copy-on-write (valid only on btrfs)" > + }, > /* TODO: An additional option to set UUID values might be useful. > */ > { /* end of list */ } > } > diff --git a/block/vhdx.c b/block/vhdx.c > index fedcf9f..7bdb456 100644 > --- a/block/vhdx.c > +++ b/block/vhdx.c > @@ -1909,6 +1909,11 @@ static QemuOptsList vhdx_create_opts = { > .type = QEMU_OPT_BOOL, > .help = "Force use of payload blocks of type 'ZERO'. > Non-standard." > }, > + { > + .name = BLOCK_OPT_NOCOW, > + .type = QEMU_OPT_BOOL, > + .help = "Turn off copy-on-write (valid only on btrfs)" > + }, > { NULL } > } > }; > diff --git a/block/vmdk.c b/block/vmdk.c > index 83dd6fe..94e1ff7 100644 > --- a/block/vmdk.c > +++ b/block/vmdk.c > @@ -1529,7 +1529,7 @@ static int coroutine_fn > vmdk_co_write_zeroes(BlockDriverState *bs, > > static int vmdk_create_extent(const char *filename, int64_t filesize, > bool flat, bool compress, bool zeroed_grain, > - Error **errp) > + QemuOpts *opts, Error **errp) > { > int ret, i; > BlockDriverState *bs = NULL; > @@ -1539,7 +1539,7 @@ static int vmdk_create_extent(const char *filename, > int64_t filesize, > uint32_t *gd_buf = NULL; > int gd_buf_size; > > - ret = bdrv_create_file(filename, NULL, &local_err); > + ret = bdrv_create_file(filename, opts, &local_err); > if (ret < 0) { > error_propagate(errp, local_err); > goto exit; > @@ -1845,7 +1845,7 @@ static int vmdk_create(const char *filename, QemuOpts > *opts, Error **errp) > path, desc_filename); > > if (vmdk_create_extent(ext_filename, size, > - flat, compress, zeroed_grain, errp)) { > + flat, compress, zeroed_grain, opts, errp)) { > ret = -EINVAL; > goto exit; > } > @@ -2153,6 +2153,11 @@ static QemuOptsList vmdk_create_opts = { > .help = "Enable efficient zero writes " > "using the zeroed-grain GTE feature" > }, > + { > + .name = BLOCK_OPT_NOCOW, > + .type = QEMU_OPT_BOOL, > + .help = "Turn off copy-on-write (valid only on btrfs)" > + }, > { /* end of list */ } > } > }; > diff --git a/block/vpc.c b/block/vpc.c > index 798d854..8b376a4 100644 > --- a/block/vpc.c > +++ b/block/vpc.c > @@ -29,6 +29,13 @@ > #if defined(CONFIG_UUID) > #include <uuid/uuid.h> > #endif > +#ifdef __linux__ > +#include <linux/fs.h> > +#include <sys/ioctl.h> > +#ifndef FS_NOCOW_FL > +#define FS_NOCOW_FL 0x00800000 /* Do not cow file */ > +#endif > +#endif > > /**************************************************************/ > > @@ -751,6 +758,7 @@ static int vpc_create(const char *filename, QemuOpts > *opts, Error **errp) > int64_t total_size; > int disk_type; > int ret = -EIO; > + bool nocow = false; > > /* Read out options */ > total_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0); > @@ -767,6 +775,7 @@ static int vpc_create(const char *filename, QemuOpts > *opts, Error **errp) > } else { > disk_type = VHD_DYNAMIC; > } > + nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false); > > /* Create the file */ > fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, > 0644); > @@ -775,6 +784,21 @@ static int vpc_create(const char *filename, QemuOpts > *opts, Error **errp) > goto out; > } > > + if (nocow) { > +#ifdef __linux__ > + /* Set NOCOW flag to solve performance issue on fs like btrfs. > + * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value > will > + * be ignored since any failure of this operation should not block > the > + * left work. > + */ > + int attr; > + if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) { > + attr |= FS_NOCOW_FL; > + ioctl(fd, FS_IOC_SETFLAGS, &attr); > + } > +#endif > + } > + > /* > * Calculate matching total_size and geometry. Increase the number of > * sectors requested until we get enough (or fail). This ensures that > @@ -884,6 +908,11 @@ static QemuOptsList vpc_create_opts = { > "Type of virtual hard disk format. Supported formats are " > "{dynamic (default) | fixed} " > }, > + { > + .name = BLOCK_OPT_NOCOW, > + .type = QEMU_OPT_BOOL, > + .help = "Turn off copy-on-write (valid only on btrfs)" > + }, > { /* end of list */ } > } > }; > diff --git a/include/block/block_int.h b/include/block/block_int.h > index 7aa2213..4e5022a 100644 > --- a/include/block/block_int.h > +++ b/include/block/block_int.h > @@ -54,6 +54,7 @@ > #define BLOCK_OPT_LAZY_REFCOUNTS "lazy_refcounts" > #define BLOCK_OPT_ADAPTER_TYPE "adapter_type" > #define BLOCK_OPT_REDUNDANCY "redundancy" > +#define BLOCK_OPT_NOCOW "nocow" > > typedef struct BdrvTrackedRequest { > BlockDriverState *bs; > diff --git a/qemu-doc.texi b/qemu-doc.texi > index 88ec9bb..ad92c85 100644 > --- a/qemu-doc.texi > +++ b/qemu-doc.texi > @@ -589,6 +589,22 @@ check -r all} is required, which may take some time. > > This option can only be enabled if @code{compat=1.1} is specified. > > +@item nocow > +If this option is set to @code{on}, it will trun off COW of the file. It's > only > +valid on btrfs, no effect on other file systems. > + > +Btrfs has low performance when hosting a VM image file, even more when the > guest > +on the VM also using btrfs as file system. Turning off COW is a way to > mitigate > +this bad performance. Generally there are two ways to turn off COW on > btrfs: > +a) Disable it by mounting with nodatacow, then all newly created files will > be > +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this > option > +does. > + > +Note: this option is only valid to new or empty files. If there is an > existing > +file which is COW and has data blocks already, it couldn't be changed to > NOCOW > +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check > if > +the NOCOW flag is set or not (Capitabl 'C' is NOCOW flag). > + > @end table > > @item qed > diff --git a/qemu-img.texi b/qemu-img.texi > index c68b541..8496f3b 100644 > --- a/qemu-img.texi > +++ b/qemu-img.texi > @@ -474,6 +474,22 @@ check -r all} is required, which may take some time. > > This option can only be enabled if @code{compat=1.1} is specified. > > +@item nocow > +If this option is set to @code{on}, it will trun off COW of the file. It's > only > +valid on btrfs, no effect on other file systems. > + > +Btrfs has low performance when hosting a VM image file, even more when the > guest > +on the VM also using btrfs as file system. Turning off COW is a way to > mitigate > +this bad performance. Generally there are two ways to turn off COW on > btrfs: > +a) Disable it by mounting with nodatacow, then all newly created files will > be > +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this > option > +does. > + > +Note: this option is only valid to new or empty files. If there is an > existing > +file which is COW and has data blocks already, it couldn't be changed to > NOCOW > +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check > if > +the NOCOW flag is set or not (Capitabl 'C' is NOCOW flag). > + > @end table > > @item Other