Re: [Qemu-block] [PATCH 2/2] qemu-iotests: Test commit with top-node/base-node
On 08/10/2018 11:26 AM, Kevin Wolf wrote: This adds some tests for block-commit with the new options top-node and base-node (taking node names) instead of top and base (taking file names). May need a rewrite if you like my idea of "by-node":true. Signed-off-by: Kevin Wolf --- tests/qemu-iotests/040 | 52 -- tests/qemu-iotests/040.out | 4 ++-- 2 files changed, 52 insertions(+), 4 deletions(-) diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040 index 1beb5e6dab..1cb1ceeb33 100755 --- a/tests/qemu-iotests/040 +++ b/tests/qemu-iotests/040 @@ -57,9 +57,12 @@ class ImageCommitTestCase(iotests.QMPTestCase): self.assert_no_active_block_jobs() self.vm.shutdown() -def run_commit_test(self, top, base, need_ready=False): +def run_commit_test(self, top, base, need_ready=False, node_names=False): Especially since you've already picked that style of signature here :) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [PATH] Fix build on CentOS 7
On 10/08/2018 16:11, Murilo Opsfelder Araujo wrote: > After commit b3f1c8c413bc83e4a2cc7a63e4eddf9fe6449052 "qemu-pr-helper: use new > libmultipath API", QEMU started using new libmultipath API, which is not > available on CentOS 7.x. > > This fixes that by probing the new libmultipath API in configure. If it > fails, > then try probing the old API. If it fails, then consider libmultipath not > available. > > With this, configure script defines CONFIG_MPATH_NEW_API that is used in > scsi/qemu-pr-helper.c to use the new libmultipath API. > > Fixes: b3f1c8c413bc83e4a2cc7a63e4eddf9fe6449052 > BugLink: https://bugs.launchpad.net/qemu/+bug/1786343 > Signed-off-by: Murilo Opsfelder Araujo Thanks, I'll look at the patch for 3.1. For what it's worth, RHEL7 has simply reverted that patch in its RPM. Paolo
Re: [Qemu-block] [PATCH 1/2] commit: Add top-node/base-node options
On 08/10/2018 11:26 AM, Kevin Wolf wrote: The block-commit QMP command required specifying the top and base nodes of the commit jobs using the file name of that node. While this works in simple cases (local files with absolute paths), the file names generated for more complicated setups can be hard to predict. This adds two new options top-node and base-node to the command, which allow specifying node names instead. They are mutually exclusive with the old options. Signed-off-by: Kevin Wolf --- qapi/block-core.json | 24 ++-- blockdev.c | 32 ++-- 2 files changed, 48 insertions(+), 8 deletions(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index 5b9084a394..91dd075c84 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -1455,12 +1455,23 @@ # # @device: the device name or node-name of a root node # -# @base: The file name of the backing image to write data into. -#If not specified, this is the deepest backing image. +# @base-node: The node name of the backing image to write data into. +# If not specified, this is the deepest backing image. +# (since: 2.10) I'd word this as (since 3.1)... # -# @top:The file name of the backing image within the image chain, -#which contains the topmost data to be committed down. If -#not specified, this is the active layer. +# @base: Same as @base-node, except that it is a file name rather than a node +#name. This must be the exact filename string that was used to open the +#node; other strings, even if addressing the same file, are not +#accepted (deprecated, use @base-node instead) ...and this as (since 2.10). When we finish the deprecation and remove @base, then we might consolidate the 'since' documentation at that time, but until then, I think listing the two separate releases gives users an idea of how far back they might have been using the deprecated code, and when the preferred form was introduced. +# +# @top-node: The node name of the backing image within the image chain +#which contains the topmost data to be committed down. If +#not specified, this is the active layer. (since: 2.10) +# +# @top: Same as @top-node, except that it is a file name rather than a node +# name. This must be the exact filename string that was used to open the +# node; other strings, even if addressing the same file, are not +# accepted (deprecated, use @base-node instead) Likewise. Actually, do we NEED new arguments? Can we just make @base and @top accept either an exact file name OR a node name? On the other hand, new arguments are introspectible, overloading the old argument to take two forms is not. So that doesn't help :( Or, here's an idea: Keep the name @base and @top, but add a new '*by-node':'bool' parameter, defaulting to false for now, but perhaps with a deprecation warning that we'll switch the default to true in one release and delete the parameter altogether in an even later release. When false, @base and @top are filenames, as before; when true, @base and @top are node names instead. Introspectible, nicer names in the long run, and avoids having to detect a user providing two mutually-exclusive options at once. +++ b/blockdev.c @@ -3308,7 +3308,9 @@ out: } void qmp_block_commit(bool has_job_id, const char *job_id, const char *device, + bool has_base_node, const char *base_node, bool has_base, const char *base, + bool has_top_node, const char *top_node, bool has_top, const char *top, bool has_backing_file, const char *backing_file, bool has_speed, int64_t speed, Getting to be a long signature. Should we use 'boxed':true in the QAPI file to make this easier to write? (Separate commit) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [PATCH v2 3/7] dirty-bitmap: add bdrv_dirty_bitmap_next_dirty_area
08.08.2018 15:49, Vladimir Sementsov-Ogievskiy wrote: The function alters bdrv_dirty_iter_next_area(), which is wrong and less efficient (see next commit for description). Signed-off-by: Vladimir Sementsov-Ogievskiy --- include/block/dirty-bitmap.h | 3 +++ include/qemu/hbitmap.h | 15 +++ block/dirty-bitmap.c | 7 +++ util/hbitmap.c | 38 ++ 4 files changed, 63 insertions(+) diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h index 5dc146abf3..c02be67564 100644 --- a/include/block/dirty-bitmap.h +++ b/include/block/dirty-bitmap.h @@ -100,6 +100,9 @@ BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs, char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap *bitmap, Error **errp); int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t start, uint64_t bytes); +bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap, + uint64_t *offset, uint64_t end, + uint64_t *length); BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, Error **errp); diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h index cd091f6134..b1e897af28 100644 --- a/include/qemu/hbitmap.h +++ b/include/qemu/hbitmap.h @@ -306,6 +306,21 @@ unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi); */ int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start, uint64_t count); +/* hbitmap_next_dirty_area: + * @hb: The HBitmap to operate on + * @offset: in-out parameter. + * in: the offset to start from + * out: (if area found) start of found area + * @end: end of requested region. (*@offset + *@length) will be <= @end + * @length: length of found area + * + * If dirty area found within [@offset, @end), returns true and sets @offset + * and @length appropriately. Otherwise returns true and leaves @offset and + * @length unchanged. + */ +bool hbitmap_next_dirty_area(const HBitmap *hb, uint64_t *offset, + uint64_t end, uint64_t *length); + /* hbitmap_create_meta: * Create a "meta" hbitmap to track dirtiness of the bits in this HBitmap. * The caller owns the created bitmap and must call hbitmap_free_meta(hb) to diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c index a9ee814da7..c24aa0e229 100644 --- a/block/dirty-bitmap.c +++ b/block/dirty-bitmap.c @@ -791,6 +791,13 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t offset, return hbitmap_next_zero(bitmap->bitmap, offset, bytes); } +bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap, + uint64_t *offset, uint64_t end, + uint64_t *length) +{ +return hbitmap_next_dirty_area(bitmap->bitmap, offset, end, length); +} + void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src, Error **errp) { diff --git a/util/hbitmap.c b/util/hbitmap.c index 8dbcd80a3d..9de0d7bf2d 100644 --- a/util/hbitmap.c +++ b/util/hbitmap.c @@ -244,6 +244,44 @@ int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start, uint64_t count) return res; } +bool hbitmap_next_dirty_area(const HBitmap *hb, uint64_t *offset, + uint64_t end, uint64_t *length) +{ +HBitmapIter hbi; +int64_t off1, off0; +uint32_t granularity = 1UL << hb->granularity; + +if (end == 0) { +end = hb->orig_size; +} + +hbitmap_iter_init(, hb, *offset << hb->granularity); "<< hb->granularity" should be dropped. Hmm, this patch needs tests. +off1 = hbitmap_iter_next(, true); + +if (off1 < 0 || off1 >= end) { +return false; +} + +if (off1 + granularity >= end) { +if (off1 > *offset) { +*offset = off1; +} +*length = end - *offset; +return true; +} + +off0 = hbitmap_next_zero(hb, off1 + granularity, end); +if (off0 < 0) { +off0 = end; +} + +if (off1 > *offset) { +*offset = off1; +} +*length = off0 - *offset; +return true; +} + bool hbitmap_empty(const HBitmap *hb) { return hb->count == 0; -- Best regards, Vladimir
[Qemu-block] [PATCH 2/2] qemu-iotests: Test commit with top-node/base-node
This adds some tests for block-commit with the new options top-node and base-node (taking node names) instead of top and base (taking file names). Signed-off-by: Kevin Wolf --- tests/qemu-iotests/040 | 52 -- tests/qemu-iotests/040.out | 4 ++-- 2 files changed, 52 insertions(+), 4 deletions(-) diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040 index 1beb5e6dab..1cb1ceeb33 100755 --- a/tests/qemu-iotests/040 +++ b/tests/qemu-iotests/040 @@ -57,9 +57,12 @@ class ImageCommitTestCase(iotests.QMPTestCase): self.assert_no_active_block_jobs() self.vm.shutdown() -def run_commit_test(self, top, base, need_ready=False): +def run_commit_test(self, top, base, need_ready=False, node_names=False): self.assert_no_active_block_jobs() -result = self.vm.qmp('block-commit', device='drive0', top=top, base=base) +if node_names: +result = self.vm.qmp('block-commit', device='drive0', top_node=top, base_node=base) +else: +result = self.vm.qmp('block-commit', device='drive0', top=top, base=base) self.assert_qmp(result, 'return', {}) self.wait_for_complete(need_ready) @@ -101,6 +104,11 @@ class TestSingleDrive(ImageCommitTestCase): self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img).find("verification failed")) self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed")) +def test_commit_node(self): +self.run_commit_test("mid", "base", node_names=True) +self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img).find("verification failed")) +self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed")) + def test_device_not_found(self): result = self.vm.qmp('block-commit', device='nonexistent', top='%s' % mid_img) self.assert_qmp(result, 'error/class', 'DeviceNotFound') @@ -123,6 +131,30 @@ class TestSingleDrive(ImageCommitTestCase): self.assert_qmp(result, 'error/class', 'GenericError') self.assert_qmp(result, 'error/desc', 'Base \'badfile\' not found') +def test_top_node_invalid(self): +self.assert_no_active_block_jobs() +result = self.vm.qmp('block-commit', device='drive0', top_node='badfile', base_node='base') +self.assert_qmp(result, 'error/class', 'GenericError') +self.assert_qmp(result, 'error/desc', "Cannot find device= nor node_name=badfile") + +def test_base_node_invalid(self): +self.assert_no_active_block_jobs() +result = self.vm.qmp('block-commit', device='drive0', top_node='mid', base_node='badfile') +self.assert_qmp(result, 'error/class', 'GenericError') +self.assert_qmp(result, 'error/desc', "Cannot find device= nor node_name=badfile") + +def test_top_path_and_node(self): +self.assert_no_active_block_jobs() +result = self.vm.qmp('block-commit', device='drive0', top_node='mid', base_node='base', top='%s' % mid_img) +self.assert_qmp(result, 'error/class', 'GenericError') +self.assert_qmp(result, 'error/desc', "'top-node' and 'top' are mutually exclusive") + +def test_base_path_and_node(self): +self.assert_no_active_block_jobs() +result = self.vm.qmp('block-commit', device='drive0', top_node='mid', base_node='base', base='%s' % backing_img) +self.assert_qmp(result, 'error/class', 'GenericError') +self.assert_qmp(result, 'error/desc', "'base-node' and 'base' are mutually exclusive") + def test_top_is_active(self): self.run_commit_test(test_img, backing_img, need_ready=True) self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img).find("verification failed")) @@ -139,6 +171,22 @@ class TestSingleDrive(ImageCommitTestCase): self.assert_qmp(result, 'error/class', 'GenericError') self.assert_qmp(result, 'error/desc', 'Base \'%s\' not found' % mid_img) +def test_top_and_base_node_reversed(self): +self.assert_no_active_block_jobs() +result = self.vm.qmp('block-commit', device='drive0', top_node='base', base_node='top') +self.assert_qmp(result, 'error/class', 'GenericError') +self.assert_qmp(result, 'error/desc', "'top' is not in this backing file chain") + +def test_top_node_in_wrong_chain(self): +self.assert_no_active_block_jobs() + +result = self.vm.qmp('blockdev-add', driver='null-co', node_name='null') +self.assert_qmp(result, 'return', {}) + +result = self.vm.qmp('block-commit', device='drive0', top_node='null', base_node='base') +self.assert_qmp(result, 'error/class', 'GenericError') +self.assert_qmp(result, 'error/desc', "'null' is not in this backing file chain") +
[Qemu-block] [PATCH 0/2] commit: Add top-node/base-node options
Kevin Wolf (2): commit: Add top-node/base-node options qemu-iotests: Test commit with top-node/base-node qapi/block-core.json | 24 +++-- blockdev.c | 32 ++-- tests/qemu-iotests/040 | 52 -- tests/qemu-iotests/040.out | 4 ++-- 4 files changed, 100 insertions(+), 12 deletions(-) -- 2.13.6
[Qemu-block] [PATCH 1/2] commit: Add top-node/base-node options
The block-commit QMP command required specifying the top and base nodes of the commit jobs using the file name of that node. While this works in simple cases (local files with absolute paths), the file names generated for more complicated setups can be hard to predict. This adds two new options top-node and base-node to the command, which allow specifying node names instead. They are mutually exclusive with the old options. Signed-off-by: Kevin Wolf --- qapi/block-core.json | 24 ++-- blockdev.c | 32 ++-- 2 files changed, 48 insertions(+), 8 deletions(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index 5b9084a394..91dd075c84 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -1455,12 +1455,23 @@ # # @device: the device name or node-name of a root node # -# @base: The file name of the backing image to write data into. -#If not specified, this is the deepest backing image. +# @base-node: The node name of the backing image to write data into. +# If not specified, this is the deepest backing image. +# (since: 2.10) # -# @top:The file name of the backing image within the image chain, -#which contains the topmost data to be committed down. If -#not specified, this is the active layer. +# @base: Same as @base-node, except that it is a file name rather than a node +#name. This must be the exact filename string that was used to open the +#node; other strings, even if addressing the same file, are not +#accepted (deprecated, use @base-node instead) +# +# @top-node: The node name of the backing image within the image chain +#which contains the topmost data to be committed down. If +#not specified, this is the active layer. (since: 2.10) +# +# @top: Same as @top-node, except that it is a file name rather than a node +# name. This must be the exact filename string that was used to open the +# node; other strings, even if addressing the same file, are not +# accepted (deprecated, use @base-node instead) # # @backing-file: The backing file string to write into the overlay # image of 'top'. If 'top' is the active layer, @@ -1516,7 +1527,8 @@ # ## { 'command': 'block-commit', - 'data': { '*job-id': 'str', 'device': 'str', '*base': 'str', '*top': 'str', + 'data': { '*job-id': 'str', 'device': 'str', '*base-node': 'str', +'*base': 'str', '*top-node': 'str', '*top': 'str', '*backing-file': 'str', '*speed': 'int', '*filter-node-name': 'str' } } diff --git a/blockdev.c b/blockdev.c index dcf8c8d2ab..064c8fb3f5 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3308,7 +3308,9 @@ out: } void qmp_block_commit(bool has_job_id, const char *job_id, const char *device, + bool has_base_node, const char *base_node, bool has_base, const char *base, + bool has_top_node, const char *top_node, bool has_top, const char *top, bool has_backing_file, const char *backing_file, bool has_speed, int64_t speed, @@ -3360,7 +3362,20 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device, /* default top_bs is the active layer */ top_bs = bs; -if (has_top && top) { +if (has_top_node && has_top) { +error_setg(errp, "'top-node' and 'top' are mutually exclusive"); +goto out; +} else if (has_top_node) { +top_bs = bdrv_lookup_bs(NULL, top_node, errp); +if (top_bs == NULL) { +goto out; +} +if (!bdrv_chain_contains(bs, top_bs)) { +error_setg(errp, "'%s' is not in this backing file chain", + top_node); +goto out; +} +} else if (has_top && top) { if (strcmp(bs->filename, top) != 0) { top_bs = bdrv_find_backing_image(bs, top); } @@ -3373,7 +3388,20 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device, assert(bdrv_get_aio_context(top_bs) == aio_context); -if (has_base && base) { +if (has_base_node && has_base) { +error_setg(errp, "'base-node' and 'base' are mutually exclusive"); +goto out; +} else if (has_base_node) { +base_bs = bdrv_lookup_bs(NULL, base_node, errp); +if (base_bs == NULL) { +goto out; +} +if (!bdrv_chain_contains(top_bs, base_bs)) { +error_setg(errp, "'%s' is not in this backing file chain", + base_node); +goto out; +} +} else if (has_base && base) { base_bs = bdrv_find_backing_image(top_bs, base); } else { base_bs = bdrv_find_base(top_bs); -- 2.13.6
[Qemu-block] [PATH] Fix build on CentOS 7
After commit b3f1c8c413bc83e4a2cc7a63e4eddf9fe6449052 "qemu-pr-helper: use new libmultipath API", QEMU started using new libmultipath API, which is not available on CentOS 7.x. This fixes that by probing the new libmultipath API in configure. If it fails, then try probing the old API. If it fails, then consider libmultipath not available. With this, configure script defines CONFIG_MPATH_NEW_API that is used in scsi/qemu-pr-helper.c to use the new libmultipath API. Fixes: b3f1c8c413bc83e4a2cc7a63e4eddf9fe6449052 BugLink: https://bugs.launchpad.net/qemu/+bug/1786343 Signed-off-by: Murilo Opsfelder Araujo --- configure | 24 +++- scsi/qemu-pr-helper.c | 4 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/configure b/configure index 2a7796ea80..bc6000c24a 100755 --- a/configure +++ b/configure @@ -3558,6 +3558,7 @@ fi # libmpathpersist probe if test "$mpath" != "no" ; then + # probe for the new API cat > $TMPC < #include @@ -3579,8 +3580,26 @@ int main(void) { EOF if compile_prog "" "-ludev -lmultipath -lmpathpersist" ; then mpathpersist=yes +mpathpersist_new_api=yes else -mpathpersist=no +# probe for the old API +cat > $TMPC < +#include +unsigned mpath_mx_alloc_len = 1024; +int logsink; +int main(void) { +struct udev *udev = udev_new(); +mpath_lib_init(udev); +return 0; +} +EOF +if compile_prog "" "-ludev -lmultipath -lmpathpersist" ; then + mpathpersist=yes + mpathpersist_new_api=no +else + mpathpersist=no +fi fi else mpathpersist=no @@ -6421,6 +6440,9 @@ if test "$virtfs" = "yes" ; then fi if test "$mpath" = "yes" ; then echo "CONFIG_MPATH=y" >> $config_host_mak + if test "$mpathpersist_new_api" = "yes"; then +echo "CONFIG_MPATH_NEW_API=y" >> $config_host_mak + fi fi if test "$vhost_scsi" = "yes" ; then echo "CONFIG_VHOST_SCSI=y" >> $config_host_mak diff --git a/scsi/qemu-pr-helper.c b/scsi/qemu-pr-helper.c index 1528a712a0..ed037aabee 100644 --- a/scsi/qemu-pr-helper.c +++ b/scsi/qemu-pr-helper.c @@ -301,7 +301,11 @@ void put_multipath_config(struct config *conf) static void multipath_pr_init(void) { udev = udev_new(); +#ifdef CONFIG_MPATH_NEW_API multipath_conf = mpath_lib_init(); +#else +mpath_lib_init(udev); +#endif } static int is_mpath(int fd) -- 2.17.1
Re: [Qemu-block] [PATCH v7 5/9] qcow2: Assign the L2 cache relatively to the image size
On Fri 10 Aug 2018 08:26:43 AM CEST, Leonid Bloch wrote: > Previously, the L2 cache was allocated without considering the image > size, and an option existed to manually determine its size. This is not quite correct: the L2 cache was set to the maximum needed for the image when "cache-size" was large enough: if (combined_cache_size >= max_l2_cache + min_refcount_cache) { *l2_cache_size = max_l2_cache; } ... See below for an example. > } else { > -uint64_t virtual_disk_size = bs->total_sectors * > BDRV_SECTOR_SIZE; > -uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / > 8); > - > /* Assign as much memory as possible to the L2 cache, and > * use the remainder for the refcount cache */ > -if (combined_cache_size >= max_l2_cache + min_refcount_cache) { > -*l2_cache_size = max_l2_cache; > +if (combined_cache_size >= *l2_cache_size + min_refcount_cache) { This has an (unintended?) side effect: If you have a 1TB qcow2 image and open it with e.g. cache-size=200M, with the previous code you would get a 128MB L2 cache (max_l2_cache). With the new code you only get 32MB. The rest of the function looks good to me now. We're getting close :) > - - The default L2 cache size is 8 clusters or 1MB (whichever is more), > - and the minimum is 2 clusters (or 2 cache entries, see below). > + - The default L2 cache size will cover the entire virtual size of an > + image, up to a certain maximum. This maximum is 1 MB by default > + (enough for image sizes of up to 8 GB with the default cluster size) > + and it can be reduced or enlarged using the "l2-cache-size" option. > + The minimum is 2 clusters (or 2 cache entries, see below). It's not clear with this wording if this "certain maximum" refers to the cache size or the image size. I'm thinking that perhaps it's clearer if you leave that item unchanged and add a new one below, something like: - The L2 cache size setting (regardless of whether it takes the default value or it's set by the user) refers to the maximum size of the L2 cache. If QEMU needs less memory to hold all of the image's L2 tables, then that's the actual size of the cache that will be allocated. > - If the L2 cache is big enough to hold all of the image's L2 tables > - (as explained in the "Choosing the right cache sizes" section > - earlier in this document) then none of this is necessary and you > - can omit the "l2-cache-entry-size" parameter altogether. > + (the default behavior for images of up to 8 GB in size) then none > + of this is necessary and you can omit the "l2-cache-entry-size" > + parameter altogether. I think this change is unnecessary :-? > @item refcount-cache-size > The maximum size of the refcount block cache in bytes > diff --git a/tests/qemu-iotests/137 b/tests/qemu-iotests/137 > index 87965625d8..e3fb078588 100755 > --- a/tests/qemu-iotests/137 > +++ b/tests/qemu-iotests/137 > @@ -109,7 +109,6 @@ $QEMU_IO \ > -c "reopen -o cache-size=1M,l2-cache-size=64k,refcount-cache-size=64k" \ > -c "reopen -o cache-size=1M,l2-cache-size=2M" \ > -c "reopen -o cache-size=1M,refcount-cache-size=2M" \ > --c "reopen -o l2-cache-size=256T" \ The "L2 cache size too big" error can still be tested, but you will need to create an image large enough to allow such a big cache. $ qemu-img create -f qcow2 -o cluster_size=256k hd.qcow2 32P $ $QEMU -drive file=hd.qcow2,l2-cache-entry-size=512,l2-cache-size=1T Berto
Re: [Qemu-block] [PATCH v7 4/9] qcow2: Avoid duplication in setting the refcount cache size
On Fri 10 Aug 2018 08:26:42 AM CEST, Leonid Bloch wrote: > The refcount cache size does not need to be set to its minimum value in > read_cache_sizes(), as it is set to at least its minimum value in > qcow2_update_options_prepare(). > > Signed-off-by: Leonid Bloch > -} else { > -if (!l2_cache_size_set) { > -*l2_cache_size = MAX(DEFAULT_L2_CACHE_SIZE, > - (uint64_t)DEFAULT_L2_CACHE_CLUSTERS > - * s->cluster_size); > -} > -if (!refcount_cache_size_set) { > -*refcount_cache_size = min_refcount_cache; > -} Since you're getting rid of the rest of the code later anyway, I think it's cleaner to only remove these last three lines here and leave the rest untouched. It makes the patch shorter and easier to read. > +/* If refcount-cache-size is not specified, it will be set to minimum > + * in qcow2_update_options_prepare(). No need to set it here. */ This is not quite correct, because apart from the "not specified" case, refcount_cache_size is also overridden in other circumstances: (a) the value is specified but is too low, or (b) the value is set indirectly via "cache-size" but the end result is too low[*]. Also, the same thing happens to l2-cache-size: if you set it manually but it's too low then it will be overridden. I'd personally remove the comment altogether, I think the explanation in the commit message is enough. Berto [*] Now that I think of it: if you set e.g. cache-size = 16M and l2-cache-size = 16MB then you'll end up with refcount-cache-size = 16M - 16M = 0, which will be overridden and set to the minimum. But then refcount-cache-size + l2-cache-size > cache-size So perhaps this should produce an error, and it may make sense to take the opportunity to move all the logic that ensures that MIN_SIZE <= size <= INT_MAX to read_cache_sizes(). But that's a possible task for the future and I wouldn't worry about it in this series.
Re: [Qemu-block] [PATCH v7 2/9] qcow2: Cosmetic changes
On Fri 10 Aug 2018 08:26:40 AM CEST, Leonid Bloch wrote: > Some refactoring for better readability is done here. > > Signed-off-by: Leonid Bloch > -*l2_cache_entry_size = qemu_opt_get_size( > -opts, QCOW2_OPT_L2_CACHE_ENTRY_SIZE, s->cluster_size); > +*l2_cache_entry_size = qemu_opt_get_size(opts, > + QCOW2_OPT_L2_CACHE_ENTRY_SIZE, > + s->cluster_size); > -*refcount_cache_size = > -MIN(combined_cache_size, min_refcount_cache); > +*refcount_cache_size = MIN(combined_cache_size, > + min_refcount_cache); I won't oppose these changes if more people think that they improve readability, but it seems to me that this is just a matter of taste and there's no big difference either way. I'm personally fine with the way the code is now, and this coding style is used in many other parts of QEMU: $ git grep -A 1 '.* = .*($' $ git grep -A 1 '.* =$' Berto
Re: [Qemu-block] [PATCH] file-posix: Skip effectiveless OFD lock operations
Am 18.07.2018 um 10:43 hat Fam Zheng geschrieben: > If we know we've already locked the bytes, don't do it again; similarly > don't unlock a byte if we haven't locked it. This doesn't change the > behavior, but fixes a corner case explained below. > > Libvirt had an error handling bug that an image can get its (ownership, > file mode, SELinux) permissions changed (RHBZ 1584982) by mistake behind > QEMU. Specifically, an image in use by Libvirt VM has: > > $ ls -lhZ b.img > -rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c600,c690 b.img > > Trying to attach it a second time won't work because of image locking. > And after the error, it becomes: > > $ ls -lhZ b.img > -rw-r--r--. root root system_u:object_r:virt_image_t:s0 b.img > > Then, we won't be able to do OFD lock operations with the existing fd. > In other words, the code such as in blk_detach_dev: > > blk_set_perm(blk, 0, BLK_PERM_ALL, _abort); > > can abort() QEMU, out of environmental changes. > > This patch is an easy fix to this and the change is regardlessly > reasonable, so do it. > > Signed-off-by: Fam Zheng > --- > block/file-posix.c | 27 +-- > 1 file changed, 17 insertions(+), 10 deletions(-) > > diff --git a/block/file-posix.c b/block/file-posix.c > index 60af4b3d51..45d44c9947 100644 > --- a/block/file-posix.c > +++ b/block/file-posix.c > @@ -680,23 +680,28 @@ typedef enum { > * file; if @unlock == true, also unlock the unneeded bytes. > * @shared_perm_lock_bits is the mask of all permissions that are NOT shared. > */ > -static int raw_apply_lock_bytes(int fd, > +static int raw_apply_lock_bytes(BDRVRawState *s, int fd, > uint64_t perm_lock_bits, > uint64_t shared_perm_lock_bits, > bool unlock, Error **errp) > { > int ret; > int i; > +uint64_t locked_perm, locked_shared_perm; > + > +locked_perm = s ? s->perm : 0; > +locked_shared_perm = s ? ~s->shared_perm & BLK_PERM_ALL : 0; For the s == NULL case, using 0 is okay for locking because we will always consider the bit as previously unlocked, so we will lock it. For unlocking, however, we'll also see it as previously unlocked, so we will never actually unlock anything any more. Am I missing something? Kevin
Re: [Qemu-block] [PATCH v7 6/9] qcow2: Increase the default upper limit on the L2 cache size
On Fri 10 Aug 2018 08:26:44 AM CEST, Leonid Bloch wrote: > The upper limit on the L2 cache size is increased from 1 MB to 32 MB. > This is done in order to allow default full coverage with the L2 cache > for images of up to 256 GB in size (was 8 GB). Note, that only the > needed amount to cover the full image is allocated. The value which is > changed here is just the upper limit on the L2 cache size, beyond which > it will not grow, even if the size of the image will require it to. > > Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia > -#define DEFAULT_L2_CACHE_MAX_SIZE (1 * MiB) > +#define DEFAULT_L2_CACHE_MAX_SIZE (32 * MiB) The patch looks perfect to me now and I'm fine with this change, but this is quite an increase from the previous default value. If anyone thinks that this is too aggressive (or too little :)) I'm all ears. Berto
Re: [Qemu-block] [PATCH v7 1/9] qcow2: Options' documentation fixes
On Fri 10 Aug 2018 08:26:39 AM CEST, Leonid Bloch wrote: > Signed-off-by: Leonid Bloch > --- > docs/qcow2-cache.txt | 16 +++- > qemu-options.hx | 9 ++--- > 2 files changed, 17 insertions(+), 8 deletions(-) > > diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt > index 8a09a5cc5f..0f157d859a 100644 > --- a/docs/qcow2-cache.txt > +++ b/docs/qcow2-cache.txt > @@ -97,12 +97,15 @@ need: > l2_cache_size = disk_size_GB * 131072 > refcount_cache_size = disk_size_GB * 32768 > > -QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount > -cache of 256KB (262144 bytes), so using the formulas we've just seen > -we have > +With the default cluster size, to cover each 8 GB of the virtual image > +size, 1MB of L2 cache is needed: > > - 1048576 / 131072 = 8 GB of virtual disk covered by that cache > -262144 / 32768 = 8 GB > + 65536 / 8 = 8192 = 8 GB / 1 MB Where does this 65536 / 8 come from? The line that you changed follows directly from the formula immediately before that paragraph: l2_cache_size = disk_size_GB * 131072 that is l2_cache_size / 131072 = disk_size_GB 1048576 / 131072 = 8 GB Berto
Re: [Qemu-block] [PATCH 19/21] qapi/block-commit: expose new job properties
Am 07.08.2018 um 16:52 hat Eric Blake geschrieben: > On 08/06/2018 11:33 PM, John Snow wrote: > > Signed-off-by: John Snow > > --- > > blockdev.c | 8 > > qapi/block-core.json | 3 ++- > > 2 files changed, 10 insertions(+), 1 deletion(-) > > > > > +++ b/qapi/block-core.json > > @@ -1518,7 +1518,8 @@ > > { 'command': 'block-commit', > > 'data': { '*job-id': 'str', 'device': 'str', '*base': 'str', '*top': > > 'str', > > '*backing-file': 'str', '*speed': 'int', > > -'*filter-node-name': 'str' } } > > +'*filter-node-name': 'str', > > +'*auto-finalize': 'bool', '*auto-dismiss': 'bool' } } > > Missing documentation, including a 'since 3.1' tag. And the same for the other two commands. Kevin
Re: [Qemu-block] [PATCH v7 3/9] qcow2: Make sizes more humanly readable
On Fri 10 Aug 2018 08:26:41 AM CEST, Leonid Bloch wrote: > Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [Qemu-devel] [PATCH 3/4] monitor: fix oob command leak
Marc-André Lureau writes: > Hi > > On Thu, Aug 9, 2018 at 1:44 PM, Marc-André Lureau > wrote: >> Spotted by ASAN, during make check... >> >> Direct leak of 40 byte(s) in 1 object(s) allocated from: >> #0 0x7f8e27262c48 in malloc (/lib64/libasan.so.5+0xeec48) >> #1 0x7f8e26a5f3c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5) >> #2 0x555ab67078a8 in qstring_from_str >> /home/elmarco/src/qq/qobject/qstring.c:67 >> #3 0x555ab67071e4 in qstring_new >> /home/elmarco/src/qq/qobject/qstring.c:24 >> #4 0x555ab6713fbf in qstring_from_escaped_str >> /home/elmarco/src/qq/qobject/json-parser.c:144 >> #5 0x555ab671738c in parse_literal >> /home/elmarco/src/qq/qobject/json-parser.c:506 >> #6 0x555ab67179c3 in parse_value >> /home/elmarco/src/qq/qobject/json-parser.c:569 >> #7 0x555ab6715123 in parse_pair >> /home/elmarco/src/qq/qobject/json-parser.c:306 >> #8 0x555ab6715483 in parse_object >> /home/elmarco/src/qq/qobject/json-parser.c:357 >> #9 0x555ab671798b in parse_value >> /home/elmarco/src/qq/qobject/json-parser.c:561 >> #10 0x555ab6717a6b in json_parser_parse_err >> /home/elmarco/src/qq/qobject/json-parser.c:592 >> #11 0x555ab4fd4dcf in handle_qmp_command >> /home/elmarco/src/qq/monitor.c:4257 >> #12 0x555ab6712c4d in json_message_process_token >> /home/elmarco/src/qq/qobject/json-streamer.c:105 >> #13 0x555ab67e01e2 in json_lexer_feed_char >> /home/elmarco/src/qq/qobject/json-lexer.c:323 >> #14 0x555ab67e0af6 in json_lexer_feed >> /home/elmarco/src/qq/qobject/json-lexer.c:373 >> #15 0x555ab6713010 in json_message_parser_feed >> /home/elmarco/src/qq/qobject/json-streamer.c:124 >> #16 0x555ab4fd58ec in monitor_qmp_read >> /home/elmarco/src/qq/monitor.c:4337 >> #17 0x555ab6559df2 in qemu_chr_be_write_impl >> /home/elmarco/src/qq/chardev/char.c:175 >> #18 0x555ab6559e95 in qemu_chr_be_write >> /home/elmarco/src/qq/chardev/char.c:187 >> #19 0x555ab6560127 in fd_chr_read >> /home/elmarco/src/qq/chardev/char-fd.c:66 >> #20 0x555ab65d9c73 in qio_channel_fd_source_dispatch >> /home/elmarco/src/qq/io/channel-watch.c:84 >> #21 0x7f8e26a598ac in g_main_context_dispatch >> (/lib64/libglib-2.0.so.0+0x4c8ac) >> >> Signed-off-by: Marc-André Lureau > > It looks like it was introduced with commit > b27314567d4cd8e204a18feba60d3341fb2d1aed "qmp: Simplify code around > monitor_qmp_dispatch_one()" Yes. Before that commit, handle_qmp_command() freed @req and @id as follows. * Early errors: goto err, where we free req, and leak @id (I think). * Out of band: store @req and @id in @req_obj, free it in monitor_qmp_dispatch_one(). * In band: store @req and @id in @req_obj - queue full: free by calling qmp_request_free() - else: enqueue, monitor_qmp_bh_dispatcher() will deqeueue and pass to monitor_qmp_dispatch_one(), which frees. Current head always stores @req and @id in @req_obj. It frees as follows. * Out of band: it doesn't, since monitor_qmp_dispatch(), renamed from monitor_qmp_dispatch_one(), doesn't free anymore. This patch fixes it. * In band: - queue full: free by calling qmp_request_free(). - else: enqueue, monitor_qmp_bh_dispatcher() will deqeueue and pass to monitor_qmp_dispatch(), then free with qmp_request_free(). Looks like the commit replaced one leak by another one. > >> --- >> monitor.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/monitor.c b/monitor.c >> index 77861e96af..a1999e396c 100644 >> --- a/monitor.c >> +++ b/monitor.c >> @@ -4277,6 +4277,8 @@ static void handle_qmp_command(JSONMessageParser >> *parser, GQueue *tokens) >> trace_monitor_qmp_cmd_out_of_band(qobject_get_try_str(id) >>?: ""); >> monitor_qmp_dispatch(mon, req, id); >> +qobject_unref(req); >> +qobject_unref(id); >> return; >> } >> >> -- >> 2.18.0.547.g1d89318c48 >> >> I'm not going to argue for inclusion in 3.0, because OOB is experimental and off by default due to its remaining flaws. Cc: qemu-sta...@nongnu.org Reviewed-by: Markus Armbruster
[Qemu-block] [PATCH v7 3/9] qcow2: Make sizes more humanly readable
Signed-off-by: Leonid Bloch --- block/qcow2.c | 2 +- block/qcow2.h | 9 + 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 3f4abc394e..7a2d7a1d48 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -831,7 +831,7 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, } } else { if (!l2_cache_size_set) { -*l2_cache_size = MAX(DEFAULT_L2_CACHE_BYTE_SIZE, +*l2_cache_size = MAX(DEFAULT_L2_CACHE_SIZE, (uint64_t)DEFAULT_L2_CACHE_CLUSTERS * s->cluster_size); } diff --git a/block/qcow2.h b/block/qcow2.h index 81b844e936..39e1b279f8 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -27,6 +27,7 @@ #include "crypto/block.h" #include "qemu/coroutine.h" +#include "qemu/units.h" //#define DEBUG_ALLOC //#define DEBUG_ALLOC2 @@ -43,11 +44,11 @@ /* 8 MB refcount table is enough for 2 PB images at 64k cluster size * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */ -#define QCOW_MAX_REFTABLE_SIZE 0x80 +#define QCOW_MAX_REFTABLE_SIZE (8 * MiB) /* 32 MB L1 table is enough for 2 PB images at 64k cluster size * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */ -#define QCOW_MAX_L1_SIZE 0x200 +#define QCOW_MAX_L1_SIZE (32 * MiB) /* Allow for an average of 1k per snapshot table entry, should be plenty of * space for snapshot names and IDs */ @@ -75,9 +76,9 @@ /* Whichever is more */ #define DEFAULT_L2_CACHE_CLUSTERS 8 /* clusters */ -#define DEFAULT_L2_CACHE_BYTE_SIZE 1048576 /* bytes */ +#define DEFAULT_L2_CACHE_SIZE (1 * MiB) -#define DEFAULT_CLUSTER_SIZE 65536 +#define DEFAULT_CLUSTER_SIZE (64 * KiB) #define QCOW2_OPT_LAZY_REFCOUNTS "lazy-refcounts" -- 2.17.1
[Qemu-block] [PATCH v7 8/9] qcow2: Set the default cache-clean-interval to 10 minutes
The default cache-clean-interval is set to 10 minutes, in order to lower the overhead of the qcow2 caches (before the default was 0, i.e. disabled). Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia --- block/qcow2.c| 2 +- block/qcow2.h| 1 + docs/qcow2-cache.txt | 4 ++-- qapi/block-core.json | 3 ++- qemu-options.hx | 2 +- 5 files changed, 7 insertions(+), 5 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index ba4dfae735..b4f291765b 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -944,7 +944,7 @@ static int qcow2_update_options_prepare(BlockDriverState *bs, /* New interval for cache cleanup timer */ r->cache_clean_interval = qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, -s->cache_clean_interval); +DEFAULT_CACHE_CLEAN_INTERVAL); #ifndef CONFIG_LINUX if (r->cache_clean_interval != 0) { error_setg(errp, QCOW2_OPT_CACHE_CLEAN_INTERVAL diff --git a/block/qcow2.h b/block/qcow2.h index e699a55d02..5e94f7ffc4 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -78,6 +78,7 @@ #define DEFAULT_CLUSTER_SIZE (64 * KiB) +#define DEFAULT_CACHE_CLEAN_INTERVAL 600 /* seconds */ #define QCOW2_OPT_LAZY_REFCOUNTS "lazy-refcounts" #define QCOW2_OPT_DISCARD_REQUEST "pass-discard-request" diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt index 6ad1081d1a..684147ad45 100644 --- a/docs/qcow2-cache.txt +++ b/docs/qcow2-cache.txt @@ -204,8 +204,8 @@ This example removes all unused cache entries every 15 minutes: -drive file=hd.qcow2,cache-clean-interval=900 -If unset, the default value for this parameter is 0 and it disables -this feature. +If unset, the default value for this parameter is 600. Setting it to 0 +disables this feature. Note that this functionality currently relies on the MADV_DONTNEED argument for madvise() to actually free the memory. This is a diff --git a/qapi/block-core.json b/qapi/block-core.json index 5b9084a394..9a6a708a37 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2830,7 +2830,8 @@ # # @cache-clean-interval: clean unused entries in the L2 and refcount # caches. The interval is in seconds. The default value -# is 0 and it disables this feature (since 2.5) +# is 600, and 0 disables this feature. (since 2.5) +# # @encrypt: Image decryption options. Mandatory for # encrypted images, except when doing a metadata-only # probe of the image. (since 2.10) diff --git a/qemu-options.hx b/qemu-options.hx index 4c44cdbc23..6abf3631ec 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -767,7 +767,7 @@ it which is not used for the L2 cache) @item cache-clean-interval Clean unused entries in the L2 and refcount caches. The interval is in seconds. -The default value is 0 and it disables this feature. +The default value is 600. Setting it to 0 disables this feature. @item pass-discard-request Whether discard requests to the qcow2 device should be forwarded to the data -- 2.17.1
[Qemu-block] [PATCH v7 6/9] qcow2: Increase the default upper limit on the L2 cache size
The upper limit on the L2 cache size is increased from 1 MB to 32 MB. This is done in order to allow default full coverage with the L2 cache for images of up to 256 GB in size (was 8 GB). Note, that only the needed amount to cover the full image is allocated. The value which is changed here is just the upper limit on the L2 cache size, beyond which it will not grow, even if the size of the image will require it to. Signed-off-by: Leonid Bloch --- block/qcow2.h| 2 +- docs/qcow2-cache.txt | 6 +++--- qemu-options.hx | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/block/qcow2.h b/block/qcow2.h index d917b5f577..e699a55d02 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -74,7 +74,7 @@ /* Must be at least 4 to cover all cases of refcount table growth */ #define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */ -#define DEFAULT_L2_CACHE_MAX_SIZE (1 * MiB) +#define DEFAULT_L2_CACHE_MAX_SIZE (32 * MiB) #define DEFAULT_CLUSTER_SIZE (64 * KiB) diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt index 69af306267..6ad1081d1a 100644 --- a/docs/qcow2-cache.txt +++ b/docs/qcow2-cache.txt @@ -125,8 +125,8 @@ There are a few things that need to be taken into account: (or the cache entry size: see "Using smaller cache sizes" below). - The default L2 cache size will cover the entire virtual size of an - image, up to a certain maximum. This maximum is 1 MB by default - (enough for image sizes of up to 8 GB with the default cluster size) + image, up to a certain maximum. This maximum is 32 MB by default + (enough for image sizes of up to 256 GB with the default cluster size) and it can be reduced or enlarged using the "l2-cache-size" option. The minimum is 2 clusters (or 2 cache entries, see below). @@ -186,7 +186,7 @@ Some things to take into account: always uses the cluster size as the entry size. - If the L2 cache is big enough to hold all of the image's L2 tables - (the default behavior for images of up to 8 GB in size) then none + (the default behavior for images of up to 256 GB in size) then none of this is necessary and you can omit the "l2-cache-entry-size" parameter altogether. diff --git a/qemu-options.hx b/qemu-options.hx index 22e8e2d113..4c44cdbc23 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -756,7 +756,7 @@ The maximum total size of the L2 table and refcount block caches in bytes @item l2-cache-size The maximum size of the L2 table cache in bytes -(default: if cache-size is not specified - 1M; otherwise, as large as possible +(default: if cache-size is not specified - 32M; otherwise, as large as possible within the cache-size, while permitting the requested or the minimal refcount cache size) -- 2.17.1
[Qemu-block] [PATCH v7 4/9] qcow2: Avoid duplication in setting the refcount cache size
The refcount cache size does not need to be set to its minimum value in read_cache_sizes(), as it is set to at least its minimum value in qcow2_update_options_prepare(). Signed-off-by: Leonid Bloch --- block/qcow2.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 7a2d7a1d48..053242f94e 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -829,16 +829,13 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, *l2_cache_size = combined_cache_size - *refcount_cache_size; } } -} else { -if (!l2_cache_size_set) { -*l2_cache_size = MAX(DEFAULT_L2_CACHE_SIZE, - (uint64_t)DEFAULT_L2_CACHE_CLUSTERS - * s->cluster_size); -} -if (!refcount_cache_size_set) { -*refcount_cache_size = min_refcount_cache; -} +} else if (!l2_cache_size_set) { +*l2_cache_size = MAX(DEFAULT_L2_CACHE_SIZE, + (uint64_t)DEFAULT_L2_CACHE_CLUSTERS + * s->cluster_size); } +/* If refcount-cache-size is not specified, it will be set to minimum + * in qcow2_update_options_prepare(). No need to set it here. */ if (*l2_cache_entry_size < (1 << MIN_CLUSTER_BITS) || *l2_cache_entry_size > s->cluster_size || -- 2.17.1
[Qemu-block] [PATCH v7 0/9] Take the image size into account when allocating the L2 cache
This series makes the qcow2 L2 cache assignment aware of the image size, with the intention for it to cover the entire image. The importance of this change is in noticeable performance improvement, especially with heavy random I/O. The memory overhead is not big in most cases, as only 1 MB of cache for every 8 GB of image size is used. For cases with very large images and/or small cluster sizes, or systems with limited RAM resources, there is an upper limit on the default L2 cache: 32 MB. To modify this limit one can use the already existing 'l2-cache-size' and 'cache-size' options. Moreover, this fixes the behavior of 'l2-cache-size', as it was documented as the *maximum* L2 cache size, but in practice behaved as the absolute size. To compensate the memory overhead which may be increased following this behavior, the default cache-clean-interval is set to 10 minutes by default (was disabled by default before). The L2 cache is also resized accordingly, by default, if the image is resized. Additionally, few minor changes are made (refactoring and documentation fixes). Differences from v1: * .gitignore modification patch removed (unneeded). * The grammar fix in conflicting cache sizing patch removed (merged). * The update total_sectors when resizing patch squashed with the resizing patch. * L2 cache is now capped at 32 MB. * The default cache-clean-interval is set to 30 seconds. Differences from v2: * Made it clear in the documentation that setting cache-clean-interval to 0 disables this feature. Differences from v3: * The default cache-clean-interval is set to 10 minutes instead of 30 seconds before. * Commit message changes to better explain the patches. * Some refactoring. Differences from v4: * Refactoring. * Documentation and commit message fixes. Differences from v5: * A patch with cosmetic changes is split from the main patch * A patch for avoiding duplication in refcount cache size assignment is split from the main patch. * In the main patch the cap on the L2 cache size is set to only 1 MB (in order to be close to the previous behavior) and a separate patch sets it to 32 MB. * Changes in the documentation fixes patch [1/8]. Differences from v6: * A patch is added to make the defined sizes more humanly readable Leonid Bloch (9): qcow2: Options' documentation fixes qcow2: Cosmetic changes qcow2: Make sizes more humanly readable qcow2: Avoid duplication in setting the refcount cache size qcow2: Assign the L2 cache relatively to the image size qcow2: Increase the default upper limit on the L2 cache size qcow2: Resize the cache upon image resizing qcow2: Set the default cache-clean-interval to 10 minutes qcow2: Explicit number replaced by a constant block/qcow2.c | 54 +- block/qcow2.h | 12 - docs/qcow2-cache.txt | 33 ++- qapi/block-core.json | 3 ++- qemu-options.hx| 11 +--- tests/qemu-iotests/137 | 1 - tests/qemu-iotests/137.out | 1 - 7 files changed, 66 insertions(+), 49 deletions(-) -- 2.17.1
[Qemu-block] [PATCH v7 2/9] qcow2: Cosmetic changes
Some refactoring for better readability is done here. Signed-off-by: Leonid Bloch --- block/qcow2.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index ec9e6238a0..3f4abc394e 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -790,8 +790,9 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, *refcount_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_REFCOUNT_CACHE_SIZE, 0); -*l2_cache_entry_size = qemu_opt_get_size( -opts, QCOW2_OPT_L2_CACHE_ENTRY_SIZE, s->cluster_size); +*l2_cache_entry_size = qemu_opt_get_size(opts, + QCOW2_OPT_L2_CACHE_ENTRY_SIZE, + s->cluster_size); if (combined_cache_size_set) { if (l2_cache_size_set && refcount_cache_size_set) { @@ -823,8 +824,8 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, *l2_cache_size = max_l2_cache; *refcount_cache_size = combined_cache_size - *l2_cache_size; } else { -*refcount_cache_size = -MIN(combined_cache_size, min_refcount_cache); +*refcount_cache_size = MIN(combined_cache_size, + min_refcount_cache); *l2_cache_size = combined_cache_size - *refcount_cache_size; } } -- 2.17.1
[Qemu-block] [PATCH v7 5/9] qcow2: Assign the L2 cache relatively to the image size
Sufficient L2 cache can noticeably improve the performance when using large images with frequent I/O. Previously, the L2 cache was allocated without considering the image size, and an option existed to manually determine its size. Thus to achieve a full coverage of an image by the L2 cache (i.e. use more than the default value of MAX(1 MB, 8 clusters)), a user needed to calculate the required size manually, or with a script, and pass this value to the 'l2-cache-size' option. Now, the L2 cache is assigned taking the virtual image size into account, and will cover the entire image, unless the size needed for that is larger than a certain maximum. This maximum is set to 1 MB by default (enough to cover an 8 GB image with the default cluster size) but can be increased or decreased using the 'l2-cache-size' option. This option was previously documented as the *maximum* L2 cache size, and this patch makes it behave as such, instead of as a constant size. Also, the existing option 'cache-size' can limit the sum of both L2 and refcount caches, as previously. Signed-off-by: Leonid Bloch --- block/qcow2.c | 22 ++ block/qcow2.h | 4 +--- docs/qcow2-cache.txt | 13 - qemu-options.hx| 6 +++--- tests/qemu-iotests/137 | 1 - tests/qemu-iotests/137.out | 1 - 6 files changed, 22 insertions(+), 25 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 053242f94e..434fb89076 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -777,16 +777,19 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, uint64_t *refcount_cache_size, Error **errp) { BDRVQcow2State *s = bs->opaque; -uint64_t combined_cache_size; +uint64_t combined_cache_size, l2_cache_max_setting; bool l2_cache_size_set, refcount_cache_size_set, combined_cache_size_set; int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size; +uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE; +uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8); combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE); l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE); refcount_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_REFCOUNT_CACHE_SIZE); combined_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_CACHE_SIZE, 0); -*l2_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE, 0); +l2_cache_max_setting = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE, + DEFAULT_L2_CACHE_MAX_SIZE); *refcount_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_REFCOUNT_CACHE_SIZE, 0); @@ -794,13 +797,16 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, QCOW2_OPT_L2_CACHE_ENTRY_SIZE, s->cluster_size); +*l2_cache_size = MIN(max_l2_cache, l2_cache_max_setting); + if (combined_cache_size_set) { if (l2_cache_size_set && refcount_cache_size_set) { error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE " and " QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not be set " "at the same time"); return; -} else if (*l2_cache_size > combined_cache_size) { +} else if (l2_cache_size_set && + (l2_cache_max_setting > combined_cache_size)) { error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed " QCOW2_OPT_CACHE_SIZE); return; @@ -815,13 +821,9 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, } else if (refcount_cache_size_set) { *l2_cache_size = combined_cache_size - *refcount_cache_size; } else { -uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE; -uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8); - /* Assign as much memory as possible to the L2 cache, and * use the remainder for the refcount cache */ -if (combined_cache_size >= max_l2_cache + min_refcount_cache) { -*l2_cache_size = max_l2_cache; +if (combined_cache_size >= *l2_cache_size + min_refcount_cache) { *refcount_cache_size = combined_cache_size - *l2_cache_size; } else { *refcount_cache_size = MIN(combined_cache_size, @@ -829,10 +831,6 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, *l2_cache_size = combined_cache_size - *refcount_cache_size; } } -} else if (!l2_cache_size_set) { -*l2_cache_size = MAX(DEFAULT_L2_CACHE_SIZE, - (uint64_t)DEFAULT_L2_CACHE_CLUSTERS - *
[Qemu-block] [PATCH v7 1/9] qcow2: Options' documentation fixes
Signed-off-by: Leonid Bloch --- docs/qcow2-cache.txt | 16 +++- qemu-options.hx | 9 ++--- 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt index 8a09a5cc5f..0f157d859a 100644 --- a/docs/qcow2-cache.txt +++ b/docs/qcow2-cache.txt @@ -97,12 +97,15 @@ need: l2_cache_size = disk_size_GB * 131072 refcount_cache_size = disk_size_GB * 32768 -QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount -cache of 256KB (262144 bytes), so using the formulas we've just seen -we have +With the default cluster size, to cover each 8 GB of the virtual image +size, 1MB of L2 cache is needed: - 1048576 / 131072 = 8 GB of virtual disk covered by that cache -262144 / 32768 = 8 GB + 65536 / 8 = 8192 = 8 GB / 1 MB + +A default refcount cache is 4 times the cluster size, which defaults to +256 KB (262144 bytes). This is sufficient for 8 GB of image size: + + 262144 / 32768 = 8 GB How to configure the cache sizes @@ -130,6 +133,9 @@ There are a few things that need to be taken into account: memory as possible to the L2 cache before increasing the refcount cache size. + - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size" + can be set simultaneously. + Unlike L2 tables, refcount blocks are not used during normal I/O but only during allocations and internal snapshots. In most cases they are accessed sequentially (even during random guest I/O) so increasing the diff --git a/qemu-options.hx b/qemu-options.hx index b1bf0f485f..f6804758d3 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -752,15 +752,18 @@ image file) @item cache-size The maximum total size of the L2 table and refcount block caches in bytes -(default: 1048576 bytes or 8 clusters, whichever is larger) +(default: the sum of l2-cache-size and refcount-cache-size) @item l2-cache-size The maximum size of the L2 table cache in bytes -(default: 4/5 of the total cache size) +(default: if cache-size is not defined - 1048576 bytes or 8 clusters, whichever +is larger; otherwise, as large as possible or needed within the cache-size, +while permitting the requested or the minimal refcount cache size) @item refcount-cache-size The maximum size of the refcount block cache in bytes -(default: 1/5 of the total cache size) +(default: 4 times the cluster size; or if cache-size is specified, the part of +it which is not used for the L2 cache) @item cache-clean-interval Clean unused entries in the L2 and refcount caches. The interval is in seconds. -- 2.17.1
Re: [Qemu-block] [Qemu-devel] [PATCH 2/4] tests: fix bdrv-drain leak
Marc-André Lureau writes: > Spotted by ASAN: > > = > ==5378==ERROR: LeakSanitizer: detected memory leaks > > Direct leak of 65536 byte(s) in 1 object(s) allocated from: > #0 0x7f788f83bc48 in malloc (/lib64/libasan.so.5+0xeec48) > #1 0x7f788c9923c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5) > #2 0x5622a1fe37bc in coroutine_trampoline > /home/elmarco/src/qq/util/coroutine-ucontext.c:116 > #3 0x7f788a15d75f in __correctly_grouped_prefixwc > (/lib64/libc.so.6+0x4c75f) > > Signed-off-by: Marc-André Lureau > --- > tests/test-bdrv-drain.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c > index 17bb8508ae..abc8bbe6f0 100644 > --- a/tests/test-bdrv-drain.c > +++ b/tests/test-bdrv-drain.c > @@ -948,6 +948,7 @@ static void coroutine_fn test_co_delete_by_drain(void > *opaque) > } > > dbdd->done = true; > +g_free(buffer); > } > > /** Screwed up in commit 4c8158e359d. Max, could the coroutine's stack accomodate the buffer?
[Qemu-block] [PATCH v7 9/9] qcow2: Explicit number replaced by a constant
Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia --- block/qcow2.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index b4f291765b..b0e20aeffc 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1324,7 +1324,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, /* 2^(s->refcount_order - 3) is the refcount width in bytes */ s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3); s->refcount_block_size = 1 << s->refcount_block_bits; -bs->total_sectors = header.size / 512; +bs->total_sectors = header.size / BDRV_SECTOR_SIZE; s->csize_shift = (62 - (s->cluster_bits - 8)); s->csize_mask = (1 << (s->cluster_bits - 8)) - 1; s->cluster_offset_mask = (1LL << s->csize_shift) - 1; @@ -3450,7 +3450,7 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, goto fail; } -old_length = bs->total_sectors * 512; +old_length = bs->total_sectors * BDRV_SECTOR_SIZE; new_l1_size = size_to_l1(s, offset); if (offset < old_length) { -- 2.17.1
[Qemu-block] [PATCH v7 7/9] qcow2: Resize the cache upon image resizing
The caches are now recalculated upon image resizing. This is done because the new default behavior of assigning L2 cache relatively to the image size, implies that the cache will be adapted accordingly after an image resize. Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia --- block/qcow2.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/block/qcow2.c b/block/qcow2.c index 434fb89076..ba4dfae735 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -3418,6 +3418,7 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, uint64_t old_length; int64_t new_l1_size; int ret; +QDict *options; if (prealloc != PREALLOC_MODE_OFF && prealloc != PREALLOC_MODE_METADATA && prealloc != PREALLOC_MODE_FALLOC && prealloc != PREALLOC_MODE_FULL) @@ -3642,6 +3643,8 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, } } +bs->total_sectors = offset / BDRV_SECTOR_SIZE; + /* write updated header.size */ offset = cpu_to_be64(offset); ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size), @@ -3652,6 +3655,13 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, } s->l1_vm_state_index = new_l1_size; + +/* Update cache sizes */ +options = qdict_clone_shallow(bs->options); +ret = qcow2_update_options(bs, options, s->flags, errp); +if (ret < 0) { +goto fail; +} ret = 0; fail: qemu_co_mutex_unlock(>lock); -- 2.17.1
Re: [Qemu-block] [Qemu-devel] [PATCH 1/4] tests: fix crumple/recursive leak
Marc-André Lureau writes: > Spotted by ASAN: > > = > ==27907==ERROR: LeakSanitizer: detected memory leaks > > Direct leak of 4120 byte(s) in 1 object(s) allocated from: > #0 0x7f913458ce50 in calloc (/lib64/libasan.so.5+0xeee50) > #1 0x7f9133fd641d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d) > #2 0x5561c6643c95 in qdict_crumple_test_recursive > /home/elmarco/src/qq/tests/check-block-qdict.c:438 > #3 0x7f9133ff7c49 (/lib64/libglib-2.0.so.0+0x73c49) > > Signed-off-by: Marc-André Lureau > --- > tests/check-block-qdict.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/tests/check-block-qdict.c b/tests/check-block-qdict.c > index 478807f839..73d3e9f574 100644 > --- a/tests/check-block-qdict.c > +++ b/tests/check-block-qdict.c > @@ -491,6 +491,7 @@ static void qdict_crumple_test_recursive(void) > empty_list_0 = qobject_to(QDict, qlist_pop(empty_list)); > g_assert(empty_list_0); > g_assert_cmpint(qdict_size(empty_list_0), ==, 0); > +qobject_unref(empty_list_0); > > qobject_unref(src); > qobject_unref(dst); Screwed up in commit 2860b2b2cb8. I can add that to the commit message when I apply. Reviewed-by: Markus Armbruster