Re: [Qemu-block] [Qemu-devel] [PATCH 23/24] util/cutils: Change qemu_strtosz*() from int64_t to uint64_t
On 02/14/2017 04:26 AM, Markus Armbruster wrote: > This will permit its use in parse_option_size(). Progress! (not that it matters - off_t is a signed value, so you are STILL limited to 2^63, not 2^64, as the maximum theoretical size storage that you can target - and it's not like we have that much storage even accessible) > > Cc: Dr. David Alan Gilbert> Cc: Eduardo Habkost (maintainer:X86) > Cc: Kevin Wolf (supporter:Block layer core) > Cc: Max Reitz (supporter:Block layer core) > Cc: qemu-block@nongnu.org (open list:Block layer core) > Signed-off-by: Markus Armbruster > --- > +++ b/hmp.c > @@ -1338,7 +1338,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const > QDict *qdict) > { > const char *param = qdict_get_str(qdict, "parameter"); > const char *valuestr = qdict_get_str(qdict, "value"); > -int64_t valuebw = 0; > +uint64_t valuebw = 0; > long valueint = 0; > Error *err = NULL; > bool use_int_value = false; > @@ -1379,7 +1379,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const > QDict *qdict) > case MIGRATION_PARAMETER_MAX_BANDWIDTH: > p.has_max_bandwidth = true; > ret = qemu_strtosz_mebi(valuestr, NULL, ); > -if (ret < 0 || (size_t)valuebw != valuebw) { > +if (ret < 0 || valuebw > INT64_MAX > +|| (size_t)valuebw != valuebw) { I don't know if this looks any better as: if (ret < 0 || valuebw > MIN(INT64_MAX, SIZE_MAX)) so don't bother changing it if you think that's ugly. > +++ b/qapi/opts-visitor.c > @@ -481,7 +481,6 @@ opts_type_size(Visitor *v, const char *name, uint64_t > *obj, Error **errp) > { > OptsVisitor *ov = to_ov(v); > const QemuOpt *opt; > -int64_t val; > int err; > > opt = lookup_scalar(ov, name, errp); > @@ -489,14 +488,13 @@ opts_type_size(Visitor *v, const char *name, uint64_t > *obj, Error **errp) > return; > } > > -err = qemu_strtosz(opt->str ? opt->str : "", NULL, ); > +err = qemu_strtosz(opt->str ? 
opt->str : "", NULL, obj); > if (err < 0) { > error_setg(errp, QERR_INVALID_PARAMETER_VALUE, opt->name, > - "a size value representible as a non-negative int64"); Nice - you're killing the unusual variant spelling of representable. > +++ b/util/cutils.c > @@ -207,7 +207,7 @@ static int64_t suffix_mul(char suffix, int64_t unit) > */ > static int do_strtosz(const char *nptr, char **end, >const char default_suffix, int64_t unit, > - int64_t *result) > + uint64_t *result) > { > int retval; > char *endptr; > @@ -237,7 +237,11 @@ static int do_strtosz(const char *nptr, char **end, > retval = -EINVAL; > goto out; > } > -if ((val * mul >= INT64_MAX) || val < 0) { > +/* > + * Values >= 0xfffffffffffffc00 overflow uint64_t after their trip > + * through double (53 bits of precision). > + */ > +if ((val * mul >= 0xfffffffffffffc00) || val < 0) { I guess there's not any simpler expression using INT64_MAX and operations (including casts to double and int64_t) that would reliably produce the same constant that you just open-coded here. Reviewed-by: Eric Blake -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-block] [Qemu-devel] [PATCH 22/24] util/cutils: Return qemu_strtosz*() error and value separately
On 02/14/2017 04:26 AM, Markus Armbruster wrote: > This makes qemu_strtosz(), qemu_strtosz_mebi() and > qemu_strtosz_metric() similar to qemu_strtoi64(), except negative > values are rejected. Yay. It also opens the door to allowing them to return an unsigned 64 bit value ;) > > Cc: Dr. David Alan Gilbert> Cc: Eduardo Habkost (maintainer:X86) > Cc: Kevin Wolf (supporter:Block layer core) > Cc: Max Reitz (supporter:Block layer core) > Cc: qemu-block@nongnu.org (open list:Block layer core) > Signed-off-by: Markus Armbruster > --- > @@ -1378,8 +1378,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const > QDict *qdict) > break; > case MIGRATION_PARAMETER_MAX_BANDWIDTH: > p.has_max_bandwidth = true; > -valuebw = qemu_strtosz_mebi(valuestr, NULL); > -if (valuebw < 0 || (size_t)valuebw != valuebw) { > +ret = qemu_strtosz_mebi(valuestr, NULL, ); > +if (ret < 0 || (size_t)valuebw != valuebw) { Question: do we want a wrapper that takes a maximum, as in: ret = qemu_strtosz_capped(valuestr, NULL, 'M', SIZE_MAX, ); so that the caller doesn't have to check (size_t)valuebw != valuebw ? But not essential, so I can live with what you did here. > +++ b/qemu-img.c > @@ -370,10 +370,14 @@ static int add_old_style_options(const char *fmt, > QemuOpts *opts, > > static int64_t cvtnum(const char *s) > +++ b/qemu-io-cmds.c > @@ -137,10 +137,14 @@ static char **breakline(char *input, int *count) > > static int64_t cvtnum(const char *s) May be some fallout based on rebase churn from earlier patch reviews, but nothing too serious to prevent you from adding: Reviewed-by: Eric Blake -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] block/nfs: try to avoid the bounce buffer in pwritev
On Fri, Feb 17, 2017 at 03:42:52PM -0600, Eric Blake wrote: > On 02/17/2017 03:37 PM, Jeff Cody wrote: > > On Fri, Feb 17, 2017 at 05:39:01PM +0100, Peter Lieven wrote: > >> if the passed qiov contains exactly one iov we can > >> pass the buffer directly. > >> > >> Signed-off-by: Peter Lieven> >> --- > >> block/nfs.c | 23 --- > >> 1 file changed, 16 insertions(+), 7 deletions(-) > >> > >> diff --git a/block/nfs.c b/block/nfs.c > >> index ab5dcc2..bb4b75f 100644 > >> --- a/block/nfs.c > >> +++ b/block/nfs.c > >> @@ -295,20 +295,27 @@ static int coroutine_fn > >> nfs_co_pwritev(BlockDriverState *bs, uint64_t offset, > >> NFSClient *client = bs->opaque; > >> NFSRPC task; > >> char *buf = NULL; > >> +bool my_buffer = false; > > > > g_free() is a nop if buf is NULL, so there is no need for the my_buffer > > variable & check. > > Umm, yes there is: > > >> +if (iov->niov != 1) { > >> +buf = g_try_malloc(bytes); > >> +if (bytes && buf == NULL) { > >> +return -ENOMEM; > >> +} > >> +qemu_iovec_to_buf(iov, 0, buf, bytes); > >> +my_buffer = true; > >> +} else { > >> +buf = iov->iov[0].iov_base; > > If we took the else branch, then we definitely do not want to be calling > g_free(buf). Doh! > > >> } > >> > >> -qemu_iovec_to_buf(iov, 0, buf, bytes); > >> - > >> if (nfs_pwrite_async(client->context, client->fh, > >> offset, bytes, buf, > >> nfs_co_generic_cb, ) != 0) { > >> -g_free(buf); > >> +if (my_buffer) { > >> +g_free(buf); > >> +} > > It looks correct as-is to me. Indeed. Reviewed-by: Jeff Cody
Re: [Qemu-block] [Qemu-devel] [PATCH v2 6/7] iscsi: Add blockdev-add support
On Fri, Feb 17, 2017 at 03:40:21PM -0600, Eric Blake wrote: > On 01/25/2017 11:42 AM, Jeff Cody wrote: > > From: Kevin Wolf> > > > This adds blockdev-add support for iscsi devices. > > > > Reviewed-by: Daniel P. Berrange > > Signed-off-by: Kevin Wolf > > Signed-off-by: Jeff Cody > > --- > > block/iscsi.c| 14 ++ > > qapi/block-core.json | 74 > > > > 2 files changed, 78 insertions(+), 10 deletions(-) > > > +++ b/qapi/block-core.json > > @@ -2116,10 +2116,10 @@ > > { 'enum': 'BlockdevDriver', > >'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop', > > 'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom', > > -'host_device', 'http', 'https', 'luks', 'nbd', 'nfs', > > 'null-aio', > > -'null-co', 'parallels', 'qcow', 'qcow2', 'qed', 'quorum', > > 'raw', > > -'replication', 'ssh', 'vdi', 'vhdx', 'vmdk', 'vpc', > > -'vvfat' ] } > > +'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs', > > +'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed', > > +'quorum', 'raw', 'replication', 'ssh', 'vdi', 'vhdx', 'vmdk', > > +'vpc', 'vvfat' ] } > > Are we missing a since 2.9 documentation designation here? > Yes, thanks for catching that. I've applied it to my branch already, but I will go ahead and fix this and rebase with the following squashed into the patch: diff --git a/qapi/block-core.json b/qapi/block-core.json index f00fc9d..5f82d35 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2110,6 +2110,7 @@ # @nfs: Since 2.8 # @replication: Since 2.8 # @ssh: Since 2.8 +# @iscsi: Since 2.9 # # Since: 2.0 ##
Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] block/nfs: try to avoid the bounce buffer in pwritev
On 02/17/2017 03:37 PM, Jeff Cody wrote: > On Fri, Feb 17, 2017 at 05:39:01PM +0100, Peter Lieven wrote: >> if the passed qiov contains exactly one iov we can >> pass the buffer directly. >> >> Signed-off-by: Peter Lieven>> --- >> block/nfs.c | 23 --- >> 1 file changed, 16 insertions(+), 7 deletions(-) >> >> diff --git a/block/nfs.c b/block/nfs.c >> index ab5dcc2..bb4b75f 100644 >> --- a/block/nfs.c >> +++ b/block/nfs.c >> @@ -295,20 +295,27 @@ static int coroutine_fn >> nfs_co_pwritev(BlockDriverState *bs, uint64_t offset, >> NFSClient *client = bs->opaque; >> NFSRPC task; >> char *buf = NULL; >> +bool my_buffer = false; > > g_free() is a nop if buf is NULL, so there is no need for the my_buffer > variable & check. Umm, yes there is: >> +if (iov->niov != 1) { >> +buf = g_try_malloc(bytes); >> +if (bytes && buf == NULL) { >> +return -ENOMEM; >> +} >> +qemu_iovec_to_buf(iov, 0, buf, bytes); >> +my_buffer = true; >> +} else { >> +buf = iov->iov[0].iov_base; If we took the else branch, then we definitely do not want to be calling g_free(buf). >> } >> >> -qemu_iovec_to_buf(iov, 0, buf, bytes); >> - >> if (nfs_pwrite_async(client->context, client->fh, >> offset, bytes, buf, >> nfs_co_generic_cb, ) != 0) { >> -g_free(buf); >> +if (my_buffer) { >> +g_free(buf); >> +} It looks correct as-is to me. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH v2 6/7] iscsi: Add blockdev-add support
On 01/25/2017 11:42 AM, Jeff Cody wrote: > From: Kevin Wolf> > This adds blockdev-add support for iscsi devices. > > Reviewed-by: Daniel P. Berrange > Signed-off-by: Kevin Wolf > Signed-off-by: Jeff Cody > --- > block/iscsi.c| 14 ++ > qapi/block-core.json | 74 > > 2 files changed, 78 insertions(+), 10 deletions(-) > +++ b/qapi/block-core.json > @@ -2116,10 +2116,10 @@ > { 'enum': 'BlockdevDriver', >'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop', > 'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom', > -'host_device', 'http', 'https', 'luks', 'nbd', 'nfs', 'null-aio', > -'null-co', 'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', > -'replication', 'ssh', 'vdi', 'vhdx', 'vmdk', 'vpc', > -'vvfat' ] } > +'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs', > +'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed', > +'quorum', 'raw', 'replication', 'ssh', 'vdi', 'vhdx', 'vmdk', > +'vpc', 'vvfat' ] } Are we missing a since 2.9 documentation designation here? -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 2/2] block/nfs: try to avoid the bounce buffer in pwritev
On Fri, Feb 17, 2017 at 05:39:01PM +0100, Peter Lieven wrote: > if the passed qiov contains exactly one iov we can > pass the buffer directly. > > Signed-off-by: Peter Lieven> --- > block/nfs.c | 23 --- > 1 file changed, 16 insertions(+), 7 deletions(-) > > diff --git a/block/nfs.c b/block/nfs.c > index ab5dcc2..bb4b75f 100644 > --- a/block/nfs.c > +++ b/block/nfs.c > @@ -295,20 +295,27 @@ static int coroutine_fn nfs_co_pwritev(BlockDriverState > *bs, uint64_t offset, > NFSClient *client = bs->opaque; > NFSRPC task; > char *buf = NULL; > +bool my_buffer = false; g_free() is a nop if buf is NULL, so there is no need for the my_buffer variable & check. > > nfs_co_init_task(bs, ); > > -buf = g_try_malloc(bytes); > -if (bytes && buf == NULL) { > -return -ENOMEM; > +if (iov->niov != 1) { > +buf = g_try_malloc(bytes); > +if (bytes && buf == NULL) { > +return -ENOMEM; > +} > +qemu_iovec_to_buf(iov, 0, buf, bytes); > +my_buffer = true; > +} else { > +buf = iov->iov[0].iov_base; > } > > -qemu_iovec_to_buf(iov, 0, buf, bytes); > - > if (nfs_pwrite_async(client->context, client->fh, > offset, bytes, buf, > nfs_co_generic_cb, ) != 0) { > -g_free(buf); > +if (my_buffer) { > +g_free(buf); > +} > return -ENOMEM; > } > > @@ -317,7 +324,9 @@ static int coroutine_fn nfs_co_pwritev(BlockDriverState > *bs, uint64_t offset, > qemu_coroutine_yield(); > } > > -g_free(buf); > +if (my_buffer) { > +g_free(buf); > +} > > if (task.ret != bytes) { > return task.ret < 0 ? task.ret : -EIO; > -- > 1.9.1 >
Re: [Qemu-block] [PATCH 1/2] block/nfs: convert to preadv / pwritev
On Fri, Feb 17, 2017 at 05:39:00PM +0100, Peter Lieven wrote: > Signed-off-by: Peter Lieven> --- > block/nfs.c | 33 +++-- > 1 file changed, 15 insertions(+), 18 deletions(-) > > diff --git a/block/nfs.c b/block/nfs.c > index 689eaa7..ab5dcc2 100644 > --- a/block/nfs.c > +++ b/block/nfs.c > @@ -256,9 +256,9 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void > *data, > nfs_co_generic_bh_cb, task); > } > > -static int coroutine_fn nfs_co_readv(BlockDriverState *bs, > - int64_t sector_num, int nb_sectors, > - QEMUIOVector *iov) > +static int coroutine_fn nfs_co_preadv(BlockDriverState *bs, uint64_t offset, > + uint64_t bytes, QEMUIOVector *iov, > + int flags) > { > NFSClient *client = bs->opaque; > NFSRPC task; > @@ -267,9 +267,7 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs, > task.iov = iov; > > if (nfs_pread_async(client->context, client->fh, > -sector_num * BDRV_SECTOR_SIZE, > -nb_sectors * BDRV_SECTOR_SIZE, > -nfs_co_generic_cb, ) != 0) { > +offset, bytes, nfs_co_generic_cb, ) != 0) { > return -ENOMEM; > } > > @@ -290,9 +288,9 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs, > return 0; > } > > -static int coroutine_fn nfs_co_writev(BlockDriverState *bs, > -int64_t sector_num, int nb_sectors, > -QEMUIOVector *iov) > +static int coroutine_fn nfs_co_pwritev(BlockDriverState *bs, uint64_t offset, > + uint64_t bytes, QEMUIOVector *iov, > + int flags) > { > NFSClient *client = bs->opaque; > NFSRPC task; > @@ -300,17 +298,16 @@ static int coroutine_fn nfs_co_writev(BlockDriverState > *bs, > > nfs_co_init_task(bs, ); > > -buf = g_try_malloc(nb_sectors * BDRV_SECTOR_SIZE); > -if (nb_sectors && buf == NULL) { > +buf = g_try_malloc(bytes); > +if (bytes && buf == NULL) { > return -ENOMEM; > } > > -qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE); > +qemu_iovec_to_buf(iov, 0, buf, bytes); > > if (nfs_pwrite_async(client->context, client->fh, > - sector_num * BDRV_SECTOR_SIZE, > - nb_sectors * BDRV_SECTOR_SIZE, > - buf, 
nfs_co_generic_cb, ) != 0) { > + offset, bytes, buf, > + nfs_co_generic_cb, ) != 0) { > g_free(buf); > return -ENOMEM; > } > @@ -322,7 +319,7 @@ static int coroutine_fn nfs_co_writev(BlockDriverState > *bs, > > g_free(buf); > > -if (task.ret != nb_sectors * BDRV_SECTOR_SIZE) { > +if (task.ret != bytes) { > return task.ret < 0 ? task.ret : -EIO; > } > > @@ -856,8 +853,8 @@ static BlockDriver bdrv_nfs = { > .bdrv_create= nfs_file_create, > .bdrv_reopen_prepare= nfs_reopen_prepare, > > -.bdrv_co_readv = nfs_co_readv, > -.bdrv_co_writev = nfs_co_writev, > +.bdrv_co_preadv = nfs_co_preadv, > +.bdrv_co_pwritev= nfs_co_pwritev, > .bdrv_co_flush_to_disk = nfs_co_flush, > > .bdrv_detach_aio_context= nfs_detach_aio_context, > -- > 1.9.1 > Reviewed-by: Jeff Cody
Re: [Qemu-block] [Qemu-devel] [PATCH 21/24] util/cutils: Let qemu_strtosz*() optionally reject trailing crap
On 02/14/2017 04:26 AM, Markus Armbruster wrote: > Change the qemu_strtosz() & friends to return -EINVAL when @endptr is > null and the conversion doesn't consume the string completely. > Matches how qemu_strtol() & friends work. > > Only test_qemu_strtosz_simple() passes a null @endptr. No functional > change there, because its conversion consumes the string. > > Simplify callers that use @endptr only to fail when it doesn't point > to '\0' to pass a null @endptr instead. > > Cc: Dr. David Alan Gilbert> Cc: Eduardo Habkost (maintainer:X86) > Cc: Kevin Wolf (supporter:Block layer core) > Cc: Max Reitz (supporter:Block layer core) > Cc: qemu-block@nongnu.org (open list:Block layer core) > Signed-off-by: Markus Armbruster > --- > hmp.c | 6 ++ > hw/misc/ivshmem.c | 6 ++ > qapi/opts-visitor.c | 5 ++--- > qemu-img.c | 7 +-- > qemu-io-cmds.c | 7 +-- > target/i386/cpu.c | 5 ++--- > tests/test-cutils.c | 6 ++ > util/cutils.c | 14 +- > 8 files changed, 25 insertions(+), 31 deletions(-) Nice cleanup. Reviewed-by: Eric Blake > +++ b/qemu-img.c > @@ -370,14 +370,9 @@ static int add_old_style_options(const char *fmt, > QemuOpts *opts, > > static int64_t cvtnum(const char *s) > { > +++ b/qemu-io-cmds.c > @@ -137,14 +137,9 @@ static char **breakline(char *input, int *count) > > static int64_t cvtnum(const char *s) > { > -char *end; Why do we reimplement cvtnum() as copied static functions instead of exporting it? But that would be a separate cleanup (perhaps squashed into 20/24, where you use cvtnum in qemu-img). > @@ -217,7 +217,8 @@ static int64_t do_strtosz(const char *nptr, char **end, > errno = 0; > val = strtod(nptr, ); > if (isnan(val) || endptr == nptr || errno != 0) { Hmm - we explicitly reject "NaN", but not "infinity". But when strtod() accepts infinity, ... > -goto fail; > +retval = -EINVAL; > +goto out; > } > fraction = modf(val, ); then modf() returns 0 with integral left at infinity... 
> if (fraction != 0) { > @@ -232,17 +233,20 @@ static int64_t do_strtosz(const char *nptr, char **end, > assert(mul >= 0); > } > if (mul == 1 && mul_required) { > -goto fail; > +retval = -EINVAL; > +goto out; > } > if ((val * mul >= INT64_MAX) || val < 0) { ...and the multiply exceeds INT64_MAX, so we still end up rejecting it (with ERANGE instead of EINVAL). Weird way but seems to work, and is pre-existing, so not this patch's problem. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-block] [Qemu-devel] [PATCH 20/24] qemu-img: Wrap cvtnum() around qemu_strtosz()
On 02/14/2017 04:26 AM, Markus Armbruster wrote: > Cc: Kevin Wolf> Cc: Max Reitz > Cc: qemu-block@nongnu.org > Signed-off-by: Markus Armbruster > --- > qemu-img.c | 58 +++--- > 1 file changed, 31 insertions(+), 27 deletions(-) > @@ -3858,11 +3866,9 @@ static int img_dd_count(const char *arg, > struct DdIo *in, struct DdIo *out, > struct DdInfo *dd) > { > -char *end; > +dd->count = cvtnum(arg); Hmm. cvtnum() accepts "1.5G", GNU dd does not. POSIX requires dd to accept '1kx10k' to mean 10 mebibytes, and GNU dd accepts '10xM' as a synonym, but cvtnum() does not accept all those corner cases. POSIX requires dd to treat '1b' as '512', while cvtnum() treats it as '1'. I sometimes wonder if our 'qemu-img dd' subcommand should be reusing the numeric parsing that we use everywhere else, in spite of it meaning that we are different than the POSIX quirks on what numbers are required to be supported by dd. But that's not the concern of this patch. Reviewed-by: Eric Blake -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-block] [PATCH v2 0/7] iscsi: Add blockdev-add support
On Wed, Jan 25, 2017 at 12:42:01PM -0500, Jeff Cody wrote: > This adds blockdev-add support to the iscsi block driver. > > Picked this series up from Kevin. I've tested it on my local iscsi setup. > > There are only a few minor changes: > > * In patch 2, fixed the segfault pointed out by Daniel > * In patch 6, placed the ':' after the command header as now required > * New patch 7, to fix some out of date documentation in the qapi schema > > > Jeff Cody (1): > QAPI: Fix blockdev-add example documentation > > Kevin Wolf (6): > iscsi: Split URL into individual options > iscsi: Handle -iscsi user/password in bdrv_parse_filename() > iscsi: Add initiator-name option > iscsi: Add header-digest option > iscsi: Add timeout option > iscsi: Add blockdev-add support > > block/iscsi.c| 349 > +++ > qapi/block-core.json | 92 +++--- > 2 files changed, 288 insertions(+), 153 deletions(-) > > -- > 2.9.3 > Fixed up the formatting issues pointed out by Fam & patchew, and applied to my block branch: git://github.com/codyprime/qemu-kvm-jtc.git block -Jeff
[Qemu-block] [PULL 02/23] virtio: Report real progress in VQ aio poll handler
From: Fam ZhengIn virtio_queue_host_notifier_aio_poll, not all "!virtio_queue_empty()" cases are making true progress. Currently the offending one is virtio-scsi event queue, whose handler does nothing if no event is pending. As a result aio_poll() will spin on the "non-empty" VQ and take 100% host CPU. Fix this by reporting actual progress from virtio queue aio handlers. Reported-by: Ed Swierk Signed-off-by: Fam Zheng Tested-by: Ed Swierk Reviewed-by: Stefan Hajnoczi Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin --- include/hw/virtio/virtio-blk.h | 2 +- include/hw/virtio/virtio-scsi.h | 6 +++--- include/hw/virtio/virtio.h | 4 ++-- hw/block/dataplane/virtio-blk.c | 4 ++-- hw/block/virtio-blk.c | 12 ++-- hw/scsi/virtio-scsi-dataplane.c | 14 +++--- hw/scsi/virtio-scsi.c | 14 +++--- hw/virtio/virtio.c | 15 +-- 8 files changed, 45 insertions(+), 26 deletions(-) diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h index 9734b4c..d3c8a6f 100644 --- a/include/hw/virtio/virtio-blk.h +++ b/include/hw/virtio/virtio-blk.h @@ -80,6 +80,6 @@ typedef struct MultiReqBuffer { bool is_write; } MultiReqBuffer; -void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq); +bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq); #endif diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h index 7375196..f536f77 100644 --- a/include/hw/virtio/virtio-scsi.h +++ b/include/hw/virtio/virtio-scsi.h @@ -126,9 +126,9 @@ void virtio_scsi_common_realize(DeviceState *dev, Error **errp, VirtIOHandleOutput cmd); void virtio_scsi_common_unrealize(DeviceState *dev, Error **errp); -void virtio_scsi_handle_event_vq(VirtIOSCSI *s, VirtQueue *vq); -void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq); -void virtio_scsi_handle_ctrl_vq(VirtIOSCSI *s, VirtQueue *vq); +bool virtio_scsi_handle_event_vq(VirtIOSCSI *s, VirtQueue *vq); +bool virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq); +bool 
virtio_scsi_handle_ctrl_vq(VirtIOSCSI *s, VirtQueue *vq); void virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq, VirtIOSCSIReq *req); void virtio_scsi_free_req(VirtIOSCSIReq *req); void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev, diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index 525da24..0863a25 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -154,6 +154,7 @@ void virtio_error(VirtIODevice *vdev, const char *fmt, ...) GCC_FMT_ATTR(2, 3); void virtio_device_set_child_bus_name(VirtIODevice *vdev, char *bus_name); typedef void (*VirtIOHandleOutput)(VirtIODevice *, VirtQueue *); +typedef bool (*VirtIOHandleAIOOutput)(VirtIODevice *, VirtQueue *); VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size, VirtIOHandleOutput handle_output); @@ -284,8 +285,7 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev); EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq); void virtio_queue_host_notifier_read(EventNotifier *n); void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx, -void (*fn)(VirtIODevice *, - VirtQueue *)); +VirtIOHandleAIOOutput handle_output); VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector); VirtQueue *virtio_vector_next_queue(VirtQueue *vq); diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index d1f9f63..5556f0e 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -147,7 +147,7 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s) g_free(s); } -static void virtio_blk_data_plane_handle_output(VirtIODevice *vdev, +static bool virtio_blk_data_plane_handle_output(VirtIODevice *vdev, VirtQueue *vq) { VirtIOBlock *s = (VirtIOBlock *)vdev; @@ -155,7 +155,7 @@ static void virtio_blk_data_plane_handle_output(VirtIODevice *vdev, assert(s->dataplane); assert(s->dataplane_started); -virtio_blk_handle_vq(s, vq); +return virtio_blk_handle_vq(s, vq); } /* Context: QEMU 
global mutex held */ diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 702eda8..baaa195 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -581,10 +581,11 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb) return 0; } -void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq) +bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq) { VirtIOBlockReq *req;
Re: [Qemu-block] [Qemu-devel] [PATCH 15/17] iotests: add default node-name
* Fam Zheng (f...@redhat.com) wrote: > On Fri, 02/17 16:36, Vladimir Sementsov-Ogievskiy wrote: > > 17.02.2017 15:21, Fam Zheng wrote: > > > On Fri, 02/17 13:20, Vladimir Sementsov-Ogievskiy wrote: > > > > 16.02.2017 16:48, Fam Zheng wrote: > > > > > On Mon, 02/13 12:54, Vladimir Sementsov-Ogievskiy wrote: > > > > > > When testing migration, auto-generated by qemu node-names differs in > > > > > > source and destination qemu and migration fails. After this patch, > > > > > > auto-generated by iotest nodenames will be the same. > > > > > What should be done in libvirt to make sure the node-names are > > > > > matching > > > > > correctly at both sides? > > > > Hmm, just set node names appropriately? > > > But I think the problem is that node names are not configurable from > > > libvirt > > > today, and then the migration will fail. Should the device name take > > > precedence > > > in the code, to make it easier? > > > > libvirt can use same parameters as I in this patch.. > > If I'm not mistaken, libvirt can be patched to explicitly set the same node > names in the QEMU command line, but that is some extra work to do there. My > point is if device names are used during migration, when available, this patch > and the libvirt change is not necessary. Always best to check with libvirt guys to see what makes sense for them; ccing in jdenemar. Dave > Fam -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-block] [PATCH 1/3] curl: do not use aio_context_acquire/release
Now that all bottom halves and callbacks take care of taking the AioContext lock, we can migrate some users away from it and to a specific QemuMutex or CoMutex. Protect BDRVCURLState access with a QemuMutex. Signed-off-by: Paolo Bonzini--- block/curl.c | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/block/curl.c b/block/curl.c index 2939cc7..e83dcd8 100644 --- a/block/curl.c +++ b/block/curl.c @@ -135,6 +135,7 @@ typedef struct BDRVCURLState { char *cookie; bool accept_range; AioContext *aio_context; +QemuMutex mutex; char *username; char *password; char *proxyusername; @@ -333,6 +334,7 @@ static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len, return FIND_RET_NONE; } +/* Called with s->mutex held. */ static void curl_multi_check_completion(BDRVCURLState *s) { int msgs_in_queue; @@ -374,7 +376,9 @@ static void curl_multi_check_completion(BDRVCURLState *s) continue; } +qemu_mutex_unlock(>mutex); acb->common.cb(acb->common.opaque, -EPROTO); +qemu_mutex_lock(>mutex); qemu_aio_unref(acb); state->acb[i] = NULL; } @@ -386,6 +390,7 @@ static void curl_multi_check_completion(BDRVCURLState *s) } } +/* Called with s->mutex held. 
*/ static void curl_multi_do_locked(CURLState *s) { CURLSocket *socket, *next_socket; @@ -409,19 +414,19 @@ static void curl_multi_do(void *arg) { CURLState *s = (CURLState *)arg; -aio_context_acquire(s->s->aio_context); +qemu_mutex_lock(>s->mutex); curl_multi_do_locked(s); -aio_context_release(s->s->aio_context); +qemu_mutex_unlock(>s->mutex); } static void curl_multi_read(void *arg) { CURLState *s = (CURLState *)arg; -aio_context_acquire(s->s->aio_context); +qemu_mutex_lock(>s->mutex); curl_multi_do_locked(s); curl_multi_check_completion(s->s); -aio_context_release(s->s->aio_context); +qemu_mutex_unlock(>s->mutex); } static void curl_multi_timeout_do(void *arg) @@ -434,11 +439,11 @@ static void curl_multi_timeout_do(void *arg) return; } -aio_context_acquire(s->aio_context); +qemu_mutex_lock(>mutex); curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, ); curl_multi_check_completion(s); -aio_context_release(s->aio_context); +qemu_mutex_unlock(>mutex); #else abort(); #endif @@ -771,6 +776,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags, curl_easy_cleanup(state->curl); state->curl = NULL; +qemu_mutex_init(>mutex); curl_attach_aio_context(bs, bdrv_get_aio_context(bs)); qemu_opts_del(opts); @@ -801,12 +807,11 @@ static void curl_readv_bh_cb(void *p) CURLAIOCB *acb = p; BlockDriverState *bs = acb->common.bs; BDRVCURLState *s = bs->opaque; -AioContext *ctx = bdrv_get_aio_context(bs); size_t start = acb->sector_num * BDRV_SECTOR_SIZE; size_t end; -aio_context_acquire(ctx); +qemu_mutex_lock(>mutex); // In case we have the requested data already (e.g. read-ahead), // we can just call the callback and be done. 
@@ -854,7 +859,7 @@ static void curl_readv_bh_cb(void *p) curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, ); out: -aio_context_release(ctx); +qemu_mutex_unlock(>mutex); if (ret != -EINPROGRESS) { acb->common.cb(acb->common.opaque, ret); qemu_aio_unref(acb); @@ -883,6 +888,7 @@ static void curl_close(BlockDriverState *bs) DPRINTF("CURL: Close\n"); curl_detach_aio_context(bs); +qemu_mutex_destroy(>mutex); g_free(s->cookie); g_free(s->url); -- 2.9.3
[Qemu-block] [PATCH 3/3] iscsi: do not use aio_context_acquire/release
Now that all bottom halves and callbacks take care of taking the AioContext lock, we can migrate some users away from it and to a specific QemuMutex or CoMutex. Protect libiscsi calls with a QemuMutex. Callbacks are invoked using bottom halves, so we don't even have to drop it around callback invocations. Signed-off-by: Paolo Bonzini--- block/iscsi.c | 83 +-- 1 file changed, 64 insertions(+), 19 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 2561be9..e483f6d 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -58,6 +58,7 @@ typedef struct IscsiLun { int events; QEMUTimer *nop_timer; QEMUTimer *event_timer; +QemuMutex mutex; struct scsi_inquiry_logical_block_provisioning lbp; struct scsi_inquiry_block_limits bl; unsigned char *zeroblock; @@ -252,6 +253,7 @@ static int iscsi_translate_sense(struct scsi_sense *sense) return ret; } +/* Called (via iscsi_service) with QemuMutex held. */ static void iscsi_co_generic_cb(struct iscsi_context *iscsi, int status, void *command_data, void *opaque) @@ -352,6 +354,7 @@ static const AIOCBInfo iscsi_aiocb_info = { static void iscsi_process_read(void *arg); static void iscsi_process_write(void *arg); +/* Called with QemuMutex held. 
*/ static void iscsi_set_events(IscsiLun *iscsilun) { @@ -395,10 +398,10 @@ iscsi_process_read(void *arg) IscsiLun *iscsilun = arg; struct iscsi_context *iscsi = iscsilun->iscsi; -aio_context_acquire(iscsilun->aio_context); +qemu_mutex_lock(>mutex); iscsi_service(iscsi, POLLIN); iscsi_set_events(iscsilun); -aio_context_release(iscsilun->aio_context); +qemu_mutex_unlock(>mutex); } static void @@ -407,10 +410,10 @@ iscsi_process_write(void *arg) IscsiLun *iscsilun = arg; struct iscsi_context *iscsi = iscsilun->iscsi; -aio_context_acquire(iscsilun->aio_context); +qemu_mutex_lock(>mutex); iscsi_service(iscsi, POLLOUT); iscsi_set_events(iscsilun); -aio_context_release(iscsilun->aio_context); +qemu_mutex_unlock(>mutex); } static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun) @@ -589,6 +592,7 @@ iscsi_co_writev_flags(BlockDriverState *bs, int64_t sector_num, int nb_sectors, uint64_t lba; uint32_t num_sectors; bool fua = flags & BDRV_REQ_FUA; +int r = 0; if (fua) { assert(iscsilun->dpofua); @@ -604,6 +608,7 @@ iscsi_co_writev_flags(BlockDriverState *bs, int64_t sector_num, int nb_sectors, lba = sector_qemu2lun(sector_num, iscsilun); num_sectors = sector_qemu2lun(nb_sectors, iscsilun); iscsi_co_init_iscsitask(iscsilun, ); +qemu_mutex_lock(>mutex); retry: if (iscsilun->use_16_for_rw) { #if LIBISCSI_API_VERSION >= (20160603) @@ -640,7 +645,9 @@ retry: #endif while (!iTask.complete) { iscsi_set_events(iscsilun); +qemu_mutex_unlock(>mutex); qemu_coroutine_yield(); +qemu_mutex_lock(>mutex); } if (iTask.task != NULL) { @@ -655,12 +662,15 @@ retry: if (iTask.status != SCSI_STATUS_GOOD) { iscsi_allocmap_set_invalid(iscsilun, sector_num, nb_sectors); -return iTask.err_code; +r = iTask.err_code; +goto out_unlock; } iscsi_allocmap_set_allocated(iscsilun, sector_num, nb_sectors); -return 0; +out_unlock: +qemu_mutex_unlock(>mutex); +return r; } @@ -693,18 +703,21 @@ static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs, goto out; } 
+qemu_mutex_lock(&iscsilun->mutex); retry: if (iscsi_get_lba_status_task(iscsilun->iscsi, iscsilun->lun, sector_qemu2lun(sector_num, iscsilun), 8 + 16, iscsi_co_generic_cb, &iTask) == NULL) { ret = -ENOMEM; -goto out; +goto out_unlock; } while (!iTask.complete) { iscsi_set_events(iscsilun); +qemu_mutex_unlock(&iscsilun->mutex); qemu_coroutine_yield(); +qemu_mutex_lock(&iscsilun->mutex); } if (iTask.do_retry) { @@ -721,20 +734,20 @@ retry: * because the device is busy or the cmd is not * supported) we pretend all blocks are allocated * for backwards compatibility */ -goto out; +goto out_unlock; } lbas = scsi_datain_unmarshall(iTask.task); if (lbas == NULL) { ret = -EIO; -goto out; +goto out_unlock; } lbasd = &lbas->descriptors[0]; if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) { ret = -EIO; -goto out; +goto out_unlock; } *pnum = sector_lun2qemu(lbasd->num_blocks, iscsilun); @@ -756,6 +769,8 @@ retry: if (*pnum > nb_sectors) { *pnum = nb_sectors; } +out_unlock: +qemu_mutex_unlock(&iscsilun->mutex); out: if (iTask.task != NULL) {
[Qemu-block] [PATCH 0/3] do not use aio_context_acquire/release in AIO-based drivers
aio_context_acquire/release will only go away once the block layer becomes thread-safe, but we can already move to other finer-grained mutexes wherever possible. These three drivers don't use coroutines, hence a QemuMutex is a fine primitive to use for protecting any per-BDS data in the libraries they use. The QemuMutex must protect any fd handlers or bottom halves, and also the BlockDriver callbacks which were implicitly being called under aio_context_acquire. Paolo

Paolo Bonzini (3): curl: do not use aio_context_acquire/release nfs: do not use aio_context_acquire/release iscsi: do not use aio_context_acquire/release block/curl.c | 24 ++--- block/iscsi.c | 83 +-- block/nfs.c | 20 +++--- 3 files changed, 95 insertions(+), 32 deletions(-) -- 2.9.3
[Qemu-block] [PATCH 2/3] nfs: do not use aio_context_acquire/release
Now that all bottom halves and callbacks take care of taking the AioContext lock, we can migrate some users away from it and to a specific QemuMutex or CoMutex. Protect libnfs calls with a QemuMutex. Callbacks are invoked using bottom halves, so we don't even have to drop it around callback invocations. Signed-off-by: Paolo Bonzini --- block/nfs.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/block/nfs.c b/block/nfs.c index 08b43dd..4eddcee 100644 --- a/block/nfs.c +++ b/block/nfs.c @@ -54,6 +54,7 @@ typedef struct NFSClient { int events; bool has_zero_init; AioContext *aio_context; +QemuMutex mutex; blkcnt_t st_blocks; bool cache_used; NFSServer *server; @@ -191,6 +192,7 @@ static void nfs_parse_filename(const char *filename, QDict *options, static void nfs_process_read(void *arg); static void nfs_process_write(void *arg); +/* Called with QemuMutex held. */ static void nfs_set_events(NFSClient *client) { int ev = nfs_which_events(client->context); @@ -209,20 +211,20 @@ static void nfs_process_read(void *arg) { NFSClient *client = arg; -aio_context_acquire(client->aio_context); +qemu_mutex_lock(&client->mutex); nfs_service(client->context, POLLIN); nfs_set_events(client); -aio_context_release(client->aio_context); +qemu_mutex_unlock(&client->mutex); } static void nfs_process_write(void *arg) { NFSClient *client = arg; -aio_context_acquire(client->aio_context); +qemu_mutex_lock(&client->mutex); nfs_service(client->context, POLLOUT); nfs_set_events(client); -aio_context_release(client->aio_context); +qemu_mutex_unlock(&client->mutex); } static void nfs_co_init_task(BlockDriverState *bs, NFSRPC *task) @@ -242,6 +244,7 @@ static void nfs_co_generic_bh_cb(void *opaque) aio_co_wake(task->co); } +/* Called (via nfs_service) with QemuMutex held. 
*/ static void nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data, void *private_data) @@ -273,6 +276,7 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs, nfs_co_init_task(bs, &task); task.iov = iov; +qemu_mutex_lock(&client->mutex); if (nfs_pread_async(client->context, client->fh, sector_num * BDRV_SECTOR_SIZE, nb_sectors * BDRV_SECTOR_SIZE, @@ -281,6 +285,7 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs, } nfs_set_events(client); +qemu_mutex_unlock(&client->mutex); while (!task.complete) { qemu_coroutine_yield(); } @@ -314,6 +319,7 @@ static int coroutine_fn nfs_co_writev(BlockDriverState *bs, qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE); +qemu_mutex_lock(&client->mutex); if (nfs_pwrite_async(client->context, client->fh, sector_num * BDRV_SECTOR_SIZE, nb_sectors * BDRV_SECTOR_SIZE, @@ -323,6 +329,7 @@ static int coroutine_fn nfs_co_writev(BlockDriverState *bs, } nfs_set_events(client); +qemu_mutex_unlock(&client->mutex); while (!task.complete) { qemu_coroutine_yield(); } @@ -343,12 +350,14 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs) nfs_co_init_task(bs, &task); +qemu_mutex_lock(&client->mutex); if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb, &task) != 0) { return -ENOMEM; } nfs_set_events(client); +qemu_mutex_unlock(&client->mutex); while (!task.complete) { qemu_coroutine_yield(); } @@ -434,6 +443,7 @@ static void nfs_file_close(BlockDriverState *bs) { NFSClient *client = bs->opaque; nfs_client_close(client); +qemu_mutex_destroy(&client->mutex); } static NFSServer *nfs_config(QDict *options, Error **errp) @@ -641,6 +651,7 @@ static int nfs_file_open(BlockDriverState *bs, QDict *options, int flags, if (ret < 0) { return ret; } +qemu_mutex_init(&client->mutex); bs->total_sectors = ret; ret = 0; return ret; @@ -696,6 +707,7 @@ static int nfs_has_zero_init(BlockDriverState *bs) return client->has_zero_init; } +/* Called (via nfs_service) with QemuMutex held. 
*/ static void nfs_get_allocated_file_size_cb(int ret, struct nfs_context *nfs, void *data, void *private_data) -- 2.9.3
Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps
On 02/17/2017 04:34 PM, Kevin Wolf wrote: > Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: >> On 02/17/2017 03:48 PM, Kevin Wolf wrote: >>> Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben: 17.02.2017 15:09, Kevin Wolf wrote: > Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben: >> 16.02.2017 14:49, Kevin Wolf wrote: >>> Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben: Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: > Auto loading bitmaps are bitmaps stored in the disk image, which > should > be loaded when the image is opened and become BdrvDirtyBitmaps for the > corresponding drive. > > Signed-off-by: Vladimir Sementsov-Ogievskiy> Reviewed-by: John Snow > Reviewed-by: Max Reitz Why do we need a new BlockDriver callback and special code for it in bdrv_open_common()? The callback is only ever called immediately after .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally in their .bdrv_open implementation? Even more so because qcow2 is the only driver that supports this callback. >>> Actually, don't we have to call this in qcow2_invalidate_cache()? >>> Currently, I think, after a migration, the autoload bitmaps aren't >>> loaded. >>> >>> By moving the qcow2_load_autoloading_dirty_bitmaps() call to >>> qcow2_open(), this would be fixed. >>> >>> Kevin >> Bitmap should not be reloaded on any intermediate qcow2-open's, >> reopens, etc. It should be loaded once, on bdrv_open, to not create >> extra collisions (between in-memory bitmap and it's stored version). >> That was the idea. >> >> For bitmaps migration there are separate series, we shouldn't load >> bitmap from file on migration, as it's version in the file is >> outdated. > That's not what your series is doing, though. It loads the bitmaps when Actually, they will not be loaded as they will have IN_USE flag. > migration starts and doesn't reload then when migration completes, even > though they are stale. 
Migration with shared storage would just work > without an extra series if you did these things in the correct places. > > As a reminder, this is how migration with shared storage works (or > should work with your series): > > 1. Start destination qemu instance. This calls bdrv_open() with >BDRV_O_INACTIVE. We can read in some metadata, though we don't need >much more than the image size at this point. Writing to the image is >still impossible. > > 2. Start migration on the source, while the VM is still writing to the >image, rendering the cached metadata from step 1 stale. > > 3. Migration completes: > > a. Stop the VM > > b. Inactivate all images in the source qemu. This is where all >metadata needs to be written back to the image file, including >bitmaps. No writes to the image are possible after this point >because BDRV_O_INACTIVE is set. > > c. Invalidate the caches in the destination qemu, i.e. reload >everything from the file that could have changed since step 1, >including bitmaps. BDRV_O_INACTIVE is cleared, making the image >ready for writes. > > d. Resume the VM on the destination > > 4. Exit the source qemu process, which involves bdrv_close(). Note that >at this point, no writing to the image file is possible any more, >it's the destination qemu process that own the image file now. > > Your series loads and stores bitmaps in steps 1 and 4. This means that Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and it is checked), nothing is stored. > they are stale on the destination when migration completes, and that > bdrv_close() wants to write to an image file that it doesn't own any > more, which will cause an assertion failure. If you instead move things > to steps 3b and 3c, it will just work. Hmm, I understand the idea.. But this will interfere with postcopy bitmap migration. So if we really need this, there should be some additional control flags or capabilities.. 
The problem of your approach is that bitmap actually migrated in the short state when source and destination are stopped, it may take time, as bitmaps may be large. >>> You can always add optimisations, but this is the basic lifecycle >>> process of block devices in qemu, so it would be good to adhere to it. >>> So far there are no other pieces of information that are ignored in >>> bdrv_invalidate()/bdrv_inactivate() and instead only handled in >>> bdrv_open()/bdrv_close(). It's a matter of consistency,
[Qemu-block] [PATCH 0/2] block/nfs optimizations
Peter Lieven (2): block/nfs: convert to preadv / pwritev block/nfs: try to avoid the bounce buffer in pwritev block/nfs.c | 50 -- 1 file changed, 28 insertions(+), 22 deletions(-) -- 1.9.1
[Qemu-block] [PATCH 2/2] block/nfs: try to avoid the bounce buffer in pwritev
If the passed qiov contains exactly one iov we can pass the buffer directly. Signed-off-by: Peter Lieven --- block/nfs.c | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/block/nfs.c b/block/nfs.c index ab5dcc2..bb4b75f 100644 --- a/block/nfs.c +++ b/block/nfs.c @@ -295,20 +295,27 @@ static int coroutine_fn nfs_co_pwritev(BlockDriverState *bs, uint64_t offset, NFSClient *client = bs->opaque; NFSRPC task; char *buf = NULL; +bool my_buffer = false; nfs_co_init_task(bs, &task); -buf = g_try_malloc(bytes); -if (bytes && buf == NULL) { -return -ENOMEM; +if (iov->niov != 1) { +buf = g_try_malloc(bytes); +if (bytes && buf == NULL) { +return -ENOMEM; +} +qemu_iovec_to_buf(iov, 0, buf, bytes); +my_buffer = true; +} else { +buf = iov->iov[0].iov_base; } -qemu_iovec_to_buf(iov, 0, buf, bytes); - if (nfs_pwrite_async(client->context, client->fh, offset, bytes, buf, nfs_co_generic_cb, &task) != 0) { -g_free(buf); +if (my_buffer) { +g_free(buf); +} return -ENOMEM; } @@ -317,7 +324,9 @@ static int coroutine_fn nfs_co_pwritev(BlockDriverState *bs, uint64_t offset, qemu_coroutine_yield(); } -g_free(buf); +if (my_buffer) { +g_free(buf); +} if (task.ret != bytes) { return task.ret < 0 ? task.ret : -EIO; -- 1.9.1
[Qemu-block] [PATCH 1/2] block/nfs: convert to preadv / pwritev
Signed-off-by: Peter Lieven --- block/nfs.c | 33 +++-- 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/block/nfs.c b/block/nfs.c index 689eaa7..ab5dcc2 100644 --- a/block/nfs.c +++ b/block/nfs.c @@ -256,9 +256,9 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data, nfs_co_generic_bh_cb, task); } -static int coroutine_fn nfs_co_readv(BlockDriverState *bs, - int64_t sector_num, int nb_sectors, - QEMUIOVector *iov) +static int coroutine_fn nfs_co_preadv(BlockDriverState *bs, uint64_t offset, + uint64_t bytes, QEMUIOVector *iov, + int flags) { NFSClient *client = bs->opaque; NFSRPC task; @@ -267,9 +267,7 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs, task.iov = iov; if (nfs_pread_async(client->context, client->fh, -sector_num * BDRV_SECTOR_SIZE, -nb_sectors * BDRV_SECTOR_SIZE, -nfs_co_generic_cb, &task) != 0) { +offset, bytes, nfs_co_generic_cb, &task) != 0) { return -ENOMEM; } @@ -290,9 +288,9 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs, return 0; } -static int coroutine_fn nfs_co_writev(BlockDriverState *bs, -int64_t sector_num, int nb_sectors, -QEMUIOVector *iov) +static int coroutine_fn nfs_co_pwritev(BlockDriverState *bs, uint64_t offset, + uint64_t bytes, QEMUIOVector *iov, + int flags) { NFSClient *client = bs->opaque; NFSRPC task; @@ -300,17 +298,16 @@ static int coroutine_fn nfs_co_writev(BlockDriverState *bs, nfs_co_init_task(bs, &task); -buf = g_try_malloc(nb_sectors * BDRV_SECTOR_SIZE); -if (nb_sectors && buf == NULL) { +buf = g_try_malloc(bytes); +if (bytes && buf == NULL) { return -ENOMEM; } -qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE); +qemu_iovec_to_buf(iov, 0, buf, bytes); if (nfs_pwrite_async(client->context, client->fh, - sector_num * BDRV_SECTOR_SIZE, - nb_sectors * BDRV_SECTOR_SIZE, - buf, nfs_co_generic_cb, &task) != 0) { + offset, bytes, buf, + nfs_co_generic_cb, &task) != 0) { g_free(buf); return -ENOMEM; } @@ -322,7 +319,7 @@ static int coroutine_fn nfs_co_writev(BlockDriverState *bs, 
g_free(buf); -if (task.ret != nb_sectors * BDRV_SECTOR_SIZE) { +if (task.ret != bytes) { return task.ret < 0 ? task.ret : -EIO; } @@ -856,8 +853,8 @@ static BlockDriver bdrv_nfs = { .bdrv_create= nfs_file_create, .bdrv_reopen_prepare= nfs_reopen_prepare, -.bdrv_co_readv = nfs_co_readv, -.bdrv_co_writev = nfs_co_writev, +.bdrv_co_preadv = nfs_co_preadv, +.bdrv_co_pwritev= nfs_co_pwritev, .bdrv_co_flush_to_disk = nfs_co_flush, .bdrv_detach_aio_context= nfs_detach_aio_context, -- 1.9.1
[Qemu-block] [RFC PATCH V4] qemu-img: make convert async
This is something I have been thinking about for almost 2 years now. We heavily rely on the following two use cases when using qemu-img convert: a) reading from NFS and writing to iSCSI for deploying templates, b) reading from iSCSI and writing to NFS for backups. In both processes we use libiscsi and libnfs so we have no kernel pagecache. As qemu-img convert is implemented with sync operations, that means we read one buffer and then write it. No parallelism, and each sync request takes as long as it takes until it is completed. This is version 4 of the approach using coroutine worker "threads". So far I have the following runtimes when reading an uncompressed QCOW2 from NFS and writing it to iSCSI (raw):

qemu-img (master)
 nfs -> iscsi   22.8 secs
 nfs -> ram     11.7 secs
 ram -> iscsi   12.3 secs

qemu-img-async (8 coroutines, in-order write disabled)
 nfs -> iscsi   11.0 secs
 nfs -> ram     10.4 secs
 ram -> iscsi    9.0 secs

The following are the runtimes found with different settings between V3 and V4. This is always the best runtime out of 10 runs when converting from nfs to iscsi. Please note that in V4 in-order write scenarios show a very high jitter. I think this is because the get_block_status on the NFS share is delayed by concurrent read requests.

                    in-order      out-of-order
V3
 - 16 coroutines    12.4 seconds  11.1 seconds
 -  8 coroutines    12.2 seconds  11.3 seconds
 -  4 coroutines    12.5 seconds  11.1 seconds
 -  2 coroutines    14.8 seconds  14.9 seconds
V4
 - 32 coroutines    15.9 seconds  11.5 seconds
 - 16 coroutines    12.5 seconds  11.0 seconds
 -  8 coroutines    12.9 seconds  11.0 seconds
 -  4 coroutines    14.1 seconds  11.5 seconds
 -  2 coroutines    16.9 seconds  13.2 seconds

Comments appreciated. 
Thank you, Peter Signed-off-by: Peter Lieven --- v3->v4: - avoid to prepare a request queue upfront [Kevin] - do not ignore the BLK_BACKING_FILE status [Kevin] - redesign the interface to the read and write routines [Kevin] v2->v3: - updated stats in the commit msg from a host with a better network card - only wake up the coroutine that is actually waiting for a write to complete. this was not only overhead, but also breaking at least linux AIO. - fix coding style complaints - rename some variables and structs v1->v2: - using coroutine as worker "threads". [Max] - keeping the request queue as otherwise it happens that we wait on BLK_ZERO chunks while keeping the write order. it also avoids redundant calls to get_block_status and helps to skip some conditions for fully allocated images (!s->min_sparse) --- qemu-img.c | 260 - 1 file changed, 187 insertions(+), 73 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index cff22e3..6bac980 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1448,6 +1448,8 @@ enum ImgConvertBlockStatus { BLK_BACKING_FILE, }; +#define MAX_COROUTINES 16 + typedef struct ImgConvertState { BlockBackend **src; int64_t *src_sectors; @@ -1455,15 +1457,25 @@ typedef struct ImgConvertState { int64_t src_cur_offset; int64_t total_sectors; int64_t allocated_sectors; +int64_t allocated_done; +int64_t sector_num; +int64_t wr_offs; enum ImgConvertBlockStatus status; int64_t sector_next_status; BlockBackend *target; bool has_zero_init; bool compressed; bool target_has_backing; +bool wr_in_order; int min_sparse; size_t cluster_sectors; size_t buf_sectors; +int num_coroutines; +int running_coroutines; +Coroutine *co[MAX_COROUTINES]; +int64_t wait_sector_num[MAX_COROUTINES]; +CoMutex lock; +int ret; } ImgConvertState; static void convert_select_part(ImgConvertState *s, int64_t sector_num) @@ -1544,11 +1556,12 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num) return n; } -static int convert_read(ImgConvertState *s, int64_t sector_num, 
int nb_sectors, -uint8_t *buf) +static int convert_co_read(ImgConvertState *s, int64_t sector_num, + int nb_sectors, uint8_t *buf) { -int n; -int ret; +int n, ret; +QEMUIOVector qiov; +struct iovec iov; assert(nb_sectors <= s->buf_sectors); while (nb_sectors > 0) { @@ -1563,9 +1576,13 @@ static int convert_read(ImgConvertState *s, int64_t sector_num, int nb_sectors, bs_sectors = s->src_sectors[s->src_cur]; n = MIN(nb_sectors, bs_sectors - (sector_num - s->src_cur_offset)); -ret = blk_pread(blk, -(sector_num - s->src_cur_offset) << BDRV_SECTOR_BITS, -buf, n << BDRV_SECTOR_BITS); +iov.iov_base = buf; +iov.iov_len = n << BDRV_SECTOR_BITS; +qemu_iovec_init_external(&qiov, &iov, 1); + +ret =
Re: [Qemu-block] [PATCH v15 25/25] qcow2-bitmap: improve check_constraints_on_bitmap
On 02/17/2017 04:18 AM, Vladimir Sementsov-Ogievskiy wrote: > 16.02.2017 17:21, Kevin Wolf wrote: >> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: >>> Add detailed error messages. >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy >> Why not merge this patch into the one that originally introduced the >> function? > > Just to not create extra work for reviewers

It's extra work for reviewers if you don't rebase obvious fixes where they belong - a new reviewer may flag the issue in the earlier patch only to find out later in the series that you've already fixed it. Avoiding needless code churn is part of what rebasing is all about - you want each step of the series to be self-contained and as correct as possible, by adding in the fixes at the point where they make sense, rather than at the end of the series. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps
17.02.2017 17:24, Kevin Wolf wrote: Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: On 02/17/2017 04:34 PM, Kevin Wolf wrote: Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: But for sure this is bad from the downtime point of view. On migrate you will have to write to the image and re-read it again on the target. This would be very slow. This will not help for the migration with non-shared disk too. That is why we have specifically worked in a migration, which for a good does not influence downtime at all now. With a write we are issuing several write requests + sync. Our measurements shows that bdrv_drain could take around a second on an averagely loaded conventional system, which seems unacceptable addition to me. I'm not arguing against optimising migration, I fully agree with you. I just think that we should start with a correct if slow base version and then add optimisation to that, instead of starting with a broken base version and adding to that. Look, whether you do the expensive I/O on open/close and make that a slow operation or whether you do it on invalidate_cache/inactivate doesn't really make a difference in term of slowness because in general both operations are called exactly once. But it does make a difference in terms of correctness. Once you do the optimisation, of course, you'll skip writing those bitmaps that you transfer using a different channel, no matter whether you skip it in bdrv_close() or in bdrv_inactivate(). Kevin I do not understand this point as in order to optimize this we will have to create specific code path or option from the migration code and keep this as an ugly kludge forever. The point that I don't understand is why it makes any difference for the follow-up migration series whether the writeout is in bdrv_close() or bdrv_inactivate(). I don't really see the difference between the two from a migration POV; both need to be skipped if we transfer the bitmap using a different channel. 
Maybe I would see the reason if I could find the time to look at the migration patches first, but unfortunately I don't have this time at the moment. My point is just that generally we want to have a correctly working qemu after every single patch, and even more importantly after every series. As the migration series is separate from this, I don't think it's a good excuse for doing worse than we could easily do here. Kevin With open/close all is already ok - bitmaps will not be saved because of BDRV_O_INACTIVE and will not be loaded because of IN_USE. -- Best regards, Vladimir
Re: [Qemu-block] [PATCH 1/3] qemu-img: Add tests for raw image preallocation
Am 17.02.2017 um 15:20 hat Nir Soffer geschrieben: > On Fri, Feb 17, 2017 at 11:14 AM, Kevin Wolf wrote: > > Am 17.02.2017 um 01:51 hat Nir Soffer geschrieben: > >> Add tests for creating raw image with and without the preallocation > >> option. > >> > >> Signed-off-by: Nir Soffer > > > > Looks good, but 175 is already (multiply) taken. Not making this a > > blocker, but I just want to remind everyone to check the mailing list > > for pending patches which add new tests before using a new number in > > order to avoid unnecessary rebases for everyone. In general, it's as > > easy as searching for the string "175.out" in the mailbox. > > > > The next free one seems to be 177 currently. > > Thanks, will change to 177 in the next version. If there needs to be a next version for other reasons. Otherwise it's not important enough to respin, it just means that someone else will have to rebase. > For next patches, what do you mean by "pending"? patches sent > to the block mailing list? Yes, that's what I'm looking for when I add a new test case myself. It's just an easy way to avoid stepping on each others toes. Kevin
Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps
Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben: > On 02/17/2017 04:34 PM, Kevin Wolf wrote: > > Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: > >> But for sure this is bad from the downtime point of view. > >> On migrate you will have to write to the image and re-read > >> it again on the target. This would be very slow. This will > >> not help for the migration with non-shared disk too. > >> > >> That is why we have specifically worked in a migration, > >> which for a good does not influence downtime at all now. > >> > >> With a write we are issuing several write requests + sync. > >> Our measurements shows that bdrv_drain could take around > >> a second on an averagely loaded conventional system, which > >> seems unacceptable addition to me. > > I'm not arguing against optimising migration, I fully agree with you. I > > just think that we should start with a correct if slow base version and > > then add optimisation to that, instead of starting with a broken base > > version and adding to that. > > > > Look, whether you do the expensive I/O on open/close and make that a > > slow operation or whether you do it on invalidate_cache/inactivate > > doesn't really make a difference in term of slowness because in general > > both operations are called exactly once. But it does make a difference > > in terms of correctness. > > > > Once you do the optimisation, of course, you'll skip writing those > > bitmaps that you transfer using a different channel, no matter whether > > you skip it in bdrv_close() or in bdrv_inactivate(). > > > > Kevin > I do not understand this point as in order to optimize this > we will have to create specific code path or option from > the migration code and keep this as an ugly kludge forever. The point that I don't understand is why it makes any difference for the follow-up migration series whether the writeout is in bdrv_close() or bdrv_inactivate(). 
I don't really see the difference between the two from a migration POV; both need to be skipped if we transfer the bitmap using a different channel. Maybe I would see the reason if I could find the time to look at the migration patches first, but unfortunately I don't have this time at the moment. My point is just that generally we want to have a correctly working qemu after every single patch, and even more importantly after every series. As the migration series is separate from this, I don't think it's a good excuse for doing worse than we could easily do here. Kevin
Re: [Qemu-block] [PATCH 1/3] qemu-img: Add tests for raw image preallocation
On Fri, Feb 17, 2017 at 11:14 AM, Kevin Wolf wrote: > Am 17.02.2017 um 01:51 hat Nir Soffer geschrieben: >> Add tests for creating raw image with and without the preallocation >> option. >> >> Signed-off-by: Nir Soffer > > Looks good, but 175 is already (multiply) taken. Not making this a > blocker, but I just want to remind everyone to check the mailing list > for pending patches which add new tests before using a new number in > order to avoid unnecessary rebases for everyone. In general, it's as > easy as searching for the string "175.out" in the mailbox. > > The next free one seems to be 177 currently. Thanks, will change to 177 in the next version. For next patches, what do you mean by "pending"? patches sent to the block mailing list? Nir
Re: [Qemu-block] [Qemu-devel] [PATCH v2 2/7] iscsi: Handle -iscsi user/password in bdrv_parse_filename()
On Wed, 01/25 12:42, Jeff Cody wrote: > From: Kevin Wolf> > This splits the logic in the old parse_chap() function into a part that > parses the -iscsi options into the new driver-specific options, and > another part that actually applies those options (called apply_chap() > now). > > Note that this means that username and password specified with -iscsi > only take effect when a URL is provided. This is intentional, -iscsi is > a legacy interface only supported for compatibility, new users should > use the proper driver-specific options. > > Signed-off-by: Kevin Wolf > Signed-off-by: Jeff Cody Reviewed-by: Fam Zheng
Re: [Qemu-block] [Qemu-devel] [PATCH v2 2/7] iscsi: Handle -iscsi user/password in bdrv_parse_filename()
On Fri, 02/17 14:26, Kevin Wolf wrote: > It is the one that it put into the QDict by iscsi_parse_iscsi_option(), > which is supposed to be the value from -iscsi. OK! This is what I was missing. :) Fam
Re: [Qemu-block] [Qemu-devel] [PATCH 15/17] iotests: add default node-name
On Fri, 02/17 16:36, Vladimir Sementsov-Ogievskiy wrote: > 17.02.2017 15:21, Fam Zheng wrote: > > On Fri, 02/17 13:20, Vladimir Sementsov-Ogievskiy wrote: > > > 16.02.2017 16:48, Fam Zheng wrote: > > > > On Mon, 02/13 12:54, Vladimir Sementsov-Ogievskiy wrote: > > > > > When testing migration, auto-generated by qemu node-names differs in > > > > > source and destination qemu and migration fails. After this patch, > > > > > auto-generated by iotest nodenames will be the same. > > > > What should be done in libvirt to make sure the node-names are matching > > > > correctly at both sides? > > > Hmm, just set node names appropriately? > > But I think the problem is that node names are not configurable from libvirt > > today, and then the migration will fail. Should the device name take > > precedence > > in the code, to make it easier? > > libvirt can use same parameters as I in this patch.. If I'm not mistaken, libvirt can be patched to explicitly set the same node names in the QEMU command line, but that is some extra work to do there. My point is if device names are used during migration, when available, this patch and the libvirt change is not necessary. Fam
Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps
On 02/17/2017 03:48 PM, Kevin Wolf wrote: > Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben: >> 17.02.2017 15:09, Kevin Wolf wrote: >>> Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben: 16.02.2017 14:49, Kevin Wolf wrote: > Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben: >> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: >>> Auto loading bitmaps are bitmaps stored in the disk image, which should >>> be loaded when the image is opened and become BdrvDirtyBitmaps for the >>> corresponding drive. >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy>>> Reviewed-by: John Snow >>> Reviewed-by: Max Reitz >> Why do we need a new BlockDriver callback and special code for it in >> bdrv_open_common()? The callback is only ever called immediately after >> .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally >> in their .bdrv_open implementation? Even more so because qcow2 is the >> only driver that supports this callback. > Actually, don't we have to call this in qcow2_invalidate_cache()? > Currently, I think, after a migration, the autoload bitmaps aren't > loaded. > > By moving the qcow2_load_autoloading_dirty_bitmaps() call to > qcow2_open(), this would be fixed. > > Kevin Bitmap should not be reloaded on any intermediate qcow2-open's, reopens, etc. It should be loaded once, on bdrv_open, to not create extra collisions (between in-memory bitmap and it's stored version). That was the idea. For bitmaps migration there are separate series, we shouldn't load bitmap from file on migration, as it's version in the file is outdated. >>> That's not what your series is doing, though. It loads the bitmaps when >> Actually, they will not be loaded as they will have IN_USE flag. >> >>> migration starts and doesn't reload then when migration completes, even >>> though they are stale. Migration with shared storage would just work >>> without an extra series if you did these things in the correct places. 
>>> >>> As a reminder, this is how migration with shared storage works (or >>> should work with your series): >>> >>> 1. Start destination qemu instance. This calls bdrv_open() with >>>BDRV_O_INACTIVE. We can read in some metadata, though we don't need >>>much more than the image size at this point. Writing to the image is >>>still impossible. >>> >>> 2. Start migration on the source, while the VM is still writing to the >>>image, rendering the cached metadata from step 1 stale. >>> >>> 3. Migration completes: >>> >>> a. Stop the VM >>> >>> b. Inactivate all images in the source qemu. This is where all >>>metadata needs to be written back to the image file, including >>>bitmaps. No writes to the image are possible after this point >>>because BDRV_O_INACTIVE is set. >>> >>> c. Invalidate the caches in the destination qemu, i.e. reload >>>everything from the file that could have changed since step 1, >>>including bitmaps. BDRV_O_INACTIVE is cleared, making the image >>>ready for writes. >>> >>> d. Resume the VM on the destination >>> >>> 4. Exit the source qemu process, which involves bdrv_close(). Note that >>>at this point, no writing to the image file is possible any more, >>>it's the destination qemu process that own the image file now. >>> >>> Your series loads and stores bitmaps in steps 1 and 4. This means that >> Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and >> it is checked), nothing is stored. >> >>> they are stale on the destination when migration completes, and that >>> bdrv_close() wants to write to an image file that it doesn't own any >>> more, which will cause an assertion failure. If you instead move things >>> to steps 3b and 3c, it will just work. >> Hmm, I understand the idea.. But this will interfere with postcopy >> bitmap migration. So if we really need this, there should be some >> additional control flags or capabilities.. 
The problem of your >> approach is that bitmap actually migrated in the short state when >> source and destination are stopped, it may take time, as bitmaps may >> be large. > You can always add optimisations, but this is the basic lifecycle > process of block devices in qemu, so it would be good to adhere to it. > So far there are no other pieces of information that are ignored in > bdrv_invalidate()/bdrv_inactivate() and instead only handled in > bdrv_open()/bdrv_close(). It's a matter of consistency, too. > > And not having to add special cases for specific features in the generic > bdrv_open()/close() paths is a big plus for me anyway. > > Kevin But for sure this is bad from the downtime point of view. On migrate you will have to write to the image and re-read it again on the target. This would be
Re: [Qemu-block] [Qemu-devel] [PATCH 15/17] iotests: add default node-name
17.02.2017 15:21, Fam Zheng wrote: On Fri, 02/17 13:20, Vladimir Sementsov-Ogievskiy wrote: 16.02.2017 16:48, Fam Zheng wrote: On Mon, 02/13 12:54, Vladimir Sementsov-Ogievskiy wrote: When testing migration, the node names auto-generated by qemu differ between the source and destination qemu and migration fails. After this patch, the node names auto-generated by the iotests will be the same. What should be done in libvirt to make sure the node-names match correctly on both sides? Hmm, just set node names appropriately? But I think the problem is that node names are not configurable from libvirt today, and then the migration will fail. Should the device name take precedence in the code, to make it easier? Why not configurable? libvirt can use the same parameters as I do in this patch.. Or what do you mean? Fam -- Best regards, Vladimir
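The fix discussed above amounts to giving every block node an explicit, stable node name instead of relying on qemu's auto-generated ones, which differ between the two processes. A minimal sketch of what a management tool would pass on both sides; the image filename and the node name "disk0" are illustrative, not taken from the patch:

```shell
# Source and destination must use the same node-name so the block graphs
# can be matched during migration; "disk0" is an arbitrary example name.
qemu-system-x86_64 \
    -blockdev driver=qcow2,node-name=disk0,file.driver=file,file.filename=disk.qcow2 \
    -device virtio-blk,drive=disk0 \
    -incoming defer   # destination only; omit on the source
```

With auto-generated names (e.g. `#block123`), the number depends on allocation order in each process, which is why the iotests need to pin them explicitly.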
Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps
Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: > On 02/17/2017 03:48 PM, Kevin Wolf wrote: > > Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben: > >> 17.02.2017 15:09, Kevin Wolf wrote: > >>> Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben: > 16.02.2017 14:49, Kevin Wolf wrote: > > Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben: > >> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: > >>> Auto loading bitmaps are bitmaps stored in the disk image, which > >>> should > >>> be loaded when the image is opened and become BdrvDirtyBitmaps for the > >>> corresponding drive. > >>> > >>> Signed-off-by: Vladimir Sementsov-Ogievskiy> >>> Reviewed-by: John Snow > >>> Reviewed-by: Max Reitz > >> Why do we need a new BlockDriver callback and special code for it in > >> bdrv_open_common()? The callback is only ever called immediately after > >> .bdrv_open/.bdrv_file_open, so can't the drivers just do this > >> internally > >> in their .bdrv_open implementation? Even more so because qcow2 is the > >> only driver that supports this callback. > > Actually, don't we have to call this in qcow2_invalidate_cache()? > > Currently, I think, after a migration, the autoload bitmaps aren't > > loaded. > > > > By moving the qcow2_load_autoloading_dirty_bitmaps() call to > > qcow2_open(), this would be fixed. > > > > Kevin > Bitmap should not be reloaded on any intermediate qcow2-open's, > reopens, etc. It should be loaded once, on bdrv_open, to not create > extra collisions (between in-memory bitmap and it's stored version). > That was the idea. > > For bitmaps migration there are separate series, we shouldn't load > bitmap from file on migration, as it's version in the file is > outdated. > >>> That's not what your series is doing, though. It loads the bitmaps when > >> Actually, they will not be loaded as they will have IN_USE flag. 
> >> > >>> migration starts and doesn't reload then when migration completes, even > >>> though they are stale. Migration with shared storage would just work > >>> without an extra series if you did these things in the correct places. > >>> > >>> As a reminder, this is how migration with shared storage works (or > >>> should work with your series): > >>> > >>> 1. Start destination qemu instance. This calls bdrv_open() with > >>>BDRV_O_INACTIVE. We can read in some metadata, though we don't need > >>>much more than the image size at this point. Writing to the image is > >>>still impossible. > >>> > >>> 2. Start migration on the source, while the VM is still writing to the > >>>image, rendering the cached metadata from step 1 stale. > >>> > >>> 3. Migration completes: > >>> > >>> a. Stop the VM > >>> > >>> b. Inactivate all images in the source qemu. This is where all > >>>metadata needs to be written back to the image file, including > >>>bitmaps. No writes to the image are possible after this point > >>>because BDRV_O_INACTIVE is set. > >>> > >>> c. Invalidate the caches in the destination qemu, i.e. reload > >>>everything from the file that could have changed since step 1, > >>>including bitmaps. BDRV_O_INACTIVE is cleared, making the image > >>>ready for writes. > >>> > >>> d. Resume the VM on the destination > >>> > >>> 4. Exit the source qemu process, which involves bdrv_close(). Note that > >>>at this point, no writing to the image file is possible any more, > >>>it's the destination qemu process that own the image file now. > >>> > >>> Your series loads and stores bitmaps in steps 1 and 4. This means that > >> Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and > >> it is checked), nothing is stored. > >> > >>> they are stale on the destination when migration completes, and that > >>> bdrv_close() wants to write to an image file that it doesn't own any > >>> more, which will cause an assertion failure. 
If you instead move things > >>> to steps 3b and 3c, it will just work. > >> Hmm, I understand the idea.. But this will interfere with postcopy > >> bitmap migration. So if we really need this, there should be some > >> additional control flags or capabilities.. The problem of your > >> approach is that bitmap actually migrated in the short state when > >> source and destination are stopped, it may take time, as bitmaps may > >> be large. > > You can always add optimisations, but this is the basic lifecycle > > process of block devices in qemu, so it would be good to adhere to it. > > So far there are no other pieces of information that are ignored in > > bdrv_invalidate()/bdrv_inactivate() and instead only handled in > > bdrv_open()/bdrv_close(). It's a matter of consistency, too. > > > > And not having to add special
Re: [Qemu-block] [Qemu-devel] [PATCH v2 2/7] iscsi: Handle -iscsi user/password in bdrv_parse_filename()
Am 07.02.2017 um 11:13 hat Fam Zheng geschrieben: > On Wed, 01/25 12:42, Jeff Cody wrote: > > From: Kevin Wolf > > > > This splits the logic in the old parse_chap() function into a part that > > parses the -iscsi options into the new driver-specific options, and > > another part that actually applies those options (called apply_chap() > > now). > > > > Note that this means that username and password specified with -iscsi > > only take effect when a URL is provided. This is intentional, -iscsi is > > a legacy interface only supported for compatibility, new users should > > use the proper driver-specific options. > > > > Signed-off-by: Kevin Wolf > > Signed-off-by: Jeff Cody > > --- > > block/iscsi.c | 78 > > +-- > > 1 file changed, 44 insertions(+), 34 deletions(-) > > > > diff --git a/block/iscsi.c b/block/iscsi.c > > index 97cff30..fc91d0f 100644 > > --- a/block/iscsi.c > > +++ b/block/iscsi.c > > @@ -1236,29 +1236,14 @@ retry: > > return 0; > > } > > > > -static void parse_chap(struct iscsi_context *iscsi, const char *target, > > +static void apply_chap(struct iscsi_context *iscsi, QemuOpts *opts, > > Error **errp) > > { > > -QemuOptsList *list; > > -QemuOpts *opts; > > const char *user = NULL; > > const char *password = NULL; > > const char *secretid; > > char *secret = NULL; > > > > -list = qemu_find_opts("iscsi"); > > -if (!list) { > > -return; > > -} > > - > > -opts = qemu_opts_find(list, target); > > -if (opts == NULL) { > > -opts = QTAILQ_FIRST(&list->head); > > -if (!opts) { > > -return; > > -} > > -} > > - > > user = qemu_opt_get(opts, "user"); > > if (!user) { > > return; > > @@ -1587,6 +1572,41 @@ out: > > } > > } > > > > +static void iscsi_parse_iscsi_option(const char *target, QDict *options) > > +{ > > +QemuOptsList *list; > > +QemuOpts *opts; > > +const char *user, *password, *password_secret; > > + > > +list = qemu_find_opts("iscsi"); > > +if (!list) { > > +return; > > +} > > + > > +opts = qemu_opts_find(list, target); > > +if (opts == NULL) { > > +opts = QTAILQ_FIRST(&list->head); > > +if (!opts) { > > +return; > > +} > > +} > > + > > +user = qemu_opt_get(opts, "user"); > > +if (user) { > > +qdict_set_default_str(options, "user", user); > > +} > > + > > +password = qemu_opt_get(opts, "password"); > > +if (password) { > > +qdict_set_default_str(options, "password", password); > > +} > > + > > +password_secret = qemu_opt_get(opts, "password-secret"); > > +if (password_secret) { > > +qdict_set_default_str(options, "password-secret", password_secret); > > +} > > +} > > + > > /* > > * We support iscsi url's on the form > > * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun> > > @@ -1625,6 +1645,9 @@ static void iscsi_parse_filename(const char > > *filename, QDict *options, > > qdict_set_default_str(options, "lun", lun_str); > > g_free(lun_str); > > > > +/* User/password from -iscsi take precedence over those from the URL */ > > +iscsi_parse_iscsi_option(iscsi_url->target, options); > > + > > Isn't it more natural to let the local option take precedence over the global > one? One would think so, but that's how the options work today, so we can't change it without breaking compatibility. We can only newly define the precedence of the new driver-specific options vs. the existing ones, and there the local driver-specific ones do take precedence. > > if (iscsi_url->user[0] != '\0') { > > qdict_set_default_str(options, "user", iscsi_url->user); > > qdict_set_default_str(options, "password", iscsi_url->passwd); > > @@ -1659,6 +1682,10 @@ static QemuOptsList runtime_opts = { > > .type = QEMU_OPT_STRING, > > }, > > { > > +.name = "password-secret", > > +.type = QEMU_OPT_STRING, > > +}, > > +{ > > I think this added password-secret is not the one looked up in > iscsi_parse_iscsi_option(), which is from -iscsi > (block/iscsi-opts.c:qemu_iscsi_opts). Is this intended? Or does it belong to a > different patch? It is the one that is put into the QDict by iscsi_parse_iscsi_option(), which is supposed to be the value from -iscsi. Why do you think it's not the one?
Maybe I'm missing something. Kevin
Re: [Qemu-block] [PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps
Am 17.02.2017 um 13:55 hat Vladimir Sementsov-Ogievskiy geschrieben: > 17.02.2017 15:21, Kevin Wolf wrote: > > Am 17.02.2017 um 13:07 hat Vladimir Sementsov-Ogievskiy geschrieben: > > 16.02.2017 15:47, Kevin Wolf wrote: > > Sorry, this was sent too early. Next attempt... > > Am 16.02.2017 um 12:45 hat Kevin Wolf geschrieben: > > Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy > geschrieben: > > Auto loading bitmaps are bitmaps in Qcow2, with the AUTO > flag set. They > are loaded when the image is opened and become > BdrvDirtyBitmaps for the > corresponding drive. > > Extra data in bitmaps is not supported for now. > > [...] > > > hdx.o vhdx-endian.o vhdx-log.o > diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c > new file mode 100644 > index 000..e08e46e > --- /dev/null > +++ b/block/qcow2-bitmap.c > > [...] > > > + > +static int update_header_sync(BlockDriverState *bs) > +{ > +int ret; > + > +ret = qcow2_update_header(bs); > +if (ret < 0) { > +return ret; > +} > + > +/* We don't return bdrv_flush error code. Even if it > fails, write was > + * successful and it is more logical to consider > that header is in the new > + * state than in the old. > + */ > +ret = bdrv_flush(bs); > +if (ret < 0) { > +fprintf(stderr, "Failed to flush qcow2 header"); > +} > > I don't think I can agree with this one. If you don't care > whether the > new header is actually on disk, don't call bdrv_flush(). But if > you do > care, then bdrv_flush() failure means that most likely the new > header > has not made it to the disk, but is just sitting in some volatile > cache. > > And what should be done on bdrv_flush fail? Current solution was > proposed by Max. > > Pass an error up and let the calling operation fail, because we can't > promise that it actually executed correctly. > > > It was my first option. The problem is that resulting state is absolutely > inconsistent - it is not guaranteed to be the old one or new.. 
However with > current approach it may be inconsistent too, but without an error.. > > So, error message should contain information that all dirty bitmaps may be > inconsistent even if in the image all flags says that bitmaps are OK. After a failing flush, we're always in a state that is hard to predict and where we can make little guarantees about what the image actually contains on disk and what not. The only thing we can really do is to make sure the user sees an error and can respond to it, rather than being silent about a potential corruption. > +return 0; > +} > + > > [...] > > > + > +/* This function returns the number of disk sectors > covered by a single cluster > + * of bitmap data. */ > +static uint64_t disk_sectors_in_bitmap_cluster(const > BDRVQcow2State *s, > + const > BdrvDirtyBitmap *bitmap) > +{ > +uint32_t sector_granularity = > +bdrv_dirty_bitmap_granularity(bitmap) >> > BDRV_SECTOR_BITS; > + > +return (uint64_t)sector_granularity * > (s->cluster_size << 3); > +} > > This has nothing to do with disk sectors, neither of the guest > disk nor > of the host disk. It's just using a funny 512 bytes unit. Is > there a > good reason for introducing this funny unit in new code? > > I'm also not sure what this function calculates, but it's not > what the > comment says. The unit of the result is something like sectors * > bytes, > and even when normalising it to a single base unit, I've never > seen a > use for square bytes so far. > > sector granularity is number of disk sectors, corresponding to one > bit of the dirty bitmap, (disk-sectors/bitmap-bit) > > Please don't use the word "disk sectors", it's misleading because it's > not a sector size of any specific disk. It's best to say just "sectors", > which is a fixed qemu block
Re: [Qemu-block] [PATCH v15 13/25] qcow2: add .bdrv_store_persistent_dirty_bitmaps()
Am 17.02.2017 um 13:24 hat Vladimir Sementsov-Ogievskiy geschrieben: > 16.02.2017 17:08, Kevin Wolf wrote: > >Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: > >>Realize the block bitmap storing interface, to allow qcow2 images to store > >>persistent bitmaps. > >> > >>Signed-off-by: Vladimir Sementsov-Ogievskiy > >>Reviewed-by: Max Reitz > >>Reviewed-by: John Snow > >Similarly to autoload, I think this must be called as part of > >qcow2_inactivate() rather than just in bdrv_close(). > > > >Kevin > > I prefer to store bitmaps once, on the final close of the bds, and > remove the corresponding BdrvDirtyBitmap at the same point. bdrv_close > is the simpler point, I don't want to think about loading/saving bitmaps > on each invalidate/inactivate. I don't want to make dependencies > between qcow2 bitmap loading/storing and migration, etc. > > So, my approach was just to load the bitmap on bdrv_open and store it on > bdrv_close, and between these two calls the BdrvDirtyBitmap lives its > normal life. For me it looks simpler. I'm not sure what new > corner cases will come if we change this. You make this sound like invalidate/inactivate was a very frequent operation. It's not. Essentially it is what you probably consider open/close to be. The semantics is basically that after inactivate (or more precisely, as long as BDRV_O_INACTIVE is set), another process can access and potentially modify the image file. At invalidate, qemu takes ownership again and reloads everything from the image file. Kevin
Re: [Qemu-block] [PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps
17.02.2017 15:21, Kevin Wolf wrote: Am 17.02.2017 um 13:07 hat Vladimir Sementsov-Ogievskiy geschrieben: 16.02.2017 15:47, Kevin Wolf wrote: Sorry, this was sent too early. Next attempt... Am 16.02.2017 um 12:45 hat Kevin Wolf geschrieben: Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: Auto loading bitmaps are bitmaps in Qcow2, with the AUTO flag set. They are loaded when the image is opened and become BdrvDirtyBitmaps for the corresponding drive. Extra data in bitmaps is not supported for now. [...] hdx.o vhdx-endian.o vhdx-log.o diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c new file mode 100644 index 000..e08e46e --- /dev/null +++ b/block/qcow2-bitmap.c [...] + +static int update_header_sync(BlockDriverState *bs) +{ +int ret; + +ret = qcow2_update_header(bs); +if (ret < 0) { +return ret; +} + +/* We don't return bdrv_flush error code. Even if it fails, write was + * successful and it is more logical to consider that header is in the new + * state than in the old. + */ +ret = bdrv_flush(bs); +if (ret < 0) { +fprintf(stderr, "Failed to flush qcow2 header"); +} I don't think I can agree with this one. If you don't care whether the new header is actually on disk, don't call bdrv_flush(). But if you do care, then bdrv_flush() failure means that most likely the new header has not made it to the disk, but is just sitting in some volatile cache. And what should be done on bdrv_flush fail? Current solution was proposed by Max. Pass an error up and let the calling operation fail, because we can't promise that it actually executed correctly. It was my first option. The problem is that resulting state is absolutely inconsistent - it is not guaranteed to be the old one or new.. However with current approach it may be inconsistent too, but without an error.. So, error message should contain information that all dirty bitmaps may be inconsistent even if in the image all flags says that bitmaps are OK. +return 0; +} + [...] 
+ +/* This function returns the number of disk sectors covered by a single cluster + * of bitmap data. */ +static uint64_t disk_sectors_in_bitmap_cluster(const BDRVQcow2State *s, + const BdrvDirtyBitmap *bitmap) +{ +uint32_t sector_granularity = +bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS; + +return (uint64_t)sector_granularity * (s->cluster_size << 3); +} This has nothing to do with disk sectors, neither of the guest disk nor of the host disk. It's just using a funny 512-byte unit. Is there a good reason for introducing this funny unit in new code? I'm also not sure what this function calculates, but it's not what the comment says. The unit of the result is something like sectors * bytes, and even when normalising it to a single base unit, I've never seen a use for square bytes so far. sector granularity is the number of disk sectors corresponding to one bit of the dirty bitmap, (disk-sectors/bitmap-bit) Please don't use the word "disk sectors", it's misleading because it's not a sector size of any specific disk. It's best to say just "sectors", which is a fixed qemu block layer unit of 512 bytes. cluster_size << 3 is the number of bits in one cluster, (bitmap-bit) so, we have sector_granularity (disk-sectors/bitmap-bit) * (bitmap-bit) = some disk sectors, corresponding to one cluster of bitmap data. Aha! I completely misunderstood what a bitmap cluster is supposed to be. I expected it to be a chunk of bitmap data whose size corresponds to the bitmap granularity, whereas it's really about qcow2 clusters. I wonder if we can rephrase the comment to make this more obvious. Maybe "...covered by a single qcow2 cluster containing bitmap data"? And the function could be called sectors_covered_by_bitmap_cluster() or something. No problem with that, I'm always for better wording) s/containing/of/ looks better for me, as it should be covered by bitmap data, not by cluster. Kevin -- Best regards, Vladimir
Re: [Qemu-block] [Qemu-devel] [PATCH] block, migration: Use qemu_madvise inplace of madvise
On Fri 17 Feb 2017 01:30:09 PM CET, Pankaj Gupta wrote: > I think 'posix_madvise' was added for systems which did not have > 'madvise' [...] On systems which don't have madvise, > 'posix_madvise' is called, which as per the discussion is not the right thing > for the 'DONTNEED' option. It will not give the desired results. > > Either we have to find the right alternative or else it is already broken > for systems which don't support madvise. Do you have an example of a call that is currently broken in the QEMU code? Berto
Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps
Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben: > 17.02.2017 15:09, Kevin Wolf wrote: > >Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben: > >>16.02.2017 14:49, Kevin Wolf wrote: > >>>Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben: > Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: > >Auto loading bitmaps are bitmaps stored in the disk image, which should > >be loaded when the image is opened and become BdrvDirtyBitmaps for the > >corresponding drive. > > > >Signed-off-by: Vladimir Sementsov-Ogievskiy> >Reviewed-by: John Snow > >Reviewed-by: Max Reitz > Why do we need a new BlockDriver callback and special code for it in > bdrv_open_common()? The callback is only ever called immediately after > .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally > in their .bdrv_open implementation? Even more so because qcow2 is the > only driver that supports this callback. > >>>Actually, don't we have to call this in qcow2_invalidate_cache()? > >>>Currently, I think, after a migration, the autoload bitmaps aren't > >>>loaded. > >>> > >>>By moving the qcow2_load_autoloading_dirty_bitmaps() call to > >>>qcow2_open(), this would be fixed. > >>> > >>>Kevin > >>Bitmap should not be reloaded on any intermediate qcow2-open's, > >>reopens, etc. It should be loaded once, on bdrv_open, to not create > >>extra collisions (between in-memory bitmap and it's stored version). > >>That was the idea. > >> > >>For bitmaps migration there are separate series, we shouldn't load > >>bitmap from file on migration, as it's version in the file is > >>outdated. > >That's not what your series is doing, though. It loads the bitmaps when > > Actually, they will not be loaded as they will have IN_USE flag. > > >migration starts and doesn't reload then when migration completes, even > >though they are stale. Migration with shared storage would just work > >without an extra series if you did these things in the correct places. 
> > > >As a reminder, this is how migration with shared storage works (or > >should work with your series): > > > >1. Start destination qemu instance. This calls bdrv_open() with > >BDRV_O_INACTIVE. We can read in some metadata, though we don't need > >much more than the image size at this point. Writing to the image is > >still impossible. > > > >2. Start migration on the source, while the VM is still writing to the > >image, rendering the cached metadata from step 1 stale. > > > >3. Migration completes: > > > > a. Stop the VM > > > > b. Inactivate all images in the source qemu. This is where all > >metadata needs to be written back to the image file, including > >bitmaps. No writes to the image are possible after this point > >because BDRV_O_INACTIVE is set. > > > > c. Invalidate the caches in the destination qemu, i.e. reload > >everything from the file that could have changed since step 1, > >including bitmaps. BDRV_O_INACTIVE is cleared, making the image > >ready for writes. > > > > d. Resume the VM on the destination > > > >4. Exit the source qemu process, which involves bdrv_close(). Note that > >at this point, no writing to the image file is possible any more, > >it's the destination qemu process that own the image file now. > > > >Your series loads and stores bitmaps in steps 1 and 4. This means that > > Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and > it is checked), nothing is stored. > > >they are stale on the destination when migration completes, and that > >bdrv_close() wants to write to an image file that it doesn't own any > >more, which will cause an assertion failure. If you instead move things > >to steps 3b and 3c, it will just work. > > Hmm, I understand the idea.. But this will interfere with postcopy > bitmap migration. So if we really need this, there should be some > additional control flags or capabilities.. 
The problem of your > approach is that bitmap actually migrated in the short state when > source and destination are stopped, it may take time, as bitmaps may > be large. You can always add optimisations, but this is the basic lifecycle process of block devices in qemu, so it would be good to adhere to it. So far there are no other pieces of information that are ignored in bdrv_invalidate()/bdrv_inactivate() and instead only handled in bdrv_open()/bdrv_close(). It's a matter of consistency, too. And not having to add special cases for specific features in the generic bdrv_open()/close() paths is a big plus for me anyway. Kevin
Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps
17.02.2017 15:09, Kevin Wolf wrote: Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben: 16.02.2017 14:49, Kevin Wolf wrote: Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben: Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: Auto loading bitmaps are bitmaps stored in the disk image, which should be loaded when the image is opened and become BdrvDirtyBitmaps for the corresponding drive. Signed-off-by: Vladimir Sementsov-OgievskiyReviewed-by: John Snow Reviewed-by: Max Reitz Why do we need a new BlockDriver callback and special code for it in bdrv_open_common()? The callback is only ever called immediately after .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally in their .bdrv_open implementation? Even more so because qcow2 is the only driver that supports this callback. Actually, don't we have to call this in qcow2_invalidate_cache()? Currently, I think, after a migration, the autoload bitmaps aren't loaded. By moving the qcow2_load_autoloading_dirty_bitmaps() call to qcow2_open(), this would be fixed. Kevin Bitmap should not be reloaded on any intermediate qcow2-open's, reopens, etc. It should be loaded once, on bdrv_open, to not create extra collisions (between in-memory bitmap and it's stored version). That was the idea. For bitmaps migration there are separate series, we shouldn't load bitmap from file on migration, as it's version in the file is outdated. That's not what your series is doing, though. It loads the bitmaps when Actually, they will not be loaded as they will have IN_USE flag. migration starts and doesn't reload then when migration completes, even though they are stale. Migration with shared storage would just work without an extra series if you did these things in the correct places. As a reminder, this is how migration with shared storage works (or should work with your series): 1. Start destination qemu instance. This calls bdrv_open() with BDRV_O_INACTIVE. 
We can read in some metadata, though we don't need much more than the image size at this point. Writing to the image is still impossible. 2. Start migration on the source, while the VM is still writing to the image, rendering the cached metadata from step 1 stale. 3. Migration completes: a. Stop the VM b. Inactivate all images in the source qemu. This is where all metadata needs to be written back to the image file, including bitmaps. No writes to the image are possible after this point because BDRV_O_INACTIVE is set. c. Invalidate the caches in the destination qemu, i.e. reload everything from the file that could have changed since step 1, including bitmaps. BDRV_O_INACTIVE is cleared, making the image ready for writes. d. Resume the VM on the destination 4. Exit the source qemu process, which involves bdrv_close(). Note that at this point, no writing to the image file is possible any more, it's the destination qemu process that own the image file now. Your series loads and stores bitmaps in steps 1 and 4. This means that Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and it is checked), nothing is stored. they are stale on the destination when migration completes, and that bdrv_close() wants to write to an image file that it doesn't own any more, which will cause an assertion failure. If you instead move things to steps 3b and 3c, it will just work. Hmm, I understand the idea.. But this will interfere with postcopy bitmap migration. So if we really need this, there should be some additional control flags or capabilities.. The problem of your approach is that bitmap actually migrated in the short state when source and destination are stopped, it may take time, as bitmaps may be large. Kevin -- Best regards, Vladimir
Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise
On Fri 17 Feb 2017 12:30:28 PM CET, Pankaj Gupta wrote: >> > To maintain consistency at all the places use qemu_madvise wrapper >> > inplace of madvise call. >> >> > -madvise((uint8_t *) t + offset, length, MADV_DONTNEED); >> > +qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED); >> >> Those two calls are not equivalent, please see commit >> 2f2c8d6b371cfc6689affb0b7e for an explanation. > I don't understand why '2f2c8d6b371cfc6689affb0b7e' explicitly changed > for :"#ifdef CONFIG_LINUX" I think its better to write generic > function maybe in a wrapper then to conditionally set something at > different places. The problem with qemu_madvise(QEMU_MADV_DONTNEED) is that it can mean different things depending on the platform: posix_madvise(POSIX_MADV_DONTNEED) madvise(MADV_DONTNEED) The first call is standard but it doesn't do what we need, so we cannot use it. The second call -- madvise(MADV_DONTNEED) -- is not standard, and it doesn't do the same in all platforms. The only platform in which it does what we need is Linux, hence the #ifdef CONFIG_LINUX and #if defined(__linux__) that you see in the code. I agree with David's comment that maybe it's better to remove QEMU_MADV_DONTNEED altogether since it's not reliable. Berto
Re: [Qemu-block] [Qemu-devel] [PATCH] block, migration: Use qemu_madvise inplace of madvise
> > * Pankaj Gupta (pagu...@redhat.com) wrote: > > > > Thanks for your comments. I have a query below. > > > > > > On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote: > > > > To maintain consistency at all the places use the qemu_madvise wrapper > > > > in place of the madvise call. > > > > > > > if (length > 0) { > > > > -madvise((uint8_t *) t + offset, length, MADV_DONTNEED); > > > > +qemu_madvise((uint8_t *) t + offset, length, > > > > QEMU_MADV_DONTNEED); > > > > > > This was changed two months ago from qemu_madvise() to madvise(), is > > > there any reason why you want to revert that change? Those two calls are > > > not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an > > > explanation. > > > > > > > -if (madvise(start, length, MADV_DONTNEED)) { > > > > +if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) { > > > > error_report("%s MADV_DONTNEED: %s", __func__, > > > > strerror(errno)); > > > > I checked the history of the only change related to 'postcopy'. > > > > For my linux machine: > > > > ./config-host.mak > > > > CONFIG_MADVISE=y > > CONFIG_POSIX_MADVISE=y > > > > As both these options are set for Linux, every time we call > > 'qemu_madvise' ==> "madvise(addr, len, advice);" will > > be compiled/called. I don't understand why '2f2c8d6b371cfc6689affb0b7e' > > explicitly changed to "#ifdef CONFIG_LINUX". > > I think it's better to write a generic function, maybe in a wrapper, than to > > conditionally set something at different places. > > No; the problem is that the behaviours are different. > You're right that the current build on Linux defines MADVISE and thus we are > safe because qemu_madvise > takes the CONFIG_MADVISE/madvise route - but we need to be explicit that it's > only > the madvise() route that's safe, not any of the calls implemented by > qemu_madvise, because if in the future someone were to rearrange qemu_madvise > to prefer posix_madvise, postcopy would break in a very subtle way. Agree. We can add a comment explaining this?
> > IMHO it might even be better to remove the definition of QEMU_MADV_DONTNEED > altogether > and make a name that wasn't ambiguous between the two, since the posix > definition is > so different. I think 'posix_madvise' was added for systems which did not have 'madvise'. If I look at the makefile, we first check which calls are available and then set the config options accordingly. We give 'madvise' precedence over 'posix_madvise' if both are present. For systems which don't have madvise, 'posix_madvise' is called, which as discussed is not the right thing for the 'DONTNEED' option: it will not give the desired results. Either we have to find the right alternative, or else it is already broken for systems which don't support madvise. > > Dave > > > int qemu_madvise(void *addr, size_t len, int advice) > > { > > if (advice == QEMU_MADV_INVALID) { > > errno = EINVAL; > > return -1; > > } > > #if defined(CONFIG_MADVISE) > > return madvise(addr, len, advice); > > #elif defined(CONFIG_POSIX_MADVISE) > > return posix_madvise(addr, len, advice); > > #else > > errno = EINVAL; > > return -1; > > #endif > > } > > > > > > > > And this is the same case. > > > > > > Berto > > > > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > >
Re: [Qemu-block] [PATCH v15 13/25] qcow2: add .bdrv_store_persistent_dirty_bitmaps()
16.02.2017 17:08, Kevin Wolf wrote: Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: Realize the block bitmap storing interface, to allow qcow2 images to store persistent bitmaps. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Max Reitz Reviewed-by: John Snow Similarly to autoload, I think this must be called as part of qcow2_inactivate() rather than just in bdrv_close(). Kevin I prefer to store bitmaps once, on the final close of the bds, and to remove the corresponding BdrvDirtyBitmap at the same point. bdrv_close is a simpler point; I don't want to think about loading/saving the bitmap on each invalidate/inactivate, and I don't want to create dependencies between qcow2 bitmap loading/storing and migration, etc. So my approach was just to load the bitmap on bdrv_open and store it on bdrv_close, and between these two calls the BdrvDirtyBitmap lives its normal life. To me that looks simpler. I'm not sure what new corner cases will come up if we change this. -- Best regards, Vladimir
Re: [Qemu-block] [PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps
Am 17.02.2017 um 13:07 hat Vladimir Sementsov-Ogievskiy geschrieben: > 16.02.2017 15:47, Kevin Wolf wrote: > >Sorry, this was sent too early. Next attempt... > > > >Am 16.02.2017 um 12:45 hat Kevin Wolf geschrieben: > >>Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: > >>>Auto loading bitmaps are bitmaps in Qcow2, with the AUTO flag set. They > >>>are loaded when the image is opened and become BdrvDirtyBitmaps for the > >>>corresponding drive. > >>> > >>>Extra data in bitmaps is not supported for now. > > [...] > > >>>hdx.o vhdx-endian.o vhdx-log.o > >>>diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c > >>>new file mode 100644 > >>>index 000..e08e46e > >>>--- /dev/null > >>>+++ b/block/qcow2-bitmap.c > > [...] > > >>>+ > >>>+static int update_header_sync(BlockDriverState *bs) > >>>+{ > >>>+int ret; > >>>+ > >>>+ret = qcow2_update_header(bs); > >>>+if (ret < 0) { > >>>+return ret; > >>>+} > >>>+ > >>>+/* We don't return bdrv_flush error code. Even if it fails, write was > >>>+ * successful and it is more logical to consider that header is in > >>>the new > >>>+ * state than in the old. > >>>+ */ > >>>+ret = bdrv_flush(bs); > >>>+if (ret < 0) { > >>>+fprintf(stderr, "Failed to flush qcow2 header"); > >>>+} > >I don't think I can agree with this one. If you don't care whether the > >new header is actually on disk, don't call bdrv_flush(). But if you do > >care, then bdrv_flush() failure means that most likely the new header > >has not made it to the disk, but is just sitting in some volatile cache. > > And what should be done on bdrv_flush fail? Current solution was > proposed by Max. Pass an error up and let the calling operation fail, because we can't promise that it actually executed correctly. > > > >>>+return 0; > >>>+} > >>>+ > > [...] > > >>>+ > >>>+/* This function returns the number of disk sectors covered by a single > >>>cluster > >>>+ * of bitmap data. 
*/ > >>>+static uint64_t disk_sectors_in_bitmap_cluster(const BDRVQcow2State *s, > >>>+ const BdrvDirtyBitmap > >>>*bitmap) > >>>+{ > >>>+uint32_t sector_granularity = > >>>+bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS; > >>>+ > >>>+return (uint64_t)sector_granularity * (s->cluster_size << 3); > >>>+} > >This has nothing to do with disk sectors, neither of the guest disk nor > >of the host disk. It's just using a funny 512 bytes unit. Is there a > >good reason for introducing this funny unit in new code? > > > >I'm also not sure what this function calculates, but it's not what the > >comment says. The unit of the result is something like sectors * bytes, > >and even when normalising it to a single base unit, I've never seen a > >use for square bytes so far. > > sector granularity is number of disk sectors, corresponding to one > bit of the dirty bitmap, (disk-sectors/bitmap-bit) Please don't use the word "disk sectors", it's misleading because it's not a sector size of any specific disk. It's best to say just "sectors", which is a fixed qemu block layer unit of 512 bytes. > cluster_size << 3 is a number of bits in one cluster, (bitmap-bit) > > so, we have > sector_granularity (disk-sector/bitmap-bit) * > (bitmapbit) = some disk sectors, corresponding to one cluster of > bitmap data. Aha! I completely misunderstood what a bitmap cluster is supposed to be. I expected it to be a chunk of bitmap data whose size corresponds to the bitmap granularity, whereas it's really about qcow2 clusters. I wonder if we can rephrase the comment to make this more obvious. Maybe "...covered by a single qcow2 cluster containing bitmap data"? And the function could be called sectors_covered_by_bitmap_cluster() or something. Kevin
Re: [Qemu-block] [Qemu-devel] [PATCH 15/17] iotests: add default node-name
On Fri, 02/17 13:20, Vladimir Sementsov-Ogievskiy wrote: > 16.02.2017 16:48, Fam Zheng wrote: > > On Mon, 02/13 12:54, Vladimir Sementsov-Ogievskiy wrote: > > > When testing migration, auto-generated by qemu node-names differs in > > > source and destination qemu and migration fails. After this patch, > > > auto-generated by iotest nodenames will be the same. > > What should be done in libvirt to make sure the node-names are matching > > correctly at both sides? > > Hmm, just set node names appropriately? But I think the problem is that node names are not configurable from libvirt today, and then the migration will fail. Should the device name take precedence in the code, to make it easier? Fam
Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps
Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben: > 16.02.2017 14:49, Kevin Wolf wrote: > >Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben: > >>Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: > >>>Auto loading bitmaps are bitmaps stored in the disk image, which should > >>>be loaded when the image is opened and become BdrvDirtyBitmaps for the > >>>corresponding drive. > >>> > >>>Signed-off-by: Vladimir Sementsov-Ogievskiy > >>>Reviewed-by: John Snow > >>>Reviewed-by: Max Reitz > >>Why do we need a new BlockDriver callback and special code for it in > >>bdrv_open_common()? The callback is only ever called immediately after > >>.bdrv_open/.bdrv_file_open, so can't the drivers just do this internally > >>in their .bdrv_open implementation? Even more so because qcow2 is the > >>only driver that supports this callback. > >Actually, don't we have to call this in qcow2_invalidate_cache()? > >Currently, I think, after a migration, the autoload bitmaps aren't > >loaded. > > > >By moving the qcow2_load_autoloading_dirty_bitmaps() call to > >qcow2_open(), this would be fixed. > > > >Kevin > > Bitmap should not be reloaded on any intermediate qcow2-open's, > reopens, etc. It should be loaded once, on bdrv_open, to not create > extra collisions (between the in-memory bitmap and its stored version). > That was the idea. > > For bitmaps migration there are separate series, we shouldn't load > bitmap from file on migration, as its version in the file is > outdated. That's not what your series is doing, though. It loads the bitmaps when migration starts and doesn't reload them when migration completes, even though they are stale. Migration with shared storage would just work without an extra series if you did these things in the correct places. As a reminder, this is how migration with shared storage works (or should work with your series): 1. Start destination qemu instance. This calls bdrv_open() with BDRV_O_INACTIVE.
We can read in some metadata, though we don't need much more than the image size at this point. Writing to the image is still impossible.

2. Start migration on the source, while the VM is still writing to the image, rendering the cached metadata from step 1 stale.

3. Migration completes:
   a. Stop the VM.
   b. Inactivate all images in the source qemu. This is where all metadata needs to be written back to the image file, including bitmaps. No writes to the image are possible after this point because BDRV_O_INACTIVE is set.
   c. Invalidate the caches in the destination qemu, i.e. reload everything from the file that could have changed since step 1, including bitmaps. BDRV_O_INACTIVE is cleared, making the image ready for writes.
   d. Resume the VM on the destination.

4. Exit the source qemu process, which involves bdrv_close(). Note that at this point, no writing to the image file is possible any more; it's the destination qemu process that owns the image file now.

Your series loads and stores bitmaps in steps 1 and 4. This means that they are stale on the destination when migration completes, and that bdrv_close() wants to write to an image file that it doesn't own any more, which will cause an assertion failure. If you instead move things to steps 3b and 3c, it will just work. Kevin
Re: [Qemu-block] [PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps
16.02.2017 15:47, Kevin Wolf wrote: Sorry, this was sent too early. Next attempt... Am 16.02.2017 um 12:45 hat Kevin Wolf geschrieben: Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: Auto loading bitmaps are bitmaps in Qcow2, with the AUTO flag set. They are loaded when the image is opened and become BdrvDirtyBitmaps for the corresponding drive. Extra data in bitmaps is not supported for now. [...] hdx.o vhdx-endian.o vhdx-log.o diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c new file mode 100644 index 000..e08e46e --- /dev/null +++ b/block/qcow2-bitmap.c [...] + +static int update_header_sync(BlockDriverState *bs) +{ +int ret; + +ret = qcow2_update_header(bs); +if (ret < 0) { +return ret; +} + +/* We don't return bdrv_flush error code. Even if it fails, write was + * successful and it is more logical to consider that header is in the new + * state than in the old. + */ +ret = bdrv_flush(bs); +if (ret < 0) { +fprintf(stderr, "Failed to flush qcow2 header"); +} I don't think I can agree with this one. If you don't care whether the new header is actually on disk, don't call bdrv_flush(). But if you do care, then bdrv_flush() failure means that most likely the new header has not made it to the disk, but is just sitting in some volatile cache. And what should be done on bdrv_flush fail? Current solution was proposed by Max. +return 0; +} + [...] + +/* This function returns the number of disk sectors covered by a single cluster + * of bitmap data. */ +static uint64_t disk_sectors_in_bitmap_cluster(const BDRVQcow2State *s, + const BdrvDirtyBitmap *bitmap) +{ +uint32_t sector_granularity = +bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS; + +return (uint64_t)sector_granularity * (s->cluster_size << 3); +} This has nothing to do with disk sectors, neither of the guest disk nor of the host disk. It's just using a funny 512 bytes unit. Is there a good reason for introducing this funny unit in new code? 
I'm also not sure what this function calculates, but it's not what the comment says. The unit of the result is something like sectors * bytes, and even when normalising it to a single base unit, I've never seen a use for square bytes so far. sector_granularity is the number of disk sectors corresponding to one bit of the dirty bitmap (disk-sectors/bitmap-bit); cluster_size << 3 is the number of bits in one cluster (bitmap-bits); so we have sector_granularity (disk-sectors/bitmap-bit) * (bitmap-bits) = the number of disk sectors corresponding to one cluster of bitmap data. -- Best regards, Vladimir
Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps
16.02.2017 14:49, Kevin Wolf wrote: Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben: Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: Auto loading bitmaps are bitmaps stored in the disk image, which should be loaded when the image is opened and become BdrvDirtyBitmaps for the corresponding drive. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: John Snow Reviewed-by: Max Reitz Why do we need a new BlockDriver callback and special code for it in bdrv_open_common()? The callback is only ever called immediately after .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally in their .bdrv_open implementation? Even more so because qcow2 is the only driver that supports this callback. Actually, don't we have to call this in qcow2_invalidate_cache()? Currently, I think, after a migration, the autoload bitmaps aren't loaded. By moving the qcow2_load_autoloading_dirty_bitmaps() call to qcow2_open(), this would be fixed. Kevin A bitmap should not be reloaded on any intermediate qcow2 opens, reopens, etc. It should be loaded once, on bdrv_open, to not create extra collisions (between the in-memory bitmap and its stored version). That was the idea. For bitmaps migration there is a separate series; we shouldn't load the bitmap from the file on migration, as its version in the file is outdated. -- Best regards, Vladimir
Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise
* Pankaj Gupta (pagu...@redhat.com) wrote: > > Thanks for your comments. I have below query. > > > > On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote: > > > To maintain consistency at all the places use qemu_madvise wrapper > > > inplace of madvise call. > > > > > if (length > 0) { > > > -madvise((uint8_t *) t + offset, length, MADV_DONTNEED); > > > +qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED); > > > > This was changed two months ago from qemu_madvise() to madvise(), is > > there any reason why you want to revert that change? Those two calls are > > not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an > > explanation. > > > > > -if (madvise(start, length, MADV_DONTNEED)) { > > > +if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) { > > > error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno)); > > I checked history of only change related to 'postcopy'. > > For my linux machine: > > ./config-host.mak > > CONFIG_MADVISE=y > CONFIG_POSIX_MADVISE=y > > As both these options are set for Linux, every time we call > 'qemu_madvise' ==>"madvise(addr, len, advice);" will > be compiled/called. I don't understand why '2f2c8d6b371cfc6689affb0b7e' > explicitly changed for :"#ifdef CONFIG_LINUX" > I think it's better to write a generic function in a wrapper than to > conditionally set something at different places. No; the problem is that the behaviours are different. You're right that the current build on Linux defines MADVISE and thus we are safe because qemu_madvise takes the CONFIG_MADVISE/madvise route - but we need to be explicit that it's only the madvise() route that's safe, not any of the calls implemented by qemu_madvise, because if in the future someone were to rearrange qemu_madvise to prefer posix_madvise, postcopy would break in a very subtle way.
IMHO it might even be better to remove the definition of QEMU_MADV_DONTNEED altogether and make a name that wasn't ambiguous between the two, since the posix definition is so different. Dave > int qemu_madvise(void *addr, size_t len, int advice) > { > if (advice == QEMU_MADV_INVALID) { > errno = EINVAL; > return -1; > } > #if defined(CONFIG_MADVISE) > return madvise(addr, len, advice); > #elif defined(CONFIG_POSIX_MADVISE) > return posix_madvise(addr, len, advice); > #else > errno = EINVAL; > return -1; > #endif > } > > > > > And this is the same case. > > > > Berto > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise
Thanks for your comments. I have a query below. > > On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote: > > To maintain consistency at all the places use qemu_madvise wrapper > > inplace of madvise call. > > > if (length > 0) { > > -madvise((uint8_t *) t + offset, length, MADV_DONTNEED); > > +qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED); > > This was changed two months ago from qemu_madvise() to madvise(), is > there any reason why you want to revert that change? Those two calls are > not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an > explanation. > > > -if (madvise(start, length, MADV_DONTNEED)) { > > +if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) { > > error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno)); I checked the history of the only change related to 'postcopy'. For my linux machine: ./config-host.mak CONFIG_MADVISE=y CONFIG_POSIX_MADVISE=y As both these options are set for Linux, every time we call 'qemu_madvise' ==>"madvise(addr, len, advice);" will be compiled/called. I don't understand why '2f2c8d6b371cfc6689affb0b7e' explicitly changed for :"#ifdef CONFIG_LINUX" I think it's better to write a generic function in a wrapper than to conditionally set something at different places. int qemu_madvise(void *addr, size_t len, int advice) { if (advice == QEMU_MADV_INVALID) { errno = EINVAL; return -1; } #if defined(CONFIG_MADVISE) return madvise(addr, len, advice); #elif defined(CONFIG_POSIX_MADVISE) return posix_madvise(addr, len, advice); #else errno = EINVAL; return -1; #endif } > > And this is the same case. > > Berto >
Re: [Qemu-block] [Qemu-devel] [PATCH 15/17] iotests: add default node-name
16.02.2017 16:48, Fam Zheng wrote: On Mon, 02/13 12:54, Vladimir Sementsov-Ogievskiy wrote: When testing migration, node-names auto-generated by qemu differ between the source and destination qemu and migration fails. After this patch, node-names auto-generated by iotests will be the same. What should be done in libvirt to make sure the node-names are matching correctly at both sides? Hmm, just set node names appropriately? Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Max Reitz --- tests/qemu-iotests/iotests.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index f5ca4b8..e110c90 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -168,6 +168,8 @@ class VM(qtest.QEMUQtestMachine): options.append('file=%s' % path) options.append('format=%s' % format) options.append('cache=%s' % cachemode) +if 'node-name' not in opts: +options.append('node-name=drivenode%d' % self._num_drives) if opts: options.append(opts) -- 1.8.3.1 -- Best regards, Vladimir
Re: [Qemu-block] [PATCH v15 25/25] qcow2-bitmap: improve check_constraints_on_bitmap
16.02.2017 17:21, Kevin Wolf wrote: Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: Add detailed error messages. Signed-off-by: Vladimir Sementsov-Ogievskiy Why not merge this patch into the one that originally introduced the function? Just to not create extra work for reviewers. Kevin -- Best regards, Vladimir
Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise
On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote: > To maintain consistency at all the places use qemu_madvise wrapper > inplace of madvise call. > if (length > 0) { > -madvise((uint8_t *) t + offset, length, MADV_DONTNEED); > +qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED); This was changed two months ago from qemu_madvise() to madvise(), is there any reason why you want to revert that change? Those two calls are not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an explanation. > -if (madvise(start, length, MADV_DONTNEED)) { > +if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) { > error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno)); And this is the same case. Berto
Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise
* Kevin Wolf (kw...@redhat.com) wrote: > Am 17.02.2017 um 09:06 hat Pankaj Gupta geschrieben: > > To maintain consistency at all the places use qemu_madvise wrapper > > inplace of madvise call. > > > > Signed-off-by: Pankaj Gupta > > Reviewed-by: Kevin Wolf > > Juan/Dave, if one of you can give an Acked-by, I can take this through > my tree. NACK That's wrong; qemu_madvise can end up going through posix_madvise and using POSIX_MADV_DONTNEED, which has different semantics from madvise(MADV_DONTNEED), and we need the semantics of madvise - i.e. it's guaranteed to throw away the pages, whereas posix_madvise *may* throw away the pages if the kernel feels like it. Dave > Kevin -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise
Am 17.02.2017 um 09:06 hat Pankaj Gupta geschrieben: > To maintain consistency at all the places use qemu_madvise wrapper > inplace of madvise call. > > Signed-off-by: Pankaj Gupta Reviewed-by: Kevin Wolf Juan/Dave, if one of you can give an Acked-by, I can take this through my tree. Kevin
[Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise
To maintain consistency at all the places use qemu_madvise wrapper inplace of madvise call. Signed-off-by: Pankaj Gupta --- block/qcow2-cache.c | 2 +- migration/postcopy-ram.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c index 1d25147..4991ca5 100644 --- a/block/qcow2-cache.c +++ b/block/qcow2-cache.c @@ -74,7 +74,7 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c, size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t; size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align); if (length > 0) { -madvise((uint8_t *) t + offset, length, MADV_DONTNEED); +qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED); } #endif } diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index a40dddb..558fec1 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -213,7 +213,7 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start, size_t length) { trace_postcopy_ram_discard_range(start, length); -if (madvise(start, length, MADV_DONTNEED)) { +if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) { error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno)); return -1; } -- 2.7.4