Re: [Qemu-block] [Qemu-devel] [PATCH 23/24] util/cutils: Change qemu_strtosz*() from int64_t to uint64_t

2017-02-17 Thread Eric Blake
On 02/14/2017 04:26 AM, Markus Armbruster wrote:
> This will permit its use in parse_option_size().

Progress!  (not that it matters - off_t is a signed value, so you are
STILL limited to 2^63, not 2^64, as the maximum theoretical storage size
that you can target - and it's not like we have that much storage even
accessible)

> 
> Cc: Dr. David Alan Gilbert 
> Cc: Eduardo Habkost  (maintainer:X86)
> Cc: Kevin Wolf  (supporter:Block layer core)
> Cc: Max Reitz  (supporter:Block layer core)
> Cc: qemu-block@nongnu.org (open list:Block layer core)
> Signed-off-by: Markus Armbruster 
> ---

> +++ b/hmp.c
> @@ -1338,7 +1338,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const 
> QDict *qdict)
>  {
>  const char *param = qdict_get_str(qdict, "parameter");
>  const char *valuestr = qdict_get_str(qdict, "value");
> -int64_t valuebw = 0;
> +uint64_t valuebw = 0;
>  long valueint = 0;
>  Error *err = NULL;
>  bool use_int_value = false;
> @@ -1379,7 +1379,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const 
> QDict *qdict)
>  case MIGRATION_PARAMETER_MAX_BANDWIDTH:
>  p.has_max_bandwidth = true;
>  ret = qemu_strtosz_mebi(valuestr, NULL, &valuebw);
> -if (ret < 0 || (size_t)valuebw != valuebw) {
> +if (ret < 0 || valuebw > INT64_MAX
> +|| (size_t)valuebw != valuebw) {

I don't know if this looks any better as:

if (ret < 0 || valuebw > MIN(INT64_MAX, SIZE_MAX))

so don't bother changing it if you think that's ugly.

> +++ b/qapi/opts-visitor.c
> @@ -481,7 +481,6 @@ opts_type_size(Visitor *v, const char *name, uint64_t 
> *obj, Error **errp)
>  {
>  OptsVisitor *ov = to_ov(v);
>  const QemuOpt *opt;
> -int64_t val;
>  int err;
>  
>  opt = lookup_scalar(ov, name, errp);
> @@ -489,14 +488,13 @@ opts_type_size(Visitor *v, const char *name, uint64_t 
> *obj, Error **errp)
>  return;
>  }
>  
> -err = qemu_strtosz(opt->str ? opt->str : "", NULL, &val);
> +err = qemu_strtosz(opt->str ? opt->str : "", NULL, obj);
>  if (err < 0) {
>  error_setg(errp, QERR_INVALID_PARAMETER_VALUE, opt->name,
> -   "a size value representible as a non-negative int64");

Nice - you're killing the unusual variant spelling of representable.

> +++ b/util/cutils.c
> @@ -207,7 +207,7 @@ static int64_t suffix_mul(char suffix, int64_t unit)
>   */
>  static int do_strtosz(const char *nptr, char **end,
>const char default_suffix, int64_t unit,
> -  int64_t *result)
> +  uint64_t *result)
>  {
>  int retval;
>  char *endptr;
> @@ -237,7 +237,11 @@ static int do_strtosz(const char *nptr, char **end,
>  retval = -EINVAL;
>  goto out;
>  }
> -if ((val * mul >= INT64_MAX) || val < 0) {
> +/*
> + * Values >= 0xfffffffffffffc00 overflow uint64_t after their trip
> + * through double (53 bits of precision).
> + */
> +if ((val * mul >= 0xfffffffffffffc00) || val < 0) {

I guess there's not any simpler expression using INT64_MAX and
operations (including casts to double and int64_t) that would reliably
produce the same constant that you just open-coded here.
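
For the record, the open-coded constant is 2^64 - 2^10, i.e. the smallest
uint64_t that already rounds up to 2^64 when converted to double.  A tiny
standalone check (not QEMU code, just an illustration of the boundary):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t cutoff = 0xfffffffffffffc00ULL;   /* 2^64 - 2^10 */

    /* cutoff - 1 still rounds down to a double below 2^64 */
    printf("%.0f\n", (double)(cutoff - 1));    /* 18446744073709549568 */
    /* cutoff itself rounds up to 2^64, i.e. out of uint64_t range */
    printf("%.0f\n", (double)cutoff);          /* 18446744073709551616 */
    return 0;
}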

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH 22/24] util/cutils: Return qemu_strtosz*() error and value separately

2017-02-17 Thread Eric Blake
On 02/14/2017 04:26 AM, Markus Armbruster wrote:
> This makes qemu_strtosz(), qemu_strtosz_mebi() and
> qemu_strtosz_metric() similar to qemu_strtoi64(), except negative
> values are rejected.

Yay. It also opens the door to allowing them to return an unsigned 64
bit value ;)

> 
> Cc: Dr. David Alan Gilbert 
> Cc: Eduardo Habkost  (maintainer:X86)
> Cc: Kevin Wolf  (supporter:Block layer core)
> Cc: Max Reitz  (supporter:Block layer core)
> Cc: qemu-block@nongnu.org (open list:Block layer core)
> Signed-off-by: Markus Armbruster 
> ---
> @@ -1378,8 +1378,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const 
> QDict *qdict)
>  break;
>  case MIGRATION_PARAMETER_MAX_BANDWIDTH:
>  p.has_max_bandwidth = true;
> -valuebw = qemu_strtosz_mebi(valuestr, NULL);
> -if (valuebw < 0 || (size_t)valuebw != valuebw) {
> +ret = qemu_strtosz_mebi(valuestr, NULL, &valuebw);
> +if (ret < 0 || (size_t)valuebw != valuebw) {

Question: do we want a wrapper that takes a maximum, as in:

ret = qemu_strtosz_capped(valuestr, NULL, 'M', SIZE_MAX, &valuebw);

so that the caller doesn't have to check (size_t)valuebw != valuebw ?

But not essential, so I can live with what you did here.
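
Purely to illustrate what I mean (the name and signature are hypothetical,
and I'm ignoring the default-suffix argument for brevity), such a wrapper
could be a thin layer over qemu_strtosz_mebi(), still using int64_t since
that is the value type at this point in the series:

/* Hypothetical helper: parse a size with a mebibyte default suffix and
 * additionally reject anything above @max, so callers don't need the
 * open-coded (size_t)value != value truncation check. */
static int qemu_strtosz_capped(const char *str, char **end,
                               int64_t max, int64_t *result)
{
    int64_t val;
    int ret = qemu_strtosz_mebi(str, end, &val);

    if (ret < 0) {
        return ret;
    }
    if (val > max) {
        return -ERANGE;
    }
    *result = val;
    return 0;
}

The hmp.c hunk above would then collapse to something like
ret = qemu_strtosz_capped(valuestr, NULL, MIN(INT64_MAX, SIZE_MAX), &valuebw);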


> +++ b/qemu-img.c
> @@ -370,10 +370,14 @@ static int add_old_style_options(const char *fmt, 
> QemuOpts *opts,
>  
>  static int64_t cvtnum(const char *s)

> +++ b/qemu-io-cmds.c
> @@ -137,10 +137,14 @@ static char **breakline(char *input, int *count)
>  
>  static int64_t cvtnum(const char *s)

May be some fallout based on rebase churn from earlier patch reviews,
but nothing too serious to prevent you from adding:

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] block/nfs: try to avoid the bounce buffer in pwritev

2017-02-17 Thread Jeff Cody
On Fri, Feb 17, 2017 at 03:42:52PM -0600, Eric Blake wrote:
> On 02/17/2017 03:37 PM, Jeff Cody wrote:
> > On Fri, Feb 17, 2017 at 05:39:01PM +0100, Peter Lieven wrote:
> >> if the passed qiov contains exactly one iov we can
> >> pass the buffer directly.
> >>
> >> Signed-off-by: Peter Lieven 
> >> ---
> >>  block/nfs.c | 23 ---
> >>  1 file changed, 16 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/block/nfs.c b/block/nfs.c
> >> index ab5dcc2..bb4b75f 100644
> >> --- a/block/nfs.c
> >> +++ b/block/nfs.c
> >> @@ -295,20 +295,27 @@ static int coroutine_fn 
> >> nfs_co_pwritev(BlockDriverState *bs, uint64_t offset,
> >>  NFSClient *client = bs->opaque;
> >>  NFSRPC task;
> >>  char *buf = NULL;
> >> +bool my_buffer = false;
> > 
> > g_free() is a nop if buf is NULL, so there is no need for the my_buffer
> > variable & check.
> 
> Umm, yes there is:
> 
> >> +if (iov->niov != 1) {
> >> +buf = g_try_malloc(bytes);
> >> +if (bytes && buf == NULL) {
> >> +return -ENOMEM;
> >> +}
> >> +qemu_iovec_to_buf(iov, 0, buf, bytes);
> >> +my_buffer = true;
> >> +} else {
> >> +buf = iov->iov[0].iov_base;
> 
> If we took the else branch, then we definitely do not want to be calling
> g_free(buf).

Doh!

> 
> >>  }
> >>  
> >> -qemu_iovec_to_buf(iov, 0, buf, bytes);
> >> -
> >>  if (nfs_pwrite_async(client->context, client->fh,
> >>   offset, bytes, buf,
> >>   nfs_co_generic_cb, &task) != 0) {
> >> -g_free(buf);
> >> +if (my_buffer) {
> >> +g_free(buf);
> >> +}
> 
> It looks correct as-is to me.

Indeed.

Reviewed-by: Jeff Cody 





Re: [Qemu-block] [Qemu-devel] [PATCH v2 6/7] iscsi: Add blockdev-add support

2017-02-17 Thread Jeff Cody
On Fri, Feb 17, 2017 at 03:40:21PM -0600, Eric Blake wrote:
> On 01/25/2017 11:42 AM, Jeff Cody wrote:
> > From: Kevin Wolf 
> > 
> > This adds blockdev-add support for iscsi devices.
> > 
> > Reviewed-by: Daniel P. Berrange 
> > Signed-off-by: Kevin Wolf 
> > Signed-off-by: Jeff Cody 
> > ---
> >  block/iscsi.c| 14 ++
> >  qapi/block-core.json | 74 
> > 
> >  2 files changed, 78 insertions(+), 10 deletions(-)
> 
> > +++ b/qapi/block-core.json
> > @@ -2116,10 +2116,10 @@
> >  { 'enum': 'BlockdevDriver',
> >'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
> >  'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom',
> > -'host_device', 'http', 'https', 'luks', 'nbd', 'nfs', 
> > 'null-aio',
> > -'null-co', 'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 
> > 'raw',
> > -'replication', 'ssh', 'vdi', 'vhdx', 'vmdk', 'vpc',
> > -'vvfat' ] }
> > +'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
> > +'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed',
> > +'quorum', 'raw', 'replication', 'ssh', 'vdi', 'vhdx', 'vmdk',
> > +'vpc', 'vvfat' ] }
> 
> Are we missing a since 2.9 documentation designation here?
> 

Yes, thanks for catching that.  I've applied it to my branch already, but I
will go ahead and fix this and rebase with the following squashed into the
patch:

diff --git a/qapi/block-core.json b/qapi/block-core.json
index f00fc9d..5f82d35 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2110,6 +2110,7 @@
 # @nfs: Since 2.8
 # @replication: Since 2.8
 # @ssh: Since 2.8
+# @iscsi: Since 2.9
 #
 # Since: 2.0
 ##




Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] block/nfs: try to avoid the bounce buffer in pwritev

2017-02-17 Thread Eric Blake
On 02/17/2017 03:37 PM, Jeff Cody wrote:
> On Fri, Feb 17, 2017 at 05:39:01PM +0100, Peter Lieven wrote:
>> if the passed qiov contains exactly one iov we can
>> pass the buffer directly.
>>
>> Signed-off-by: Peter Lieven 
>> ---
>>  block/nfs.c | 23 ---
>>  1 file changed, 16 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/nfs.c b/block/nfs.c
>> index ab5dcc2..bb4b75f 100644
>> --- a/block/nfs.c
>> +++ b/block/nfs.c
>> @@ -295,20 +295,27 @@ static int coroutine_fn 
>> nfs_co_pwritev(BlockDriverState *bs, uint64_t offset,
>>  NFSClient *client = bs->opaque;
>>  NFSRPC task;
>>  char *buf = NULL;
>> +bool my_buffer = false;
> 
> g_free() is a nop if buf is NULL, so there is no need for the my_buffer
> variable & check.

Umm, yes there is:

>> +if (iov->niov != 1) {
>> +buf = g_try_malloc(bytes);
>> +if (bytes && buf == NULL) {
>> +return -ENOMEM;
>> +}
>> +qemu_iovec_to_buf(iov, 0, buf, bytes);
>> +my_buffer = true;
>> +} else {
>> +buf = iov->iov[0].iov_base;

If we took the else branch, then we definitely do not want to be calling
g_free(buf).

>>  }
>>  
>> -qemu_iovec_to_buf(iov, 0, buf, bytes);
>> -
>>  if (nfs_pwrite_async(client->context, client->fh,
>>   offset, bytes, buf,
>>   nfs_co_generic_cb, &task) != 0) {
>> -g_free(buf);
>> +if (my_buffer) {
>> +g_free(buf);
>> +}

It looks correct as-is to me.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH v2 6/7] iscsi: Add blockdev-add support

2017-02-17 Thread Eric Blake
On 01/25/2017 11:42 AM, Jeff Cody wrote:
> From: Kevin Wolf 
> 
> This adds blockdev-add support for iscsi devices.
> 
> Reviewed-by: Daniel P. Berrange 
> Signed-off-by: Kevin Wolf 
> Signed-off-by: Jeff Cody 
> ---
>  block/iscsi.c| 14 ++
>  qapi/block-core.json | 74 
> 
>  2 files changed, 78 insertions(+), 10 deletions(-)

> +++ b/qapi/block-core.json
> @@ -2116,10 +2116,10 @@
>  { 'enum': 'BlockdevDriver',
>'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
>  'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom',
> -'host_device', 'http', 'https', 'luks', 'nbd', 'nfs', 'null-aio',
> -'null-co', 'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw',
> -'replication', 'ssh', 'vdi', 'vhdx', 'vmdk', 'vpc',
> -'vvfat' ] }
> +'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
> +'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed',
> +'quorum', 'raw', 'replication', 'ssh', 'vdi', 'vhdx', 'vmdk',
> +'vpc', 'vvfat' ] }

Are we missing a since 2.9 documentation designation here?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH 2/2] block/nfs: try to avoid the bounce buffer in pwritev

2017-02-17 Thread Jeff Cody
On Fri, Feb 17, 2017 at 05:39:01PM +0100, Peter Lieven wrote:
> if the passed qiov contains exactly one iov we can
> pass the buffer directly.
> 
> Signed-off-by: Peter Lieven 
> ---
>  block/nfs.c | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/block/nfs.c b/block/nfs.c
> index ab5dcc2..bb4b75f 100644
> --- a/block/nfs.c
> +++ b/block/nfs.c
> @@ -295,20 +295,27 @@ static int coroutine_fn nfs_co_pwritev(BlockDriverState 
> *bs, uint64_t offset,
>  NFSClient *client = bs->opaque;
>  NFSRPC task;
>  char *buf = NULL;
> +bool my_buffer = false;

g_free() is a nop if buf is NULL, so there is no need for the my_buffer
variable & check.

>  
>  nfs_co_init_task(bs, &task);
>  
> -buf = g_try_malloc(bytes);
> -if (bytes && buf == NULL) {
> -return -ENOMEM;
> +if (iov->niov != 1) {
> +buf = g_try_malloc(bytes);
> +if (bytes && buf == NULL) {
> +return -ENOMEM;
> +}
> +qemu_iovec_to_buf(iov, 0, buf, bytes);
> +my_buffer = true;
> +} else {
> +buf = iov->iov[0].iov_base;
>  }
>  
> -qemu_iovec_to_buf(iov, 0, buf, bytes);
> -
>  if (nfs_pwrite_async(client->context, client->fh,
>   offset, bytes, buf,
>   nfs_co_generic_cb, &task) != 0) {
> -g_free(buf);
> +if (my_buffer) {
> +g_free(buf);
> +}
>  return -ENOMEM;
>  }
>  
> @@ -317,7 +324,9 @@ static int coroutine_fn nfs_co_pwritev(BlockDriverState 
> *bs, uint64_t offset,
>  qemu_coroutine_yield();
>  }
>  
> -g_free(buf);
> +if (my_buffer) {
> +g_free(buf);
> +}
>  
>  if (task.ret != bytes) {
>  return task.ret < 0 ? task.ret : -EIO;
> -- 
> 1.9.1
> 



Re: [Qemu-block] [PATCH 1/2] block/nfs: convert to preadv / pwritev

2017-02-17 Thread Jeff Cody
On Fri, Feb 17, 2017 at 05:39:00PM +0100, Peter Lieven wrote:
> Signed-off-by: Peter Lieven 
> ---
>  block/nfs.c | 33 +++--
>  1 file changed, 15 insertions(+), 18 deletions(-)
> 
> diff --git a/block/nfs.c b/block/nfs.c
> index 689eaa7..ab5dcc2 100644
> --- a/block/nfs.c
> +++ b/block/nfs.c
> @@ -256,9 +256,9 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void 
> *data,
>  nfs_co_generic_bh_cb, task);
>  }
>  
> -static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
> - int64_t sector_num, int nb_sectors,
> - QEMUIOVector *iov)
> +static int coroutine_fn nfs_co_preadv(BlockDriverState *bs, uint64_t offset,
> +  uint64_t bytes, QEMUIOVector *iov,
> +  int flags)
>  {
>  NFSClient *client = bs->opaque;
>  NFSRPC task;
> @@ -267,9 +267,7 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
>  task.iov = iov;
>  
>  if (nfs_pread_async(client->context, client->fh,
> -sector_num * BDRV_SECTOR_SIZE,
> -nb_sectors * BDRV_SECTOR_SIZE,
> -nfs_co_generic_cb, &task) != 0) {
> +offset, bytes, nfs_co_generic_cb, &task) != 0) {
>  return -ENOMEM;
>  }
>  
> @@ -290,9 +288,9 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
>  return 0;
>  }
>  
> -static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
> -int64_t sector_num, int nb_sectors,
> -QEMUIOVector *iov)
> +static int coroutine_fn nfs_co_pwritev(BlockDriverState *bs, uint64_t offset,
> +   uint64_t bytes, QEMUIOVector *iov,
> +   int flags)
>  {
>  NFSClient *client = bs->opaque;
>  NFSRPC task;
> @@ -300,17 +298,16 @@ static int coroutine_fn nfs_co_writev(BlockDriverState 
> *bs,
>  
>  nfs_co_init_task(bs, &task);
>  
> -buf = g_try_malloc(nb_sectors * BDRV_SECTOR_SIZE);
> -if (nb_sectors && buf == NULL) {
> +buf = g_try_malloc(bytes);
> +if (bytes && buf == NULL) {
>  return -ENOMEM;
>  }
>  
> -qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
> +qemu_iovec_to_buf(iov, 0, buf, bytes);
>  
>  if (nfs_pwrite_async(client->context, client->fh,
> - sector_num * BDRV_SECTOR_SIZE,
> - nb_sectors * BDRV_SECTOR_SIZE,
> - buf, nfs_co_generic_cb, &task) != 0) {
> + offset, bytes, buf,
> + nfs_co_generic_cb, &task) != 0) {
>  g_free(buf);
>  return -ENOMEM;
>  }
> @@ -322,7 +319,7 @@ static int coroutine_fn nfs_co_writev(BlockDriverState 
> *bs,
>  
>  g_free(buf);
>  
> -if (task.ret != nb_sectors * BDRV_SECTOR_SIZE) {
> +if (task.ret != bytes) {
>  return task.ret < 0 ? task.ret : -EIO;
>  }
>  
> @@ -856,8 +853,8 @@ static BlockDriver bdrv_nfs = {
>  .bdrv_create= nfs_file_create,
>  .bdrv_reopen_prepare= nfs_reopen_prepare,
>  
> -.bdrv_co_readv  = nfs_co_readv,
> -.bdrv_co_writev = nfs_co_writev,
> +.bdrv_co_preadv = nfs_co_preadv,
> +.bdrv_co_pwritev= nfs_co_pwritev,
>  .bdrv_co_flush_to_disk  = nfs_co_flush,
>  
>  .bdrv_detach_aio_context= nfs_detach_aio_context,
> -- 
> 1.9.1
>

Reviewed-by: Jeff Cody 



Re: [Qemu-block] [Qemu-devel] [PATCH 21/24] util/cutils: Let qemu_strtosz*() optionally reject trailing crap

2017-02-17 Thread Eric Blake
On 02/14/2017 04:26 AM, Markus Armbruster wrote:
> Change the qemu_strtosz() & friends to return -EINVAL when @endptr is
> null and the conversion doesn't consume the string completely.
> Matches how qemu_strtol() & friends work.
> 
> Only test_qemu_strtosz_simple() passes a null @endptr.  No functional
> change there, because its conversion consumes the string.
> 
> Simplify callers that use @endptr only to fail when it doesn't point
> to '\0' to pass a null @endptr instead.
> 
> Cc: Dr. David Alan Gilbert 
> Cc: Eduardo Habkost  (maintainer:X86)
> Cc: Kevin Wolf  (supporter:Block layer core)
> Cc: Max Reitz  (supporter:Block layer core)
> Cc: qemu-block@nongnu.org (open list:Block layer core)
> Signed-off-by: Markus Armbruster 
> ---
>  hmp.c   |  6 ++
>  hw/misc/ivshmem.c   |  6 ++
>  qapi/opts-visitor.c |  5 ++---
>  qemu-img.c  |  7 +--
>  qemu-io-cmds.c  |  7 +--
>  target/i386/cpu.c   |  5 ++---
>  tests/test-cutils.c |  6 ++
>  util/cutils.c   | 14 +-
>  8 files changed, 25 insertions(+), 31 deletions(-)

Nice cleanup.

Reviewed-by: Eric Blake 


> +++ b/qemu-img.c
> @@ -370,14 +370,9 @@ static int add_old_style_options(const char *fmt, 
> QemuOpts *opts,
>  
>  static int64_t cvtnum(const char *s)
>  {

> +++ b/qemu-io-cmds.c
> @@ -137,14 +137,9 @@ static char **breakline(char *input, int *count)
>  
>  static int64_t cvtnum(const char *s)
>  {
> -char *end;

Why do we reimplement cvtnum() as copied static functions instead of
exporting it?  But that would be a separate cleanup (perhaps squashed
into 20/24, where you use cvtnum in qemu-img).

> @@ -217,7 +217,8 @@ static int64_t do_strtosz(const char *nptr, char **end,
>  errno = 0;
>  val = strtod(nptr, &endptr);
>  if (isnan(val) || endptr == nptr || errno != 0) {

Hmm - we explicitly reject "NaN", but not "infinity".  But when strtod()
accepts infinity, ...

> -goto fail;
> +retval = -EINVAL;
> +goto out;
>  }
>  fraction = modf(val, &integral);

then modf() returns 0 with integral left at infinity...

>  if (fraction != 0) {
> @@ -232,17 +233,20 @@ static int64_t do_strtosz(const char *nptr, char **end,
>  assert(mul >= 0);
>  }
>  if (mul == 1 && mul_required) {
> -goto fail;
> +retval = -EINVAL;
> +goto out;
>  }
>  if ((val * mul >= INT64_MAX) || val < 0) {

...and the multiply exceeds INT64_MAX, so we still end up rejecting it
(with ERANGE instead of EINVAL).  Weird way but seems to work, and is
pre-existing, so not this patch's problem.
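
For the record, a tiny standalone program (not QEMU code) shows exactly
that path: strtod() parses "infinity" without complaint, modf() then
reports a zero fractional part with an infinite integral part, and only
the range comparison finally rejects the value:

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *endptr;
    double integral;
    double val = strtod("infinity", &endptr);  /* accepted, val = inf */
    double fraction = modf(val, &integral);    /* fraction = 0, integral = inf */

    printf("val=%f fraction=%f integral=%f\n", val, fraction, integral);
    /* inf >= INT64_MAX is true, so the range check in do_strtosz() trips */
    printf("out of range: %d\n", val >= INT64_MAX);
    return 0;
}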

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH 20/24] qemu-img: Wrap cvtnum() around qemu_strtosz()

2017-02-17 Thread Eric Blake
On 02/14/2017 04:26 AM, Markus Armbruster wrote:
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: qemu-block@nongnu.org
> Signed-off-by: Markus Armbruster 
> ---
>  qemu-img.c | 58 +++---
>  1 file changed, 31 insertions(+), 27 deletions(-)


> @@ -3858,11 +3866,9 @@ static int img_dd_count(const char *arg,
>  struct DdIo *in, struct DdIo *out,
>  struct DdInfo *dd)
>  {
> -char *end;
> +dd->count = cvtnum(arg);

Hmm. cvtnum() accepts "1.5G", GNU dd does not. POSIX requires dd to
accept '1kx10k' to mean 10 mebibytes, and GNU dd accepts '10xM' as a
synonym, but cvtnum() does not accept all those corner cases.  POSIX
requires dd to treat '1b' as '512', while cvtnum() treats it as '1'.  I
sometimes wonder if our 'qemu-img dd' subcommand should be reusing the
numeric parsing that we use everywhere else, even though that means we
differ from the POSIX quirks on what numbers dd is required to
support.  But that's not the concern of this patch.
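
For illustration only, that POSIX grammar boils down to roughly the
following sketch (a hypothetical helper, not actual qemu-img or GNU dd
code): a trailing 'b' means 512, 'k' means 1024, and 'x' multiplies
terms, so "1kx10k" yields 10485760 and "1b" yields 512.

#include <stdint.h>
#include <stdlib.h>

/* Sketch of the POSIX dd number grammar; no error handling, only meant
 * to illustrate the quirks described above. */
static int64_t dd_posix_num(const char *s)
{
    int64_t product = 1;

    for (;;) {
        char *end;
        int64_t n = strtoll(s, &end, 10);

        if (*end == 'b') {          /* 'b' multiplies by 512 */
            n *= 512;
            end++;
        } else if (*end == 'k') {   /* 'k' multiplies by 1024 */
            n *= 1024;
            end++;
        }
        product *= n;
        if (*end != 'x') {          /* no further factor */
            return product;
        }
        s = end + 1;                /* multiply by the next term */
    }
}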

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH v2 0/7] iscsi: Add blockdev-add support

2017-02-17 Thread Jeff Cody
On Wed, Jan 25, 2017 at 12:42:01PM -0500, Jeff Cody wrote:
> This adds blockdev-add support to the iscsi block driver.
> 
> Picked this series up from Kevin.  I've tested it on my local iscsi setup.
> 
> There are only a few minor changes:
> 
> * In patch 2, fixed the segfault pointed out by Daniel
> * In patch 6, placed the ':' after the command header as now required
> * New patch 7, to fix some out of date documentation in the qapi schema
> 
> 
> Jeff Cody (1):
>   QAPI: Fix blockdev-add example documentation
> 
> Kevin Wolf (6):
>   iscsi: Split URL into individual options
>   iscsi: Handle -iscsi user/password in bdrv_parse_filename()
>   iscsi: Add initiator-name option
>   iscsi: Add header-digest option
>   iscsi: Add timeout option
>   iscsi: Add blockdev-add support
> 
>  block/iscsi.c| 349 
> +++
>  qapi/block-core.json |  92 +++---
>  2 files changed, 288 insertions(+), 153 deletions(-)
> 
> -- 
> 2.9.3
>

Fixed up the formatting issues pointed out by Fam & patchew, and applied to
my block branch: 

git://github.com/codyprime/qemu-kvm-jtc.git block

-Jeff



[Qemu-block] [PULL 02/23] virtio: Report real progress in VQ aio poll handler

2017-02-17 Thread Michael S. Tsirkin
From: Fam Zheng 

In virtio_queue_host_notifier_aio_poll, not all "!virtio_queue_empty()"
cases are making true progress.

Currently the offending one is virtio-scsi event queue, whose handler
does nothing if no event is pending. As a result aio_poll() will spin on
the "non-empty" VQ and take 100% host CPU.

Fix this by reporting actual progress from virtio queue aio handlers.

Reported-by: Ed Swierk 
Signed-off-by: Fam Zheng 
Tested-by: Ed Swierk 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/virtio-blk.h  |  2 +-
 include/hw/virtio/virtio-scsi.h |  6 +++---
 include/hw/virtio/virtio.h  |  4 ++--
 hw/block/dataplane/virtio-blk.c |  4 ++--
 hw/block/virtio-blk.c   | 12 ++--
 hw/scsi/virtio-scsi-dataplane.c | 14 +++---
 hw/scsi/virtio-scsi.c   | 14 +++---
 hw/virtio/virtio.c  | 15 +--
 8 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 9734b4c..d3c8a6f 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -80,6 +80,6 @@ typedef struct MultiReqBuffer {
 bool is_write;
 } MultiReqBuffer;
 
-void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
+bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
 
 #endif
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 7375196..f536f77 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -126,9 +126,9 @@ void virtio_scsi_common_realize(DeviceState *dev, Error 
**errp,
 VirtIOHandleOutput cmd);
 
 void virtio_scsi_common_unrealize(DeviceState *dev, Error **errp);
-void virtio_scsi_handle_event_vq(VirtIOSCSI *s, VirtQueue *vq);
-void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq);
-void virtio_scsi_handle_ctrl_vq(VirtIOSCSI *s, VirtQueue *vq);
+bool virtio_scsi_handle_event_vq(VirtIOSCSI *s, VirtQueue *vq);
+bool virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq);
+bool virtio_scsi_handle_ctrl_vq(VirtIOSCSI *s, VirtQueue *vq);
 void virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq, VirtIOSCSIReq *req);
 void virtio_scsi_free_req(VirtIOSCSIReq *req);
 void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev,
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 525da24..0863a25 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -154,6 +154,7 @@ void virtio_error(VirtIODevice *vdev, const char *fmt, ...) 
GCC_FMT_ATTR(2, 3);
 void virtio_device_set_child_bus_name(VirtIODevice *vdev, char *bus_name);
 
 typedef void (*VirtIOHandleOutput)(VirtIODevice *, VirtQueue *);
+typedef bool (*VirtIOHandleAIOOutput)(VirtIODevice *, VirtQueue *);
 
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 VirtIOHandleOutput handle_output);
@@ -284,8 +285,7 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev);
 EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
 void virtio_queue_host_notifier_read(EventNotifier *n);
 void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
-void (*fn)(VirtIODevice *,
-   VirtQueue *));
+VirtIOHandleAIOOutput 
handle_output);
 VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector);
 VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
 
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index d1f9f63..5556f0e 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -147,7 +147,7 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
 g_free(s);
 }
 
-static void virtio_blk_data_plane_handle_output(VirtIODevice *vdev,
+static bool virtio_blk_data_plane_handle_output(VirtIODevice *vdev,
 VirtQueue *vq)
 {
 VirtIOBlock *s = (VirtIOBlock *)vdev;
@@ -155,7 +155,7 @@ static void 
virtio_blk_data_plane_handle_output(VirtIODevice *vdev,
 assert(s->dataplane);
 assert(s->dataplane_started);
 
-virtio_blk_handle_vq(s, vq);
+return virtio_blk_handle_vq(s, vq);
 }
 
 /* Context: QEMU global mutex held */
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 702eda8..baaa195 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -581,10 +581,11 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
 return 0;
 }
 
-void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
+bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 {
 VirtIOBlockReq *req;
 

Re: [Qemu-block] [Qemu-devel] [PATCH 15/17] iotests: add default node-name

2017-02-17 Thread Dr. David Alan Gilbert
* Fam Zheng (f...@redhat.com) wrote:
> On Fri, 02/17 16:36, Vladimir Sementsov-Ogievskiy wrote:
> > 17.02.2017 15:21, Fam Zheng wrote:
> > > On Fri, 02/17 13:20, Vladimir Sementsov-Ogievskiy wrote:
> > > > 16.02.2017 16:48, Fam Zheng wrote:
> > > > > On Mon, 02/13 12:54, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > When testing migration, the node-names auto-generated by qemu differ
> > > > > > between source and destination qemu, and migration fails. After this patch,
> > > > > > the node-names auto-generated by the iotest will be the same.
> > > > > What should be done in libvirt to make sure the node-names are 
> > > > > matching
> > > > > correctly at both sides?
> > > > Hmm, just set node names appropriately?
> > > But I think the problem is that node names are not configurable from 
> > > libvirt
> > > today, and then the migration will fail. Should the device name take 
> > > precedence
> > > in the code, to make it easier?
> > 
> > libvirt can use same parameters as I in this patch..
> 
> If I'm not mistaken, libvirt can be patched to explicitly set the same node
> names in the QEMU command line, but that is some extra work to do there. My
> point is if device names are used during migration, when available, this patch
> and the libvirt change is not necessary.

Always best to check with libvirt guys to see what makes sense for them;
ccing in jdenemar.

Dave

> Fam
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-block] [PATCH 1/3] curl: do not use aio_context_acquire/release

2017-02-17 Thread Paolo Bonzini
Now that all bottom halves and callbacks take care of taking the
AioContext lock, we can migrate some users away from it and to a
specific QemuMutex or CoMutex.

Protect BDRVCURLState access with a QemuMutex.

Signed-off-by: Paolo Bonzini 
---
 block/curl.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 2939cc7..e83dcd8 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -135,6 +135,7 @@ typedef struct BDRVCURLState {
 char *cookie;
 bool accept_range;
 AioContext *aio_context;
+QemuMutex mutex;
 char *username;
 char *password;
 char *proxyusername;
@@ -333,6 +334,7 @@ static int curl_find_buf(BDRVCURLState *s, size_t start, 
size_t len,
 return FIND_RET_NONE;
 }
 
+/* Called with s->mutex held.  */
 static void curl_multi_check_completion(BDRVCURLState *s)
 {
 int msgs_in_queue;
@@ -374,7 +376,9 @@ static void curl_multi_check_completion(BDRVCURLState *s)
 continue;
 }
 
+qemu_mutex_unlock(&s->mutex);
 acb->common.cb(acb->common.opaque, -EPROTO);
+qemu_mutex_lock(&s->mutex);
 qemu_aio_unref(acb);
 state->acb[i] = NULL;
 }
@@ -386,6 +390,7 @@ static void curl_multi_check_completion(BDRVCURLState *s)
 }
 }
 
+/* Called with s->mutex held.  */
 static void curl_multi_do_locked(CURLState *s)
 {
 CURLSocket *socket, *next_socket;
@@ -409,19 +414,19 @@ static void curl_multi_do(void *arg)
 {
 CURLState *s = (CURLState *)arg;
 
-aio_context_acquire(s->s->aio_context);
+qemu_mutex_lock(&s->s->mutex);
 curl_multi_do_locked(s);
-aio_context_release(s->s->aio_context);
+qemu_mutex_unlock(&s->s->mutex);
 }
 
 static void curl_multi_read(void *arg)
 {
 CURLState *s = (CURLState *)arg;
 
-aio_context_acquire(s->s->aio_context);
+qemu_mutex_lock(&s->s->mutex);
 curl_multi_do_locked(s);
 curl_multi_check_completion(s->s);
-aio_context_release(s->s->aio_context);
+qemu_mutex_unlock(&s->s->mutex);
 }
 
 static void curl_multi_timeout_do(void *arg)
@@ -434,11 +439,11 @@ static void curl_multi_timeout_do(void *arg)
 return;
 }
 
-aio_context_acquire(s->aio_context);
+qemu_mutex_lock(&s->mutex);
 curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running);
 
 curl_multi_check_completion(s);
-aio_context_release(s->aio_context);
+qemu_mutex_unlock(&s->mutex);
 #else
 abort();
 #endif
@@ -771,6 +776,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, 
int flags,
 curl_easy_cleanup(state->curl);
 state->curl = NULL;
 
+qemu_mutex_init(&s->mutex);
 curl_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
 qemu_opts_del(opts);
@@ -801,12 +807,11 @@ static void curl_readv_bh_cb(void *p)
 CURLAIOCB *acb = p;
 BlockDriverState *bs = acb->common.bs;
 BDRVCURLState *s = bs->opaque;
-AioContext *ctx = bdrv_get_aio_context(bs);
 
 size_t start = acb->sector_num * BDRV_SECTOR_SIZE;
 size_t end;
 
-aio_context_acquire(ctx);
+qemu_mutex_lock(&s->mutex);
 
 // In case we have the requested data already (e.g. read-ahead),
 // we can just call the callback and be done.
@@ -854,7 +859,7 @@ static void curl_readv_bh_cb(void *p)
 curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running);
 
 out:
-aio_context_release(ctx);
+qemu_mutex_unlock(&s->mutex);
 if (ret != -EINPROGRESS) {
 acb->common.cb(acb->common.opaque, ret);
 qemu_aio_unref(acb);
@@ -883,6 +888,7 @@ static void curl_close(BlockDriverState *bs)
 
 DPRINTF("CURL: Close\n");
 curl_detach_aio_context(bs);
+qemu_mutex_destroy(&s->mutex);
 
 g_free(s->cookie);
 g_free(s->url);
-- 
2.9.3





[Qemu-block] [PATCH 3/3] iscsi: do not use aio_context_acquire/release

2017-02-17 Thread Paolo Bonzini
Now that all bottom halves and callbacks take care of taking the
AioContext lock, we can migrate some users away from it and to a
specific QemuMutex or CoMutex.

Protect libiscsi calls with a QemuMutex.  Callbacks are invoked
using bottom halves, so we don't even have to drop it around
callback invocations.

Signed-off-by: Paolo Bonzini 
---
 block/iscsi.c | 83 +--
 1 file changed, 64 insertions(+), 19 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 2561be9..e483f6d 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -58,6 +58,7 @@ typedef struct IscsiLun {
 int events;
 QEMUTimer *nop_timer;
 QEMUTimer *event_timer;
+QemuMutex mutex;
 struct scsi_inquiry_logical_block_provisioning lbp;
 struct scsi_inquiry_block_limits bl;
 unsigned char *zeroblock;
@@ -252,6 +253,7 @@ static int iscsi_translate_sense(struct scsi_sense *sense)
 return ret;
 }
 
+/* Called (via iscsi_service) with QemuMutex held.  */
 static void
 iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 void *command_data, void *opaque)
@@ -352,6 +354,7 @@ static const AIOCBInfo iscsi_aiocb_info = {
 static void iscsi_process_read(void *arg);
 static void iscsi_process_write(void *arg);
 
+/* Called with QemuMutex held.  */
 static void
 iscsi_set_events(IscsiLun *iscsilun)
 {
@@ -395,10 +398,10 @@ iscsi_process_read(void *arg)
 IscsiLun *iscsilun = arg;
 struct iscsi_context *iscsi = iscsilun->iscsi;
 
-aio_context_acquire(iscsilun->aio_context);
+qemu_mutex_lock(&iscsilun->mutex);
 iscsi_service(iscsi, POLLIN);
 iscsi_set_events(iscsilun);
-aio_context_release(iscsilun->aio_context);
+qemu_mutex_unlock(&iscsilun->mutex);
 }
 
 static void
@@ -407,10 +410,10 @@ iscsi_process_write(void *arg)
 IscsiLun *iscsilun = arg;
 struct iscsi_context *iscsi = iscsilun->iscsi;
 
-aio_context_acquire(iscsilun->aio_context);
+qemu_mutex_lock(&iscsilun->mutex);
 iscsi_service(iscsi, POLLOUT);
 iscsi_set_events(iscsilun);
-aio_context_release(iscsilun->aio_context);
+qemu_mutex_unlock(&iscsilun->mutex);
 }
 
 static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun)
@@ -589,6 +592,7 @@ iscsi_co_writev_flags(BlockDriverState *bs, int64_t 
sector_num, int nb_sectors,
 uint64_t lba;
 uint32_t num_sectors;
 bool fua = flags & BDRV_REQ_FUA;
+int r = 0;
 
 if (fua) {
 assert(iscsilun->dpofua);
@@ -604,6 +608,7 @@ iscsi_co_writev_flags(BlockDriverState *bs, int64_t 
sector_num, int nb_sectors,
 lba = sector_qemu2lun(sector_num, iscsilun);
 num_sectors = sector_qemu2lun(nb_sectors, iscsilun);
 iscsi_co_init_iscsitask(iscsilun, &iTask);
+qemu_mutex_lock(&iscsilun->mutex);
 retry:
 if (iscsilun->use_16_for_rw) {
 #if LIBISCSI_API_VERSION >= (20160603)
@@ -640,7 +645,9 @@ retry:
 #endif
 while (!iTask.complete) {
 iscsi_set_events(iscsilun);
+qemu_mutex_unlock(&iscsilun->mutex);
 qemu_coroutine_yield();
+qemu_mutex_lock(&iscsilun->mutex);
 }
 
 if (iTask.task != NULL) {
@@ -655,12 +662,15 @@ retry:
 
 if (iTask.status != SCSI_STATUS_GOOD) {
 iscsi_allocmap_set_invalid(iscsilun, sector_num, nb_sectors);
-return iTask.err_code;
+r = iTask.err_code;
+goto out_unlock;
 }
 
 iscsi_allocmap_set_allocated(iscsilun, sector_num, nb_sectors);
 
-return 0;
+out_unlock:
+qemu_mutex_unlock(&iscsilun->mutex);
+return r;
 }
 
 
@@ -693,18 +703,21 @@ static int64_t coroutine_fn 
iscsi_co_get_block_status(BlockDriverState *bs,
 goto out;
 }
 
+qemu_mutex_lock(&iscsilun->mutex);
 retry:
 if (iscsi_get_lba_status_task(iscsilun->iscsi, iscsilun->lun,
   sector_qemu2lun(sector_num, iscsilun),
   8 + 16, iscsi_co_generic_cb,
   &iTask) == NULL) {
 ret = -ENOMEM;
-goto out;
+goto out_unlock;
 }
 
 while (!iTask.complete) {
 iscsi_set_events(iscsilun);
+qemu_mutex_unlock(&iscsilun->mutex);
 qemu_coroutine_yield();
+qemu_mutex_lock(&iscsilun->mutex);
 }
 
 if (iTask.do_retry) {
@@ -721,20 +734,20 @@ retry:
  * because the device is busy or the cmd is not
  * supported) we pretend all blocks are allocated
  * for backwards compatibility */
-goto out;
+goto out_unlock;
 }
 
 lbas = scsi_datain_unmarshall(iTask.task);
 if (lbas == NULL) {
 ret = -EIO;
-goto out;
+goto out_unlock;
 }
 
 lbasd = &lbas->descriptors[0];
 
 if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) {
 ret = -EIO;
-goto out;
+goto out_unlock;
 }
 
 *pnum = sector_lun2qemu(lbasd->num_blocks, iscsilun);
@@ -756,6 +769,8 @@ retry:
 if (*pnum > nb_sectors) {
 *pnum = nb_sectors;
 }
+out_unlock:
+qemu_mutex_unlock(&iscsilun->mutex);
 out:
 if (iTask.task != NULL) {
 

[Qemu-block] [PATCH 0/3] do not use aio_context_acquire/release in AIO-based drivers

2017-02-17 Thread Paolo Bonzini
aio_context_acquire/release are only going away as soon as the block layer
becomes thread-safe, but we can already move away to other finer-grained
mutex whenever possible.

These three drivers don't use coroutines, hence a QemuMutex is a fine
primitive to use for protecting any per-BDS data in the libraries
they use.  The QemuMutex must protect any fd handlers or bottom halves,
and also the BlockDriver callbacks which were implicitly being called
under aio_context_acquire.

Paolo

Paolo Bonzini (3):
  curl: do not use aio_context_acquire/release
  nfs: do not use aio_context_acquire/release
  iscsi: do not use aio_context_acquire/release

 block/curl.c  | 24 ++---
 block/iscsi.c | 83 +--
 block/nfs.c   | 20 +++---
 3 files changed, 95 insertions(+), 32 deletions(-)

-- 
2.9.3




[Qemu-block] [PATCH 2/3] nfs: do not use aio_context_acquire/release

2017-02-17 Thread Paolo Bonzini
Now that all bottom halves and callbacks take care of taking the
AioContext lock, we can migrate some users away from it and to a
specific QemuMutex or CoMutex.

Protect libnfs calls with a QemuMutex.  Callbacks are invoked
using bottom halves, so we don't even have to drop it around
callback invocations.

Signed-off-by: Paolo Bonzini 
---
 block/nfs.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index 08b43dd..4eddcee 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -54,6 +54,7 @@ typedef struct NFSClient {
 int events;
 bool has_zero_init;
 AioContext *aio_context;
+QemuMutex mutex;
 blkcnt_t st_blocks;
 bool cache_used;
 NFSServer *server;
@@ -191,6 +192,7 @@ static void nfs_parse_filename(const char *filename, QDict 
*options,
 static void nfs_process_read(void *arg);
 static void nfs_process_write(void *arg);
 
+/* Called with QemuMutex held.  */
 static void nfs_set_events(NFSClient *client)
 {
 int ev = nfs_which_events(client->context);
@@ -209,20 +211,20 @@ static void nfs_process_read(void *arg)
 {
 NFSClient *client = arg;
 
-aio_context_acquire(client->aio_context);
+qemu_mutex_lock(&client->mutex);
 nfs_service(client->context, POLLIN);
 nfs_set_events(client);
-aio_context_release(client->aio_context);
+qemu_mutex_unlock(&client->mutex);
 }
 
 static void nfs_process_write(void *arg)
 {
 NFSClient *client = arg;
 
-aio_context_acquire(client->aio_context);
+qemu_mutex_lock(&client->mutex);
 nfs_service(client->context, POLLOUT);
 nfs_set_events(client);
-aio_context_release(client->aio_context);
+qemu_mutex_unlock(&client->mutex);
 }
 
 static void nfs_co_init_task(BlockDriverState *bs, NFSRPC *task)
@@ -242,6 +244,7 @@ static void nfs_co_generic_bh_cb(void *opaque)
 aio_co_wake(task->co);
 }
 
+/* Called (via nfs_service) with QemuMutex held.  */
 static void
 nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
   void *private_data)
@@ -273,6 +276,7 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
 nfs_co_init_task(bs, );
 task.iov = iov;
 
+qemu_mutex_lock(&client->mutex);
 if (nfs_pread_async(client->context, client->fh,
 sector_num * BDRV_SECTOR_SIZE,
 nb_sectors * BDRV_SECTOR_SIZE,
@@ -281,6 +285,7 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
 }
 
 nfs_set_events(client);
+qemu_mutex_unlock(&client->mutex);
 while (!task.complete) {
 qemu_coroutine_yield();
 }
@@ -314,6 +319,7 @@ static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
 
 qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
 
+qemu_mutex_lock(&client->mutex);
 if (nfs_pwrite_async(client->context, client->fh,
  sector_num * BDRV_SECTOR_SIZE,
  nb_sectors * BDRV_SECTOR_SIZE,
@@ -323,6 +329,7 @@ static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
 }
 
 nfs_set_events(client);
+qemu_mutex_unlock(&client->mutex);
 while (!task.complete) {
 qemu_coroutine_yield();
 }
@@ -343,12 +350,14 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
 
 nfs_co_init_task(bs, );
 
+qemu_mutex_lock(&client->mutex);
 if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
&task) != 0) {
 return -ENOMEM;
 }
 
 nfs_set_events(client);
+qemu_mutex_unlock(&client->mutex);
 while (!task.complete) {
 qemu_coroutine_yield();
 }
@@ -434,6 +443,7 @@ static void nfs_file_close(BlockDriverState *bs)
 {
 NFSClient *client = bs->opaque;
 nfs_client_close(client);
+qemu_mutex_destroy(&client->mutex);
 }
 
 static NFSServer *nfs_config(QDict *options, Error **errp)
@@ -641,6 +651,7 @@ static int nfs_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 if (ret < 0) {
 return ret;
 }
+qemu_mutex_init(&client->mutex);
 bs->total_sectors = ret;
 ret = 0;
 return ret;
@@ -696,6 +707,7 @@ static int nfs_has_zero_init(BlockDriverState *bs)
 return client->has_zero_init;
 }
 
+/* Called (via nfs_service) with QemuMutex held.  */
 static void
 nfs_get_allocated_file_size_cb(int ret, struct nfs_context *nfs, void *data,
void *private_data)
-- 
2.9.3





Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps

2017-02-17 Thread Denis V. Lunev
On 02/17/2017 04:34 PM, Kevin Wolf wrote:
> Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
>> On 02/17/2017 03:48 PM, Kevin Wolf wrote:
>>> Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
 17.02.2017 15:09, Kevin Wolf wrote:
> Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 16.02.2017 14:49, Kevin Wolf wrote:
>>> Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
 Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Auto loading bitmaps are bitmaps stored in the disk image, which 
> should
> be loaded when the image is opened and become BdrvDirtyBitmaps for the
> corresponding drive.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: John Snow 
> Reviewed-by: Max Reitz 
 Why do we need a new BlockDriver callback and special code for it in
 bdrv_open_common()? The callback is only ever called immediately after
 .bdrv_open/.bdrv_file_open, so can't the drivers just do this 
 internally
 in their .bdrv_open implementation? Even more so because qcow2 is the
 only driver that supports this callback.
>>> Actually, don't we have to call this in qcow2_invalidate_cache()?
>>> Currently, I think, after a migration, the autoload bitmaps aren't
>>> loaded.
>>>
>>> By moving the qcow2_load_autoloading_dirty_bitmaps() call to
>>> qcow2_open(), this would be fixed.
>>>
>>> Kevin
>> Bitmap should not be reloaded on any intermediate qcow2-open's,
>> reopens, etc. It should be loaded once, on bdrv_open, to not create
>> extra collisions (between the in-memory bitmap and its stored version).
>> That was the idea.
>>
>> For bitmap migration there is a separate series; we shouldn't load the
>> bitmap from the file on migration, as its version in the file is
>> outdated.
> That's not what your series is doing, though. It loads the bitmaps when
 Actually, they will not be loaded as they will have IN_USE flag.

> migration starts and doesn't reload then when migration completes, even
> though they are stale. Migration with shared storage would just work
> without an extra series if you did these things in the correct places.
>
> As a reminder, this is how migration with shared storage works (or
> should work with your series):
>
> 1. Start destination qemu instance. This calls bdrv_open() with
>BDRV_O_INACTIVE. We can read in some metadata, though we don't need
>much more than the image size at this point. Writing to the image is
>still impossible.
>
> 2. Start migration on the source, while the VM is still writing to the
>image, rendering the cached metadata from step 1 stale.
>
> 3. Migration completes:
>
> a. Stop the VM
>
> b. Inactivate all images in the source qemu. This is where all
>metadata needs to be written back to the image file, including
>bitmaps. No writes to the image are possible after this point
>because BDRV_O_INACTIVE is set.
>
> c. Invalidate the caches in the destination qemu, i.e. reload
>everything from the file that could have changed since step 1,
>including bitmaps. BDRV_O_INACTIVE is cleared, making the image
>ready for writes.
>
> d. Resume the VM on the destination
>
> 4. Exit the source qemu process, which involves bdrv_close(). Note that
>at this point, no writing to the image file is possible any more,
>it's the destination qemu process that own the image file now.
>
> Your series loads and stores bitmaps in steps 1 and 4. This means that
 Actually, no: in step 1 the bitmaps are "in use", and in step 4 INACTIVE is set
 (and it is checked), so nothing is stored.

> they are stale on the destination when migration completes, and that
> bdrv_close() wants to write to an image file that it doesn't own any
> more, which will cause an assertion failure. If you instead move things
> to steps 3b and 3c, it will just work.
 Hmm, I understand the idea. But this will interfere with postcopy
 bitmap migration. So if we really need this, there should be some
 additional control flags or capabilities. The problem with your
 approach is that the bitmap is actually migrated during the short window
 when source and destination are stopped; it may take time, as bitmaps
 may be large.
>>> You can always add optimisations, but this is the basic lifecycle
>>> process of block devices in qemu, so it would be good to adhere to it.
>>> So far there are no other pieces of information that are ignored in
>>> bdrv_invalidate()/bdrv_inactivate() and instead only handled in
>>> bdrv_open()/bdrv_close(). It's a matter of consistency, 

[Qemu-block] [PATCH 0/2] block/nfs optimizations

2017-02-17 Thread Peter Lieven
Peter Lieven (2):
  block/nfs: convert to preadv / pwritev
  block/nfs: try to avoid the bounce buffer in pwritev

 block/nfs.c | 50 --
 1 file changed, 28 insertions(+), 22 deletions(-)

-- 
1.9.1




[Qemu-block] [PATCH 2/2] block/nfs: try to avoid the bounce buffer in pwritev

2017-02-17 Thread Peter Lieven
if the passed qiov contains exactly one iov we can
pass the buffer directly.

Signed-off-by: Peter Lieven 
---
 block/nfs.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index ab5dcc2..bb4b75f 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -295,20 +295,27 @@ static int coroutine_fn nfs_co_pwritev(BlockDriverState 
*bs, uint64_t offset,
 NFSClient *client = bs->opaque;
 NFSRPC task;
 char *buf = NULL;
+bool my_buffer = false;
 
 nfs_co_init_task(bs, &task);
 
-buf = g_try_malloc(bytes);
-if (bytes && buf == NULL) {
-return -ENOMEM;
+if (iov->niov != 1) {
+buf = g_try_malloc(bytes);
+if (bytes && buf == NULL) {
+return -ENOMEM;
+}
+qemu_iovec_to_buf(iov, 0, buf, bytes);
+my_buffer = true;
+} else {
+buf = iov->iov[0].iov_base;
 }
 
-qemu_iovec_to_buf(iov, 0, buf, bytes);
-
 if (nfs_pwrite_async(client->context, client->fh,
  offset, bytes, buf,
  nfs_co_generic_cb, &task) != 0) {
-g_free(buf);
+if (my_buffer) {
+g_free(buf);
+}
 return -ENOMEM;
 }
 
@@ -317,7 +324,9 @@ static int coroutine_fn nfs_co_pwritev(BlockDriverState 
*bs, uint64_t offset,
 qemu_coroutine_yield();
 }
 
-g_free(buf);
+if (my_buffer) {
+g_free(buf);
+}
 
 if (task.ret != bytes) {
 return task.ret < 0 ? task.ret : -EIO;
-- 
1.9.1




[Qemu-block] [PATCH 1/2] block/nfs: convert to preadv / pwritev

2017-02-17 Thread Peter Lieven
Signed-off-by: Peter Lieven 
---
 block/nfs.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index 689eaa7..ab5dcc2 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -256,9 +256,9 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void 
*data,
 nfs_co_generic_bh_cb, task);
 }
 
-static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
- int64_t sector_num, int nb_sectors,
- QEMUIOVector *iov)
+static int coroutine_fn nfs_co_preadv(BlockDriverState *bs, uint64_t offset,
+  uint64_t bytes, QEMUIOVector *iov,
+  int flags)
 {
 NFSClient *client = bs->opaque;
 NFSRPC task;
@@ -267,9 +267,7 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
 task.iov = iov;
 
 if (nfs_pread_async(client->context, client->fh,
-sector_num * BDRV_SECTOR_SIZE,
-nb_sectors * BDRV_SECTOR_SIZE,
-nfs_co_generic_cb, &task) != 0) {
+offset, bytes, nfs_co_generic_cb, &task) != 0) {
 return -ENOMEM;
 }
 
@@ -290,9 +288,9 @@ static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
 return 0;
 }
 
-static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors,
-QEMUIOVector *iov)
+static int coroutine_fn nfs_co_pwritev(BlockDriverState *bs, uint64_t offset,
+   uint64_t bytes, QEMUIOVector *iov,
+   int flags)
 {
 NFSClient *client = bs->opaque;
 NFSRPC task;
@@ -300,17 +298,16 @@ static int coroutine_fn nfs_co_writev(BlockDriverState 
*bs,
 
 nfs_co_init_task(bs, &task);
 
-buf = g_try_malloc(nb_sectors * BDRV_SECTOR_SIZE);
-if (nb_sectors && buf == NULL) {
+buf = g_try_malloc(bytes);
+if (bytes && buf == NULL) {
 return -ENOMEM;
 }
 
-qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
+qemu_iovec_to_buf(iov, 0, buf, bytes);
 
 if (nfs_pwrite_async(client->context, client->fh,
- sector_num * BDRV_SECTOR_SIZE,
- nb_sectors * BDRV_SECTOR_SIZE,
- buf, nfs_co_generic_cb, ) != 0) {
+ offset, bytes, buf,
+ nfs_co_generic_cb, ) != 0) {
 g_free(buf);
 return -ENOMEM;
 }
@@ -322,7 +319,7 @@ static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
 
 g_free(buf);
 
-if (task.ret != nb_sectors * BDRV_SECTOR_SIZE) {
+if (task.ret != bytes) {
 return task.ret < 0 ? task.ret : -EIO;
 }
 
@@ -856,8 +853,8 @@ static BlockDriver bdrv_nfs = {
 .bdrv_create= nfs_file_create,
 .bdrv_reopen_prepare= nfs_reopen_prepare,
 
-.bdrv_co_readv  = nfs_co_readv,
-.bdrv_co_writev = nfs_co_writev,
+.bdrv_co_preadv = nfs_co_preadv,
+.bdrv_co_pwritev= nfs_co_pwritev,
 .bdrv_co_flush_to_disk  = nfs_co_flush,
 
 .bdrv_detach_aio_context= nfs_detach_aio_context,
-- 
1.9.1




[Qemu-block] [RFC PATCH V4] qemu-img: make convert async

2017-02-17 Thread Peter Lieven
This is something I have been thinking about for almost 2 years now.
We heavily rely on the following two use cases when using qemu-img convert.

a) reading from NFS and writing to iSCSI for deploying templates
b) reading from iSCSI and writing to NFS for backups

In both processes we use libiscsi and libnfs, so we have no kernel pagecache.
As qemu-img convert is implemented with sync operations, we read one buffer
and then write it. There is no parallelism, and each sync request takes as
long as it takes until it is completed.

This is version 4 of the approach using coroutine worker "threads".

So far I have the following runtimes when reading an uncompressed QCOW2 from
NFS and writing it to iSCSI (raw):

qemu-img (master)
 nfs -> iscsi 22.8 secs
 nfs -> ram   11.7 secs
 ram -> iscsi 12.3 secs

qemu-img-async (8 coroutines, in-order write disabled)
 nfs -> iscsi 11.0 secs
 nfs -> ram   10.4 secs
 ram -> iscsi  9.0 secs

The following are the runtimes found with different settings between V3 and V4.
This is always the best runtime out of 10 runs when converting from nfs to iscsi.
Please note that in V4 in-order write scenarios show a very high jitter. I think
this is because the get_block_status on the NFS share is delayed by concurrent
read requests.

                        in-order        out-of-order
V3  - 16 coroutines     12.4 seconds    11.1 seconds
    -  8 coroutines     12.2 seconds    11.3 seconds
    -  4 coroutines     12.5 seconds    11.1 seconds
    -  2 coroutines     14.8 seconds    14.9 seconds

V4  - 32 coroutines     15.9 seconds    11.5 seconds
    - 16 coroutines     12.5 seconds    11.0 seconds
    -  8 coroutines     12.9 seconds    11.0 seconds
    -  4 coroutines     14.1 seconds    11.5 seconds
    -  2 coroutines     16.9 seconds    13.2 seconds

Comments appreciated.

Thank you,
Peter

Signed-off-by: Peter Lieven 
---
v3->v4: - avoid to prepare a request queue upfront [Kevin]
- do not ignore the BLK_BACKING_FILE status [Kevin]
- redesign the interface to the read and write routines [Kevin]

v2->v3: - updated stats in the commit msg from a host with a better network card
- only wake up the coroutine that is actually waiting for a write to 
complete.
  this was not only overhead, but also breaking at least linux AIO.
- fix coding style complaints
- rename some variables and structs

v1->v2: - using coroutine as worker "threads". [Max]
- keeping the request queue as otherwise it happens
  that we wait on BLK_ZERO chunks while keeping the write order.
  it also avoids redundant calls to get_block_status and helps
  to skip some conditions for fully allocated images (!s->min_sparse)
---
 qemu-img.c | 260 -
 1 file changed, 187 insertions(+), 73 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index cff22e3..6bac980 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1448,6 +1448,8 @@ enum ImgConvertBlockStatus {
 BLK_BACKING_FILE,
 };
 
+#define MAX_COROUTINES 16
+
 typedef struct ImgConvertState {
 BlockBackend **src;
 int64_t *src_sectors;
@@ -1455,15 +1457,25 @@ typedef struct ImgConvertState {
 int64_t src_cur_offset;
 int64_t total_sectors;
 int64_t allocated_sectors;
+int64_t allocated_done;
+int64_t sector_num;
+int64_t wr_offs;
 enum ImgConvertBlockStatus status;
 int64_t sector_next_status;
 BlockBackend *target;
 bool has_zero_init;
 bool compressed;
 bool target_has_backing;
+bool wr_in_order;
 int min_sparse;
 size_t cluster_sectors;
 size_t buf_sectors;
+int num_coroutines;
+int running_coroutines;
+Coroutine *co[MAX_COROUTINES];
+int64_t wait_sector_num[MAX_COROUTINES];
+CoMutex lock;
+int ret;
 } ImgConvertState;
 
 static void convert_select_part(ImgConvertState *s, int64_t sector_num)
@@ -1544,11 +1556,12 @@ static int convert_iteration_sectors(ImgConvertState 
*s, int64_t sector_num)
 return n;
 }
 
-static int convert_read(ImgConvertState *s, int64_t sector_num, int nb_sectors,
-uint8_t *buf)
+static int convert_co_read(ImgConvertState *s, int64_t sector_num,
+   int nb_sectors, uint8_t *buf)
 {
-int n;
-int ret;
+int n, ret;
+QEMUIOVector qiov;
+struct iovec iov;
 
 assert(nb_sectors <= s->buf_sectors);
 while (nb_sectors > 0) {
@@ -1563,9 +1576,13 @@ static int convert_read(ImgConvertState *s, int64_t 
sector_num, int nb_sectors,
 bs_sectors = s->src_sectors[s->src_cur];
 
 n = MIN(nb_sectors, bs_sectors - (sector_num - s->src_cur_offset));
-ret = blk_pread(blk,
-(sector_num - s->src_cur_offset) << BDRV_SECTOR_BITS,
-buf, n << BDRV_SECTOR_BITS);
+iov.iov_base = buf;
+iov.iov_len = n << BDRV_SECTOR_BITS;
+qemu_iovec_init_external(&qiov, &iov, 1);
+
+ret = 

Re: [Qemu-block] [PATCH v15 25/25] qcow2-bitmap: improve check_constraints_on_bitmap

2017-02-17 Thread Eric Blake
On 02/17/2017 04:18 AM, Vladimir Sementsov-Ogievskiy wrote:
> 16.02.2017 17:21, Kevin Wolf wrote:
>> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> Add detailed error messages.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> Why not merge this patch into the one that originally introduced the
>> function?
> 
> Just to not create extra work for reviewers

It's extra work for reviewers if you don't rebase obvious fixes where
they belong - a new reviewer may flag the issue in the earlier patch
only to find out later in the series that you've already fixed it.
Avoiding needless code churn is part of what rebasing is all about - you
want each step of the series to be self-contained and as correct as
possible, by adding in the fixes at the point where they make sense,
rather than at the end of the series.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps

2017-02-17 Thread Vladimir Sementsov-Ogievskiy

17.02.2017 17:24, Kevin Wolf wrote:

Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben:

On 02/17/2017 04:34 PM, Kevin Wolf wrote:

Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:

But for sure this is bad from the downtime point of view.
On migration you will have to write to the image and re-read
it again on the target. This would be very slow. This will
not help for migration with a non-shared disk either.

That is why we have specifically worked on a migration scheme
which does not influence downtime at all now.

With a write we are issuing several write requests + sync.
Our measurements show that bdrv_drain could take around
a second on an averagely loaded conventional system, which
seems an unacceptable addition to me.

I'm not arguing against optimising migration, I fully agree with you. I
just think that we should start with a correct if slow base version and
then add optimisation to that, instead of starting with a broken base
version and adding to that.

Look, whether you do the expensive I/O on open/close and make that a
slow operation or whether you do it on invalidate_cache/inactivate
doesn't really make a difference in term of slowness because in general
both operations are called exactly once. But it does make a difference
in terms of correctness.

Once you do the optimisation, of course, you'll skip writing those
bitmaps that you transfer using a different channel, no matter whether
you skip it in bdrv_close() or in bdrv_inactivate().

Kevin

I do not understand this point as in order to optimize this
we will have to create a specific code path or option from
the migration code and keep this as an ugly kludge forever.

The point that I don't understand is why it makes any difference for the
follow-up migration series whether the writeout is in bdrv_close() or
bdrv_inactivate(). I don't really see the difference between the two
from a migration POV; both need to be skipped if we transfer the bitmap
using a different channel.

Maybe I would see the reason if I could find the time to look at the
migration patches first, but unfortunately I don't have this time at the
moment.

My point is just that generally we want to have a correctly working qemu
after every single patch, and even more importantly after every series.
As the migration series is separate from this, I don't think it's a good
excuse for doing worse than we could easily do here.

Kevin


With open/close everything is already OK: bitmaps will not be saved because of 
BDRV_O_INACTIVE and will not be loaded because of IN_USE.
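
A minimal sketch of the two guards described here; BlockDriverState,
BDRV_O_INACTIVE and the in-use flag come from QEMU and this series, while the
function names and the exact flag spelling (BME_FLAG_IN_USE) are only
assumptions for illustration:

/* Sketch only; not the series' actual code. */
static bool example_should_store_bitmaps(BlockDriverState *bs)
{
    /* Another process may own the image file once INACTIVE is set. */
    return !(bs->open_flags & BDRV_O_INACTIVE);
}

static bool example_should_load_bitmap(uint64_t bitmap_flags)
{
    /* A bitmap marked in-use in the image must not be loaded. */
    return !(bitmap_flags & BME_FLAG_IN_USE);   /* flag name assumed */
}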



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH 1/3] qemu-img: Add tests for raw image preallocation

2017-02-17 Thread Kevin Wolf
Am 17.02.2017 um 15:20 hat Nir Soffer geschrieben:
> On Fri, Feb 17, 2017 at 11:14 AM, Kevin Wolf  wrote:
> > Am 17.02.2017 um 01:51 hat Nir Soffer geschrieben:
> >> Add tests for creating raw image with and without the preallocation
> >> option.
> >>
> >> Signed-off-by: Nir Soffer 
> >
> > Looks good, but 175 is already (multiply) taken. Not making this a
> > blocker, but I just want to remind everyone to check the mailing list
> > for pending patches which add new tests before using a new number in
> > order to avoid unnecessary rebases for everyone. In general, it's as
> > easy as searching for the string "175.out" in the mailbox.
> >
> > The next free one seems to be 177 currently.
> 
> Thanks, will change to 177 in the next version.

Only if there needs to be a next version for other reasons, though. Otherwise
it's not important enough to respin; it just means that someone else will
have to rebase.

> For next patches, what do you mean by "pending"? patches sent
> to the block mailing list?

Yes, that's what I'm looking for when I add a new test case myself. It's
just an easy way to avoid stepping on each others toes.

Kevin



Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps

2017-02-17 Thread Kevin Wolf
Am 17.02.2017 um 14:48 hat Denis V. Lunev geschrieben:
> On 02/17/2017 04:34 PM, Kevin Wolf wrote:
> > Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
> >> But for sure this is bad from the downtime point of view.
> >> On migrate you will have to write to the image and re-read
> >> it again on the target. This would be very slow. This will
> >> not help for the migration with non-shared disk too.
> >>
> >> That is why we have specifically worked in a migration,
> >> which for a good does not influence downtime at all now.
> >>
> >> With a write we are issuing several write requests + sync.
> >> Our measurements shows that bdrv_drain could take around
> >> a second on an averagely loaded conventional system, which
> >> seems unacceptable addition to me.
> > I'm not arguing against optimising migration, I fully agree with you. I
> > just think that we should start with a correct if slow base version and
> > then add optimisation to that, instead of starting with a broken base
> > version and adding to that.
> >
> > Look, whether you do the expensive I/O on open/close and make that a
> > slow operation or whether you do it on invalidate_cache/inactivate
> > doesn't really make a difference in term of slowness because in general
> > both operations are called exactly once. But it does make a difference
> > in terms of correctness.
> >
> > Once you do the optimisation, of course, you'll skip writing those
> > bitmaps that you transfer using a different channel, no matter whether
> > you skip it in bdrv_close() or in bdrv_inactivate().
> >
> > Kevin
> I do not understand this point as in order to optimize this
> we will have to create specific code path or option from
> the migration code and keep this as an ugly kludge forever.

The point that I don't understand is why it makes any difference for the
follow-up migration series whether the writeout is in bdrv_close() or
bdrv_inactivate(). I don't really see the difference between the two
from a migration POV; both need to be skipped if we transfer the bitmap
using a different channel.

Maybe I would see the reason if I could find the time to look at the
migration patches first, but unfortunately I don't have this time at the
moment.

My point is just that generally we want to have a correctly working qemu
after every single patch, and even more importantly after every series.
As the migration series is separate from this, I don't think it's a good
excuse for doing worse than we could easily do here.

Kevin



Re: [Qemu-block] [PATCH 1/3] qemu-img: Add tests for raw image preallocation

2017-02-17 Thread Nir Soffer
On Fri, Feb 17, 2017 at 11:14 AM, Kevin Wolf  wrote:
> Am 17.02.2017 um 01:51 hat Nir Soffer geschrieben:
>> Add tests for creating raw image with and without the preallocation
>> option.
>>
>> Signed-off-by: Nir Soffer 
>
> Looks good, but 175 is already (multiply) taken. Not making this a
> blocker, but I just want to remind everyone to check the mailing list
> for pending patches which add new tests before using a new number in
> order to avoid unnecessary rebases for everyone. In general, it's as
> easy as searching for the string "175.out" in the mailbox.
>
> The next free one seems to be 177 currently.

Thanks, will change to 177 in the next version.

For next patches, what do you mean by "pending"? patches sent
to the block mailing list?

Nir



Re: [Qemu-block] [Qemu-devel] [PATCH v2 2/7] iscsi: Handle -iscsi user/password in bdrv_parse_filename()

2017-02-17 Thread Fam Zheng
On Wed, 01/25 12:42, Jeff Cody wrote:
> From: Kevin Wolf 
> 
> This splits the logic in the old parse_chap() function into a part that
> parses the -iscsi options into the new driver-specific options, and
> another part that actually applies those options (called apply_chap()
> now).
> 
> Note that this means that username and password specified with -iscsi
> only take effect when a URL is provided. This is intentional, -iscsi is
> a legacy interface only supported for compatibility, new users should
> use the proper driver-specific options.
> 
> Signed-off-by: Kevin Wolf 
> Signed-off-by: Jeff Cody 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 2/7] iscsi: Handle -iscsi user/password in bdrv_parse_filename()

2017-02-17 Thread Fam Zheng
On Fri, 02/17 14:26, Kevin Wolf wrote:
> It is the one that it put into the QDict by iscsi_parse_iscsi_option(),
> which is supposed to be the value from -iscsi.

OK! This is what I was missing. :)

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH 15/17] iotests: add default node-name

2017-02-17 Thread Fam Zheng
On Fri, 02/17 16:36, Vladimir Sementsov-Ogievskiy wrote:
> 17.02.2017 15:21, Fam Zheng wrote:
> > On Fri, 02/17 13:20, Vladimir Sementsov-Ogievskiy wrote:
> > > 16.02.2017 16:48, Fam Zheng wrote:
> > > > On Mon, 02/13 12:54, Vladimir Sementsov-Ogievskiy wrote:
> > > > > When testing migration, auto-generated by qemu node-names differs in
> > > > > source and destination qemu and migration fails. After this patch,
> > > > > auto-generated by iotest nodenames will be the same.
> > > > What should be done in libvirt to make sure the node-names are matching
> > > > correctly at both sides?
> > > Hmm, just set node names appropriately?
> > But I think the problem is that node names are not configurable from libvirt
> > today, and then the migration will fail. Should the device name take 
> > precedence
> > in the code, to make it easier?
> 
> libvirt can use same parameters as I in this patch..

If I'm not mistaken, libvirt can be patched to explicitly set the same node
names on the QEMU command line, but that is some extra work to do there. My
point is that if device names are used during migration, when available,
neither this patch nor the libvirt change is necessary.

Fam



Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps

2017-02-17 Thread Denis V. Lunev
On 02/17/2017 03:48 PM, Kevin Wolf wrote:
> Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 17.02.2017 15:09, Kevin Wolf wrote:
>>> Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
 16.02.2017 14:49, Kevin Wolf wrote:
> Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
>> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> Auto loading bitmaps are bitmaps stored in the disk image, which should
>>> be loaded when the image is opened and become BdrvDirtyBitmaps for the
>>> corresponding drive.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> Reviewed-by: John Snow 
>>> Reviewed-by: Max Reitz 
>> Why do we need a new BlockDriver callback and special code for it in
>> bdrv_open_common()? The callback is only ever called immediately after
>> .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
>> in their .bdrv_open implementation? Even more so because qcow2 is the
>> only driver that supports this callback.
> Actually, don't we have to call this in qcow2_invalidate_cache()?
> Currently, I think, after a migration, the autoload bitmaps aren't
> loaded.
>
> By moving the qcow2_load_autoloading_dirty_bitmaps() call to
> qcow2_open(), this would be fixed.
>
> Kevin
 Bitmap should not be reloaded on any intermediate qcow2-open's,
 reopens, etc. It should be loaded once, on bdrv_open, to not create
 extra collisions (between in-memory bitmap and it's stored version).
 That was the idea.

 For bitmaps migration there are separate series, we shouldn't load
 bitmap from file on migration, as it's version in the file is
 outdated.
>>> That's not what your series is doing, though. It loads the bitmaps when
>> Actually, they will not be loaded as they will have IN_USE flag.
>>
>>> migration starts and doesn't reload then when migration completes, even
>>> though they are stale. Migration with shared storage would just work
>>> without an extra series if you did these things in the correct places.
>>>
>>> As a reminder, this is how migration with shared storage works (or
>>> should work with your series):
>>>
>>> 1. Start destination qemu instance. This calls bdrv_open() with
>>>BDRV_O_INACTIVE. We can read in some metadata, though we don't need
>>>much more than the image size at this point. Writing to the image is
>>>still impossible.
>>>
>>> 2. Start migration on the source, while the VM is still writing to the
>>>image, rendering the cached metadata from step 1 stale.
>>>
>>> 3. Migration completes:
>>>
>>> a. Stop the VM
>>>
>>> b. Inactivate all images in the source qemu. This is where all
>>>metadata needs to be written back to the image file, including
>>>bitmaps. No writes to the image are possible after this point
>>>because BDRV_O_INACTIVE is set.
>>>
>>> c. Invalidate the caches in the destination qemu, i.e. reload
>>>everything from the file that could have changed since step 1,
>>>including bitmaps. BDRV_O_INACTIVE is cleared, making the image
>>>ready for writes.
>>>
>>> d. Resume the VM on the destination
>>>
>>> 4. Exit the source qemu process, which involves bdrv_close(). Note that
>>>at this point, no writing to the image file is possible any more,
>>>it's the destination qemu process that own the image file now.
>>>
>>> Your series loads and stores bitmaps in steps 1 and 4. This means that
>> Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and
>> it is checked), nothing is stored.
>>
>>> they are stale on the destination when migration completes, and that
>>> bdrv_close() wants to write to an image file that it doesn't own any
>>> more, which will cause an assertion failure. If you instead move things
>>> to steps 3b and 3c, it will just work.
>> Hmm, I understand the idea.. But this will interfere with postcopy
>> bitmap migration. So if we really need this, there should be some
>> additional control flags or capabilities.. The problem of your
>> approach is that bitmap actually migrated in the short state when
>> source and destination are stopped, it may take time, as bitmaps may
>> be large.
> You can always add optimisations, but this is the basic lifecycle
> process of block devices in qemu, so it would be good to adhere to it.
> So far there are no other pieces of information that are ignored in
> bdrv_invalidate()/bdrv_inactivate() and instead only handled in
> bdrv_open()/bdrv_close(). It's a matter of consistency, too.
>
> And not having to add special cases for specific features in the generic
> bdrv_open()/close() paths is a big plus for me anyway.
>
> Kevin
But for sure this is bad from the downtime point of view.
On migration you will have to write to the image and re-read
it again on the target. This would be very slow.

Re: [Qemu-block] [Qemu-devel] [PATCH 15/17] iotests: add default node-name

2017-02-17 Thread Vladimir Sementsov-Ogievskiy

17.02.2017 15:21, Fam Zheng wrote:

On Fri, 02/17 13:20, Vladimir Sementsov-Ogievskiy wrote:

16.02.2017 16:48, Fam Zheng wrote:

On Mon, 02/13 12:54, Vladimir Sementsov-Ogievskiy wrote:

When testing migration, auto-generated by qemu node-names differs in
source and destination qemu and migration fails. After this patch,
auto-generated by iotest nodenames will be the same.

What should be done in libvirt to make sure the node-names are matching
correctly at both sides?

Hmm, just set node names appropriately?

But I think the problem is that node names are not configurable from libvirt
today, and then the migration will fail. Should the device name take precedence
in the code, to make it easier?


Why not configurable? libvirt can use the same parameters as I do in this 
patch. Or what do you mean?




Fam



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps

2017-02-17 Thread Kevin Wolf
Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben:
> On 02/17/2017 03:48 PM, Kevin Wolf wrote:
> > Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> 17.02.2017 15:09, Kevin Wolf wrote:
> >>> Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
>  16.02.2017 14:49, Kevin Wolf wrote:
> > Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
> >> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>> Auto loading bitmaps are bitmaps stored in the disk image, which 
> >>> should
> >>> be loaded when the image is opened and become BdrvDirtyBitmaps for the
> >>> corresponding drive.
> >>>
> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >>> Reviewed-by: John Snow 
> >>> Reviewed-by: Max Reitz 
> >> Why do we need a new BlockDriver callback and special code for it in
> >> bdrv_open_common()? The callback is only ever called immediately after
> >> .bdrv_open/.bdrv_file_open, so can't the drivers just do this 
> >> internally
> >> in their .bdrv_open implementation? Even more so because qcow2 is the
> >> only driver that supports this callback.
> > Actually, don't we have to call this in qcow2_invalidate_cache()?
> > Currently, I think, after a migration, the autoload bitmaps aren't
> > loaded.
> >
> > By moving the qcow2_load_autoloading_dirty_bitmaps() call to
> > qcow2_open(), this would be fixed.
> >
> > Kevin
>  Bitmap should not be reloaded on any intermediate qcow2-open's,
>  reopens, etc. It should be loaded once, on bdrv_open, to not create
>  extra collisions (between in-memory bitmap and it's stored version).
>  That was the idea.
> 
>  For bitmaps migration there are separate series, we shouldn't load
>  bitmap from file on migration, as it's version in the file is
>  outdated.
> >>> That's not what your series is doing, though. It loads the bitmaps when
> >> Actually, they will not be loaded as they will have IN_USE flag.
> >>
> >>> migration starts and doesn't reload then when migration completes, even
> >>> though they are stale. Migration with shared storage would just work
> >>> without an extra series if you did these things in the correct places.
> >>>
> >>> As a reminder, this is how migration with shared storage works (or
> >>> should work with your series):
> >>>
> >>> 1. Start destination qemu instance. This calls bdrv_open() with
> >>>BDRV_O_INACTIVE. We can read in some metadata, though we don't need
> >>>much more than the image size at this point. Writing to the image is
> >>>still impossible.
> >>>
> >>> 2. Start migration on the source, while the VM is still writing to the
> >>>image, rendering the cached metadata from step 1 stale.
> >>>
> >>> 3. Migration completes:
> >>>
> >>> a. Stop the VM
> >>>
> >>> b. Inactivate all images in the source qemu. This is where all
> >>>metadata needs to be written back to the image file, including
> >>>bitmaps. No writes to the image are possible after this point
> >>>because BDRV_O_INACTIVE is set.
> >>>
> >>> c. Invalidate the caches in the destination qemu, i.e. reload
> >>>everything from the file that could have changed since step 1,
> >>>including bitmaps. BDRV_O_INACTIVE is cleared, making the image
> >>>ready for writes.
> >>>
> >>> d. Resume the VM on the destination
> >>>
> >>> 4. Exit the source qemu process, which involves bdrv_close(). Note that
> >>>at this point, no writing to the image file is possible any more,
> >>>it's the destination qemu process that own the image file now.
> >>>
> >>> Your series loads and stores bitmaps in steps 1 and 4. This means that
> >> Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and
> >> it is checked), nothing is stored.
> >>
> >>> they are stale on the destination when migration completes, and that
> >>> bdrv_close() wants to write to an image file that it doesn't own any
> >>> more, which will cause an assertion failure. If you instead move things
> >>> to steps 3b and 3c, it will just work.
> >> Hmm, I understand the idea.. But this will interfere with postcopy
> >> bitmap migration. So if we really need this, there should be some
> >> additional control flags or capabilities.. The problem of your
> >> approach is that bitmap actually migrated in the short state when
> >> source and destination are stopped, it may take time, as bitmaps may
> >> be large.
> > You can always add optimisations, but this is the basic lifecycle
> > process of block devices in qemu, so it would be good to adhere to it.
> > So far there are no other pieces of information that are ignored in
> > bdrv_invalidate()/bdrv_inactivate() and instead only handled in
> > bdrv_open()/bdrv_close(). It's a matter of consistency, too.
> >
> > And not having to add special cases for specific features in the generic
> > bdrv_open()/close() paths is a big plus for me anyway.

Re: [Qemu-block] [Qemu-devel] [PATCH v2 2/7] iscsi: Handle -iscsi user/password in bdrv_parse_filename()

2017-02-17 Thread Kevin Wolf
Am 07.02.2017 um 11:13 hat Fam Zheng geschrieben:
> On Wed, 01/25 12:42, Jeff Cody wrote:
> > From: Kevin Wolf 
> > 
> > This splits the logic in the old parse_chap() function into a part that
> > parses the -iscsi options into the new driver-specific options, and
> > another part that actually applies those options (called apply_chap()
> > now).
> > 
> > Note that this means that username and password specified with -iscsi
> > only take effect when a URL is provided. This is intentional, -iscsi is
> > a legacy interface only supported for compatibility, new users should
> > use the proper driver-specific options.
> > 
> > Signed-off-by: Kevin Wolf 
> > Signed-off-by: Jeff Cody 
> > ---
> >  block/iscsi.c | 78 
> > +--
> >  1 file changed, 44 insertions(+), 34 deletions(-)
> > 
> > diff --git a/block/iscsi.c b/block/iscsi.c
> > index 97cff30..fc91d0f 100644
> > --- a/block/iscsi.c
> > +++ b/block/iscsi.c
> > @@ -1236,29 +1236,14 @@ retry:
> >  return 0;
> >  }
> >  
> > -static void parse_chap(struct iscsi_context *iscsi, const char *target,
> > +static void apply_chap(struct iscsi_context *iscsi, QemuOpts *opts,
> > Error **errp)
> >  {
> > -QemuOptsList *list;
> > -QemuOpts *opts;
> >  const char *user = NULL;
> >  const char *password = NULL;
> >  const char *secretid;
> >  char *secret = NULL;
> >  
> > -list = qemu_find_opts("iscsi");
> > -if (!list) {
> > -return;
> > -}
> > -
> > -opts = qemu_opts_find(list, target);
> > -if (opts == NULL) {
> > -opts = QTAILQ_FIRST(&list->head);
> > -if (!opts) {
> > -return;
> > -}
> > -}
> > -
> >  user = qemu_opt_get(opts, "user");
> >  if (!user) {
> >  return;
> > @@ -1587,6 +1572,41 @@ out:
> >  }
> >  }
> >  
> > +static void iscsi_parse_iscsi_option(const char *target, QDict *options)
> > +{
> > +QemuOptsList *list;
> > +QemuOpts *opts;
> > +const char *user, *password, *password_secret;
> > +
> > +list = qemu_find_opts("iscsi");
> > +if (!list) {
> > +return;
> > +}
> > +
> > +opts = qemu_opts_find(list, target);
> > +if (opts == NULL) {
> > +opts = QTAILQ_FIRST(&list->head);
> > +if (!opts) {
> > +return;
> > +}
> > +}
> > +
> > +user = qemu_opt_get(opts, "user");
> > +if (user) {
> > +qdict_set_default_str(options, "user", user);
> > +}
> > +
> > +password = qemu_opt_get(opts, "password");
> > +if (password) {
> > +qdict_set_default_str(options, "password", password);
> > +}
> > +
> > +password_secret = qemu_opt_get(opts, "password-secret");
> > +if (password_secret) {
> > +qdict_set_default_str(options, "password-secret", password_secret);
> > +}
> > +}
> > +
> >  /*
> >   * We support iscsi url's on the form
> >   * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
> > @@ -1625,6 +1645,9 @@ static void iscsi_parse_filename(const char 
> > *filename, QDict *options,
> >  qdict_set_default_str(options, "lun", lun_str);
> >  g_free(lun_str);
> >  
> > +/* User/password from -iscsi take precedence over those from the URL */
> > +iscsi_parse_iscsi_option(iscsi_url->target, options);
> > +
> 
> Isn't more natural to let the local option take precedence over the global 
> one?

One would think so, but that's how the options work today, so we can't
change it without breaking compatibility. We can only newly define the
precedence of the new driver-specific options vs. the existing ones, and
there the local driver-specific ones do take precedence.
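
The precedence falls out of qdict_set_default_str(), which only inserts a key
that is not already present. A rough sketch, assuming the usual QEMU tree
headers and using made-up values:

#include "qemu/osdep.h"
#include "qapi/qmp/qdict.h"

/* Sketch only: the first writer of a key wins among defaults, and a key
 * that the user set explicitly is never overwritten at all. */
static void example_precedence(QDict *options)
{
    /* Applied first (standing in here for the -iscsi value) ... */
    qdict_set_default_str(options, "user", "value-from-iscsi-option");

    /* ... so a later default for the same key (the URL value) is ignored. */
    qdict_set_default_str(options, "user", "value-from-url");
}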

> >  if (iscsi_url->user[0] != '\0') {
> >  qdict_set_default_str(options, "user", iscsi_url->user);
> >  qdict_set_default_str(options, "password", iscsi_url->passwd);
> > @@ -1659,6 +1682,10 @@ static QemuOptsList runtime_opts = {
> >  .type = QEMU_OPT_STRING,
> >  },
> >  {
> > +.name = "password-secret",
> > +.type = QEMU_OPT_STRING,
> > +},
> > +{
> 
> I think this added password-secret is not the one looked up in
> iscsi_parse_iscsi_option(), which is from -iscsi
> (block/iscsi-opts.c:qemu_iscsi_opts). Is this intended? Or does it belong to a
> different patch?

It is the one that it put into the QDict by iscsi_parse_iscsi_option(),
which is supposed to be the value from -iscsi.

Why do you think it's not the one? Maybe I'm missing something.

Kevin



Re: [Qemu-block] [PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps

2017-02-17 Thread Kevin Wolf
Am 17.02.2017 um 13:55 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 17.02.2017 15:21, Kevin Wolf wrote:
> 
> Am 17.02.2017 um 13:07 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 
> 16.02.2017 15:47, Kevin Wolf wrote:
> 
> Sorry, this was sent too early. Next attempt...
> 
> Am 16.02.2017 um 12:45 hat Kevin Wolf geschrieben:
> 
> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy 
> geschrieben:
> 
> Auto loading bitmaps are bitmaps in Qcow2, with the AUTO 
> flag set. They
> are loaded when the image is opened and become 
> BdrvDirtyBitmaps for the
> corresponding drive.
> 
> Extra data in bitmaps is not supported for now.
> 
> [...]
> 
> 
> hdx.o vhdx-endian.o vhdx-log.o
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> new file mode 100644
> index 000..e08e46e
> --- /dev/null
> +++ b/block/qcow2-bitmap.c
> 
> [...]
> 
> 
> +
> +static int update_header_sync(BlockDriverState *bs)
> +{
> +int ret;
> +
> +ret = qcow2_update_header(bs);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +/* We don't return bdrv_flush error code. Even if it 
> fails, write was
> + * successful and it is more logical to consider 
> that header is in the new
> + * state than in the old.
> + */
> +ret = bdrv_flush(bs);
> +if (ret < 0) {
> +fprintf(stderr, "Failed to flush qcow2 header");
> +}
> 
> I don't think I can agree with this one. If you don't care 
> whether the
> new header is actually on disk, don't call bdrv_flush(). But if 
> you do
> care, then bdrv_flush() failure means that most likely the new 
> header
> has not made it to the disk, but is just sitting in some volatile 
> cache.
> 
> And what should be done on bdrv_flush fail? Current solution was
> proposed by Max.
> 
> Pass an error up and let the calling operation fail, because we can't
> promise that it actually executed correctly.
> 
> 
> It was my first option. The problem is that resulting state is absolutely
> inconsistent - it is not guaranteed  to be the old one or new.. However with
> current approach it may be inconsistent too, but without an error..
> 
> So, error message should contain information that all dirty bitmaps may be
> inconsistent even if in the image all flags says that bitmaps are OK.

After a failing flush, we're always in a state that is hard to predict
and where we can make little guarantees about what the image actually
contains on disk and what not. The only thing we can really do is to
make sure the user sees an error and can respond to it, rather than
being silent about a potential corruption.
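
To make that concrete, a hedged sketch of a caller surfacing the failure; it
assumes update_header_sync() is changed to propagate the bdrv_flush() error,
and the function name and message wording are only illustrative:

static int example_set_bitmap_flags(BlockDriverState *bs, Error **errp)
{
    int ret = update_header_sync(bs);   /* assumed to return the flush error */

    if (ret < 0) {
        error_setg_errno(errp, -ret,
                         "Failed to update the qcow2 header; the dirty "
                         "bitmaps in this image may be inconsistent even "
                         "though their flags look valid");
        return ret;
    }
    return 0;
}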

> +return 0;
> +}
> +
> 
> [...]
> 
> 
> +
> +/* This function returns the number of disk sectors 
> covered by a single cluster
> + * of bitmap data. */
> +static uint64_t disk_sectors_in_bitmap_cluster(const 
> BDRVQcow2State *s,
> +   const 
> BdrvDirtyBitmap *bitmap)
> +{
> +uint32_t sector_granularity =
> +bdrv_dirty_bitmap_granularity(bitmap) >> 
> BDRV_SECTOR_BITS;
> +
> +return (uint64_t)sector_granularity * 
> (s->cluster_size << 3);
> +}
> 
> This has nothing to do with disk sectors, neither of the guest 
> disk nor
> of the host disk. It's just using a funny 512 bytes unit. Is 
> there a
> good reason for introducing this funny unit in new code?
> 
> I'm also not sure what this function calculates, but it's not 
> what the
> comment says. The unit of the result is something like sectors * 
> bytes,
> and even when normalising it to a single base unit, I've never 
> seen a
> use for square bytes so far.
> 
> sector granularity is number of disk sectors, corresponding to one
> bit of the dirty bitmap, (disk-sectors/bitmap-bit)
> 
> Please don't use the word "disk sectors", it's misleading because it's
> not a sector size of any specific disk. It's best to say just "sectors",
which is a fixed qemu block layer unit of 512 bytes.

Re: [Qemu-block] [PATCH v15 13/25] qcow2: add .bdrv_store_persistent_dirty_bitmaps()

2017-02-17 Thread Kevin Wolf
Am 17.02.2017 um 13:24 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 16.02.2017 17:08, Kevin Wolf wrote:
> >Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>Realize block bitmap storing interface, to allow qcow2 images store
> >>persistent bitmaps.
> >>
> >>Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >>Reviewed-by: Max Reitz 
> >>Reviewed-by: John Snow 
> >Similary to autoload, I think this must be called as part of
> >qcow2_inactivate() rather than just in bdrv_close().
> >
> >Kevin
> 
> I prefer to store bitmaps once, on the final close of bds, and
> remove corresponding BdrvDirtyBitmap in the same point. bdrv_close
> is simpler point, I don't want to think about loading/saving bitmap
> on each invalidate/inactivate. I don't want to make dependencies
> between qcow2 bitmap loading/storing and migration, etc.
> 
> So, my approach was just load bitmap on bdrv_open and store on
> bdrv_close, and between these two calls BdrvDirtyBitmap lives its
> normal life. For me it looks simpler. I'm not sure about what new
> corner cases will come if we change this.

You make this sound like invalidate/inactivate was a very frequent
operation. It's not. Essentially it is what you probably consider
open/close to be.

The semantics is basically that after inactivate (or more precisely, as
long as BDRV_O_INACTIVE is set), another process can access and
potentially modify the image file. At invalidate, qemu takes ownership
again and reloads everything from the image file.
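
A rough sketch of what that could look like on the qcow2 side; the store
helper is the one added by this series (its exact signature is assumed), and
the rest of the inactivation path is elided:

static int example_qcow2_inactivate(BlockDriverState *bs)
{
    Error *local_err = NULL;

    /* Write all metadata back, including persistent bitmaps, before another
     * process is allowed to take ownership of the image file. */
    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
    if (local_err) {
        error_report_err(local_err);
        return -EINVAL;
    }

    /* ... flush caches, set the clean flag, etc. ... */
    return 0;
}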

Kevin



Re: [Qemu-block] [PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps

2017-02-17 Thread Vladimir Sementsov-Ogievskiy

17.02.2017 15:21, Kevin Wolf wrote:

Am 17.02.2017 um 13:07 hat Vladimir Sementsov-Ogievskiy geschrieben:

16.02.2017 15:47, Kevin Wolf wrote:

Sorry, this was sent too early. Next attempt...

Am 16.02.2017 um 12:45 hat Kevin Wolf geschrieben:

Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:

Auto loading bitmaps are bitmaps in Qcow2, with the AUTO flag set. They
are loaded when the image is opened and become BdrvDirtyBitmaps for the
corresponding drive.

Extra data in bitmaps is not supported for now.

[...]


hdx.o vhdx-endian.o vhdx-log.o
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
new file mode 100644
index 000..e08e46e
--- /dev/null
+++ b/block/qcow2-bitmap.c

[...]


+
+static int update_header_sync(BlockDriverState *bs)
+{
+int ret;
+
+ret = qcow2_update_header(bs);
+if (ret < 0) {
+return ret;
+}
+
+/* We don't return bdrv_flush error code. Even if it fails, write was
+ * successful and it is more logical to consider that header is in the new
+ * state than in the old.
+ */
+ret = bdrv_flush(bs);
+if (ret < 0) {
+fprintf(stderr, "Failed to flush qcow2 header");
+}

I don't think I can agree with this one. If you don't care whether the
new header is actually on disk, don't call bdrv_flush(). But if you do
care, then bdrv_flush() failure means that most likely the new header
has not made it to the disk, but is just sitting in some volatile cache.

And what should be done on bdrv_flush fail? Current solution was
proposed by Max.

Pass an error up and let the calling operation fail, because we can't
promise that it actually executed correctly.


It was my first option. The problem is that the resulting state is 
absolutely inconsistent - it is not guaranteed to be the old one or the 
new one. However, with the current approach it may be inconsistent too, 
but without an error.


So, the error message should say that all dirty bitmaps may be 
inconsistent even if all the flags in the image say that the bitmaps are OK.






+return 0;
+}
+

[...]


+
+/* This function returns the number of disk sectors covered by a single cluster
+ * of bitmap data. */
+static uint64_t disk_sectors_in_bitmap_cluster(const BDRVQcow2State *s,
+   const BdrvDirtyBitmap *bitmap)
+{
+uint32_t sector_granularity =
+bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;
+
+return (uint64_t)sector_granularity * (s->cluster_size << 3);
+}

This has nothing to do with disk sectors, neither of the guest disk nor
of the host disk. It's just using a funny 512 bytes unit. Is there a
good reason for introducing this funny unit in new code?

I'm also not sure what this function calculates, but it's not what the
comment says. The unit of the result is something like sectors * bytes,
and even when normalising it to a single base unit, I've never seen a
use for square bytes so far.

sector granularity is number of disk sectors, corresponding to one
bit of the dirty bitmap, (disk-sectors/bitmap-bit)

Please don't use the word "disk sectors", it's misleading because it's
not a sector size of any specific disk. It's best to say just "sectors",
which is a fixed qemu block layer unit of 512 bytes.


cluster_size << 3 is a number of bits in one cluster, (bitmap-bit)

so, we have
sector_granularity (disk-sectors/bitmap-bit) * (cluster_size << 3) (bitmap-bits)
= some disk sectors, corresponding to one cluster of bitmap data.

Aha! I completely misunderstood what a bitmap cluster is supposed to
be. I expected it to be a chunk of bitmap data whose size corresponds to
the bitmap granularity, whereas it's really about qcow2 clusters.

I wonder if we can rephrase the comment to make this more obvious. Maybe
"...covered by a single qcow2 cluster containing bitmap data"? And the
function could be called sectors_covered_by_bitmap_cluster() or
something.


No problem with that, I'm always for better wording)

s/containing/of/ looks better to me, as it should be covered by bitmap 
data, not by the cluster.




Kevin



--
Best regards,
Vladimir



Re: [Qemu-block] [Qemu-devel] [PATCH] block, migration: Use qemu_madvise inplace of madvise

2017-02-17 Thread Alberto Garcia
On Fri 17 Feb 2017 01:30:09 PM CET, Pankaj Gupta wrote:
> I think 'posix_madvise' was added for systems which didnot have
> 'madvise' [...] For the systems which don't have madvise call
> 'posix_madvise' is called which as per discussion is not right thing
> for 'DONTNEED' option. It will not give desired results.
>
> Either we have to find right alternative or else it is already broken
> for systems which don't support madvise.

Do you have an example of a call that is currently broken in the QEMU
code?

Berto



Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps

2017-02-17 Thread Kevin Wolf
Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 17.02.2017 15:09, Kevin Wolf wrote:
> >Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>16.02.2017 14:49, Kevin Wolf wrote:
> >>>Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >Auto loading bitmaps are bitmaps stored in the disk image, which should
> >be loaded when the image is opened and become BdrvDirtyBitmaps for the
> >corresponding drive.
> >
> >Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >Reviewed-by: John Snow 
> >Reviewed-by: Max Reitz 
> Why do we need a new BlockDriver callback and special code for it in
> bdrv_open_common()? The callback is only ever called immediately after
> .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
> in their .bdrv_open implementation? Even more so because qcow2 is the
> only driver that supports this callback.
> >>>Actually, don't we have to call this in qcow2_invalidate_cache()?
> >>>Currently, I think, after a migration, the autoload bitmaps aren't
> >>>loaded.
> >>>
> >>>By moving the qcow2_load_autoloading_dirty_bitmaps() call to
> >>>qcow2_open(), this would be fixed.
> >>>
> >>>Kevin
> >>Bitmap should not be reloaded on any intermediate qcow2-open's,
> >>reopens, etc. It should be loaded once, on bdrv_open, to not create
> >>extra collisions (between in-memory bitmap and it's stored version).
> >>That was the idea.
> >>
> >>For bitmaps migration there are separate series, we shouldn't load
> >>bitmap from file on migration, as it's version in the file is
> >>outdated.
> >That's not what your series is doing, though. It loads the bitmaps when
> 
> Actually, they will not be loaded as they will have IN_USE flag.
> 
> >migration starts and doesn't reload then when migration completes, even
> >though they are stale. Migration with shared storage would just work
> >without an extra series if you did these things in the correct places.
> >
> >As a reminder, this is how migration with shared storage works (or
> >should work with your series):
> >
> >1. Start destination qemu instance. This calls bdrv_open() with
> >BDRV_O_INACTIVE. We can read in some metadata, though we don't need
> >much more than the image size at this point. Writing to the image is
> >still impossible.
> >
> >2. Start migration on the source, while the VM is still writing to the
> >image, rendering the cached metadata from step 1 stale.
> >
> >3. Migration completes:
> >
> > a. Stop the VM
> >
> > b. Inactivate all images in the source qemu. This is where all
> >metadata needs to be written back to the image file, including
> >bitmaps. No writes to the image are possible after this point
> >because BDRV_O_INACTIVE is set.
> >
> > c. Invalidate the caches in the destination qemu, i.e. reload
> >everything from the file that could have changed since step 1,
> >including bitmaps. BDRV_O_INACTIVE is cleared, making the image
> >ready for writes.
> >
> > d. Resume the VM on the destination
> >
> >4. Exit the source qemu process, which involves bdrv_close(). Note that
> >at this point, no writing to the image file is possible any more,
> >it's the destination qemu process that own the image file now.
> >
> >Your series loads and stores bitmaps in steps 1 and 4. This means that
> 
> Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and
> it is checked), nothing is stored.
> 
> >they are stale on the destination when migration completes, and that
> >bdrv_close() wants to write to an image file that it doesn't own any
> >more, which will cause an assertion failure. If you instead move things
> >to steps 3b and 3c, it will just work.
> 
> Hmm, I understand the idea.. But this will interfere with postcopy
> bitmap migration. So if we really need this, there should be some
> additional control flags or capabilities.. The problem of your
> approach is that bitmap actually migrated in the short state when
> source and destination are stopped, it may take time, as bitmaps may
> be large.

You can always add optimisations, but this is the basic lifecycle
process of block devices in qemu, so it would be good to adhere to it.
So far there are no other pieces of information that are ignored in
bdrv_invalidate()/bdrv_inactivate() and instead only handled in
bdrv_open()/bdrv_close(). It's a matter of consistency, too.

And not having to add special cases for specific features in the generic
bdrv_open()/close() paths is a big plus for me anyway.

Kevin



Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps

2017-02-17 Thread Vladimir Sementsov-Ogievskiy

17.02.2017 15:09, Kevin Wolf wrote:

Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:

16.02.2017 14:49, Kevin Wolf wrote:

Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:

Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:

Auto loading bitmaps are bitmaps stored in the disk image, which should
be loaded when the image is opened and become BdrvDirtyBitmaps for the
corresponding drive.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 

Why do we need a new BlockDriver callback and special code for it in
bdrv_open_common()? The callback is only ever called immediately after
.bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
in their .bdrv_open implementation? Even more so because qcow2 is the
only driver that supports this callback.

Actually, don't we have to call this in qcow2_invalidate_cache()?
Currently, I think, after a migration, the autoload bitmaps aren't
loaded.

By moving the qcow2_load_autoloading_dirty_bitmaps() call to
qcow2_open(), this would be fixed.

Kevin

Bitmap should not be reloaded on any intermediate qcow2-open's,
reopens, etc. It should be loaded once, on bdrv_open, to not create
extra collisions (between in-memory bitmap and it's stored version).
That was the idea.

For bitmaps migration there are separate series, we shouldn't load
bitmap from file on migration, as it's version in the file is
outdated.

That's not what your series is doing, though. It loads the bitmaps when


Actually, they will not be loaded, as they will have the IN_USE flag.


migration starts and doesn't reload then when migration completes, even
though they are stale. Migration with shared storage would just work
without an extra series if you did these things in the correct places.

As a reminder, this is how migration with shared storage works (or
should work with your series):

1. Start destination qemu instance. This calls bdrv_open() with
BDRV_O_INACTIVE. We can read in some metadata, though we don't need
much more than the image size at this point. Writing to the image is
still impossible.

2. Start migration on the source, while the VM is still writing to the
image, rendering the cached metadata from step 1 stale.

3. Migration completes:

 a. Stop the VM

 b. Inactivate all images in the source qemu. This is where all
metadata needs to be written back to the image file, including
bitmaps. No writes to the image are possible after this point
because BDRV_O_INACTIVE is set.

 c. Invalidate the caches in the destination qemu, i.e. reload
everything from the file that could have changed since step 1,
including bitmaps. BDRV_O_INACTIVE is cleared, making the image
ready for writes.

 d. Resume the VM on the destination

4. Exit the source qemu process, which involves bdrv_close(). Note that
at this point, no writing to the image file is possible any more,
it's the destination qemu process that own the image file now.

Your series loads and stores bitmaps in steps 1 and 4. This means that


Actually, no: in step 1 the bitmaps are "in use"; in step 4 INACTIVE is set 
(and it is checked), so nothing is stored.



they are stale on the destination when migration completes, and that
bdrv_close() wants to write to an image file that it doesn't own any
more, which will cause an assertion failure. If you instead move things
to steps 3b and 3c, it will just work.


Hmm, I understand the idea. But this will interfere with postcopy 
bitmap migration. So if we really need this, there should be some 
additional control flags or capabilities. The problem with your approach 
is that the bitmap is actually migrated during the short window when source 
and destination are stopped, and that may take time, as bitmaps may be large.
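
As a back-of-envelope sense of scale, with purely illustrative numbers (one
bitmap bit per granularity-sized chunk of the virtual disk):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint64_t disk_size   = 1ULL << 40;  /* 1 TiB virtual disk            */
    uint64_t granularity = 64 * 1024;   /* 64 KiB tracked per bitmap bit */

    uint64_t bits  = disk_size / granularity;
    uint64_t bytes = bits / 8;          /* 2 MiB for these numbers       */

    printf("%" PRIu64 " bytes per dirty bitmap\n", bytes);
    return 0;
}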




Kevin



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise

2017-02-17 Thread Alberto Garcia
On Fri 17 Feb 2017 12:30:28 PM CET, Pankaj Gupta wrote:
>> >  To maintain consistency at all the places use qemu_madvise wrapper
>> >  inplace of madvise call.
>> 
>> > -madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
>> > +qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
>> 
>> Those two calls are not equivalent, please see commit
>> 2f2c8d6b371cfc6689affb0b7e for an explanation.

> I don't understand why '2f2c8d6b371cfc6689affb0b7e' explicitly changed
> for :"#ifdef CONFIG_LINUX" I think its better to write generic
> function maybe in a wrapper then to conditionally set something at
> different places.

The problem with qemu_madvise(QEMU_MADV_DONTNEED) is that it can mean
different things depending on the platform:

   posix_madvise(POSIX_MADV_DONTNEED)
   madvise(MADV_DONTNEED)

The first call is standard but it doesn't do what we need, so we cannot
use it.

The second call -- madvise(MADV_DONTNEED) -- is not standard, and it
doesn't do the same in all platforms. The only platform in which it does
what we need is Linux, hence the #ifdef CONFIG_LINUX and #if
defined(__linux__) that you see in the code.

I agree with David's comment that maybe it's better to remove
QEMU_MADV_DONTNEED altogether since it's not reliable.
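
One way to stay explicit about this, sketched with a made-up helper rather
than an existing QEMU function (CONFIG_LINUX is the tree's usual Linux
guard): keep the call Linux-only instead of letting a wrapper silently fall
back to posix_madvise().

#include <errno.h>
#include <stddef.h>

#ifdef CONFIG_LINUX
#include <sys/mman.h>

static int example_discard_range(void *addr, size_t len)
{
    /* On Linux, MADV_DONTNEED really drops the pages; later reads return
     * zeroes or re-fault from the backing file, which is what the callers
     * in question rely on. */
    return madvise(addr, len, MADV_DONTNEED);
}
#else
static int example_discard_range(void *addr, size_t len)
{
    /* Refuse rather than silently doing something weaker. */
    errno = ENOSYS;
    return -1;
}
#endif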

Berto



Re: [Qemu-block] [Qemu-devel] [PATCH] block, migration: Use qemu_madvise inplace of madvise

2017-02-17 Thread Pankaj Gupta

> 
> * Pankaj Gupta (pagu...@redhat.com) wrote:
> > 
> > Thanks for your comments. I have below query.
> > > 
> > > On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote:
> > > >  To maintain consistency at all the places use qemu_madvise wrapper
> > > >  inplace of madvise call.
> > > 
> > > >  if (length > 0) {
> > > > -madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
> > > > +qemu_madvise((uint8_t *) t + offset, length,
> > > > QEMU_MADV_DONTNEED);
> > > 
> > > This was changed two months ago from qemu_madvise() to madvise(), is
> > > there any reason why you want to revert that change? Those two calls are
> > > not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an
> > > explanation.
> > > 
> > > > -if (madvise(start, length, MADV_DONTNEED)) {
> > > > +if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) {
> > > >  error_report("%s MADV_DONTNEED: %s", __func__,
> > > >  strerror(errno));
> > 
> > I checked history of only change related to 'postcopy'.
> > 
> > For my linux machine:
> > 
> > ./config-host.mak
> > 
> > CONFIG_MADVISE=y
> > CONFIG_POSIX_MADVISE=y
> > 
> > As both these options are set for Linux, every time we call call
> > 'qemu_madvise' ==>"madvise(addr, len, advice);" will
> > be compiled/called. I don't understand why '2f2c8d6b371cfc6689affb0b7e'
> > explicitly changed for :"#ifdef CONFIG_LINUX"
> > I think its better to write generic function maybe in a wrapper then to
> > conditionally set something at different places.
> 
> No; the problem is that the behaviours are different.
> You're right that the current build on Linux defines MADVISE and thus we are
> safe because qemu_madvise
> takes teh CONFIG_MADVISE/madvise route - but we need to be explicit that it's
> only
> the madvise() route that's safe, not any of the calls implemented by
> qemu_madvise, because if in the future someone was to rearrange qemu_madvise
> to prefer posix_madvise postcopy would break in a very subtle way.

Agreed. Should we add a comment explaining this?

> 
> IMHO it might even be better to remove the definition of QEMU_MADV_DONTNEED
> altogether
> and make a name that wasn't ambiguous between the two, since the posix
> definition is
> so different.

I think 'posix_madvise' was added for systems which did not have 'madvise'.
If I look at the makefile, we first check which calls are available and then 
set the config options accordingly. We give 'madvise' precedence over 
'posix_madvise' if both are present.

On systems which don't have madvise, 'posix_madvise' is called, which as per 
this discussion is not the right thing for the 'DONTNEED' option. It will not 
give the desired results.

Either we have to find the right alternative, or else it is already broken 
for systems which don't support madvise.

> 
> Dave
> 
> > int qemu_madvise(void *addr, size_t len, int advice)
> > {
> > if (advice == QEMU_MADV_INVALID) {
> > errno = EINVAL;
> > return -1;
> > }
> > #if defined(CONFIG_MADVISE)
> > return madvise(addr, len, advice);
> > #elif defined(CONFIG_POSIX_MADVISE)
> > return posix_madvise(addr, len, advice);
> > #else
> > errno = EINVAL;
> > return -1;
> > #endif
> > }
> > 
> > > 
> > > And this is the same case.
> > > 
> > > Berto
> > > 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 



Re: [Qemu-block] [PATCH v15 13/25] qcow2: add .bdrv_store_persistent_dirty_bitmaps()

2017-02-17 Thread Vladimir Sementsov-Ogievskiy

16.02.2017 17:08, Kevin Wolf wrote:

Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:

Realize block bitmap storing interface, to allow qcow2 images store
persistent bitmaps.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 

Similary to autoload, I think this must be called as part of
qcow2_inactivate() rather than just in bdrv_close().

Kevin


I prefer to store bitmaps once, on the final close of the bds, and to remove 
the corresponding BdrvDirtyBitmap at the same point. bdrv_close is the simpler 
point; I don't want to think about loading/saving bitmaps on each 
invalidate/inactivate, and I don't want to create dependencies between qcow2 
bitmap loading/storing and migration, etc.


So, my approach was to just load the bitmap on bdrv_open and store it on 
bdrv_close, and between these two calls the BdrvDirtyBitmap lives its normal 
life. To me it looks simpler. I'm not sure what new corner cases will come 
up if we change this.



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps

2017-02-17 Thread Kevin Wolf
Am 17.02.2017 um 13:07 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 16.02.2017 15:47, Kevin Wolf wrote:
> >Sorry, this was sent too early. Next attempt...
> >
> >Am 16.02.2017 um 12:45 hat Kevin Wolf geschrieben:
> >>Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>>Auto loading bitmaps are bitmaps in Qcow2, with the AUTO flag set. They
> >>>are loaded when the image is opened and become BdrvDirtyBitmaps for the
> >>>corresponding drive.
> >>>
> >>>Extra data in bitmaps is not supported for now.
> 
> [...]
> 
> >>>hdx.o vhdx-endian.o vhdx-log.o
> >>>diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> >>>new file mode 100644
> >>>index 000..e08e46e
> >>>--- /dev/null
> >>>+++ b/block/qcow2-bitmap.c
> 
> [...]
> 
> >>>+
> >>>+static int update_header_sync(BlockDriverState *bs)
> >>>+{
> >>>+int ret;
> >>>+
> >>>+ret = qcow2_update_header(bs);
> >>>+if (ret < 0) {
> >>>+return ret;
> >>>+}
> >>>+
> >>>+/* We don't return bdrv_flush error code. Even if it fails, write was
> >>>+ * successful and it is more logical to consider that header is in 
> >>>the new
> >>>+ * state than in the old.
> >>>+ */
> >>>+ret = bdrv_flush(bs);
> >>>+if (ret < 0) {
> >>>+fprintf(stderr, "Failed to flush qcow2 header");
> >>>+}
> >I don't think I can agree with this one. If you don't care whether the
> >new header is actually on disk, don't call bdrv_flush(). But if you do
> >care, then bdrv_flush() failure means that most likely the new header
> >has not made it to the disk, but is just sitting in some volatile cache.
> 
> And what should be done on bdrv_flush fail? Current solution was
> proposed by Max.

Pass an error up and let the calling operation fail, because we can't
promise that it actually executed correctly.
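
In other words, the requested variant might look roughly like this, keeping
the quoted body but letting the flush result propagate:

static int update_header_sync(BlockDriverState *bs)
{
    int ret;

    ret = qcow2_update_header(bs);
    if (ret < 0) {
        return ret;
    }

    /* If the flush fails we cannot promise the new header is on disk,
     * so let the calling operation fail. */
    return bdrv_flush(bs);
}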

> >
> >>>+return 0;
> >>>+}
> >>>+
> 
> [...]
> 
> >>>+
> >>>+/* This function returns the number of disk sectors covered by a single 
> >>>cluster
> >>>+ * of bitmap data. */
> >>>+static uint64_t disk_sectors_in_bitmap_cluster(const BDRVQcow2State *s,
> >>>+   const BdrvDirtyBitmap 
> >>>*bitmap)
> >>>+{
> >>>+uint32_t sector_granularity =
> >>>+bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;
> >>>+
> >>>+return (uint64_t)sector_granularity * (s->cluster_size << 3);
> >>>+}
> >This has nothing to do with disk sectors, neither of the guest disk nor
> >of the host disk. It's just using a funny 512 bytes unit. Is there a
> >good reason for introducing this funny unit in new code?
> >
> >I'm also not sure what this function calculates, but it's not what the
> >comment says. The unit of the result is something like sectors * bytes,
> >and even when normalising it to a single base unit, I've never seen a
> >use for square bytes so far.
> 
> sector granularity is number of disk sectors, corresponding to one
> bit of the dirty bitmap, (disk-sectors/bitmap-bit)

Please don't use the word "disk sectors", it's misleading because it's
not a sector size of any specific disk. It's best to say just "sectors",
which is a fixed qemu block layer unit of 512 bytes.

> cluster_size << 3 is a number of bits in one cluster, (bitmap-bit)
> 
> so, we have
> sector_granularity (disk-sectors/bitmap-bit) * (cluster_size << 3) (bitmap-bits)
> = some disk sectors, corresponding to one cluster of bitmap data.

Aha! I completely misunderstood what a bitmap cluster is supposed to
be. I expected it to be a chunk of bitmap data whose size corresponds to
the bitmap granularity, whereas it's really about qcow2 clusters.

I wonder if we can rephrase the comment to make this more obvious. Maybe
"...covered by a single qcow2 cluster containing bitmap data"? And the
function could be called sectors_covered_by_bitmap_cluster() or
something.
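
Concretely, the rename plus the reworded comment could end up as follows;
the computation is unchanged from the quoted code:

/* This function returns the number of sectors covered by a single qcow2
 * cluster of bitmap data. */
static uint64_t sectors_covered_by_bitmap_cluster(const BDRVQcow2State *s,
                                                  const BdrvDirtyBitmap *bitmap)
{
    uint32_t sector_granularity =
        bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;

    /* cluster_size << 3 is the number of bitmap bits stored in one cluster. */
    return (uint64_t)sector_granularity * (s->cluster_size << 3);
}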

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH 15/17] iotests: add default node-name

2017-02-17 Thread Fam Zheng
On Fri, 02/17 13:20, Vladimir Sementsov-Ogievskiy wrote:
> 16.02.2017 16:48, Fam Zheng wrote:
> > On Mon, 02/13 12:54, Vladimir Sementsov-Ogievskiy wrote:
> > > When testing migration, auto-generated by qemu node-names differs in
> > > source and destination qemu and migration fails. After this patch,
> > > auto-generated by iotest nodenames will be the same.
> > What should be done in libvirt to make sure the node-names are matching
> > correctly at both sides?
> 
> Hmm, just set node names appropriately?

But I think the problem is that node names are not configurable from libvirt
today, and then the migration will fail. Should the device name take precedence
in the code, to make it easier?

Fam



Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps

2017-02-17 Thread Kevin Wolf
Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 16.02.2017 14:49, Kevin Wolf wrote:
> >Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:
> >>Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>>Auto loading bitmaps are bitmaps stored in the disk image, which should
> >>>be loaded when the image is opened and become BdrvDirtyBitmaps for the
> >>>corresponding drive.
> >>>
> >>>Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >>>Reviewed-by: John Snow 
> >>>Reviewed-by: Max Reitz 
> >>Why do we need a new BlockDriver callback and special code for it in
> >>bdrv_open_common()? The callback is only ever called immediately after
> >>.bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
> >>in their .bdrv_open implementation? Even more so because qcow2 is the
> >>only driver that supports this callback.
> >Actually, don't we have to call this in qcow2_invalidate_cache()?
> >Currently, I think, after a migration, the autoload bitmaps aren't
> >loaded.
> >
> >By moving the qcow2_load_autoloading_dirty_bitmaps() call to
> >qcow2_open(), this would be fixed.
> >
> >Kevin
> 
> Bitmap should not be reloaded on any intermediate qcow2-open's,
> reopens, etc. It should be loaded once, on bdrv_open, to not create
> extra collisions (between in-memory bitmap and it's stored version).
> That was the idea.
> 
> For bitmaps migration there are separate series, we shouldn't load
> bitmap from file on migration, as it's version in the file is
> outdated.

That's not what your series is doing, though. It loads the bitmaps when
migration starts and doesn't reload them when migration completes, even
though they are stale. Migration with shared storage would just work
without an extra series if you did these things in the correct places.

As a reminder, this is how migration with shared storage works (or
should work with your series):

1. Start destination qemu instance. This calls bdrv_open() with
   BDRV_O_INACTIVE. We can read in some metadata, though we don't need
   much more than the image size at this point. Writing to the image is
   still impossible.

2. Start migration on the source, while the VM is still writing to the
   image, rendering the cached metadata from step 1 stale.

3. Migration completes:

a. Stop the VM

b. Inactivate all images in the source qemu. This is where all
   metadata needs to be written back to the image file, including
   bitmaps. No writes to the image are possible after this point
   because BDRV_O_INACTIVE is set.

c. Invalidate the caches in the destination qemu, i.e. reload
   everything from the file that could have changed since step 1,
   including bitmaps. BDRV_O_INACTIVE is cleared, making the image
   ready for writes.

d. Resume the VM on the destination

4. Exit the source qemu process, which involves bdrv_close(). Note that
   at this point, no writing to the image file is possible any more,
   it's the destination qemu process that owns the image file now.

Your series loads and stores bitmaps in steps 1 and 4. This means that
they are stale on the destination when migration completes, and that
bdrv_close() wants to write to an image file that it doesn't own any
more, which will cause an assertion failure. If you instead move things
to steps 3b and 3c, it will just work.
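
For concreteness, a hedged sketch of the step-3c side; the load helper is the
one introduced by the series (its exact signature is assumed) and the rest of
the invalidate path is elided:

static void example_qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
{
    Error *local_err = NULL;

    /* ... re-read the qcow2 header and tables from the image file ... */

    /* The source may have modified the image while we only held it with
     * BDRV_O_INACTIVE, so the bitmaps have to be re-read here as well. */
    qcow2_load_autoloading_dirty_bitmaps(bs, &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        return;
    }
}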

Kevin



Re: [Qemu-block] [PATCH v15 09/25] qcow2: add .bdrv_load_autoloading_dirty_bitmaps

2017-02-17 Thread Vladimir Sementsov-Ogievskiy

16.02.2017 15:47, Kevin Wolf wrote:

Sorry, this was sent too early. Next attempt...

Am 16.02.2017 um 12:45 hat Kevin Wolf geschrieben:

Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:

Auto loading bitmaps are bitmaps in Qcow2, with the AUTO flag set. They
are loaded when the image is opened and become BdrvDirtyBitmaps for the
corresponding drive.

Extra data in bitmaps is not supported for now.


[...]


hdx.o vhdx-endian.o vhdx-log.o
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
new file mode 100644
index 000..e08e46e
--- /dev/null
+++ b/block/qcow2-bitmap.c


[...]


+
+static int update_header_sync(BlockDriverState *bs)
+{
+int ret;
+
+ret = qcow2_update_header(bs);
+if (ret < 0) {
+return ret;
+}
+
+/* We don't return bdrv_flush error code. Even if it fails, write was
+ * successful and it is more logical to consider that header is in the new
+ * state than in the old.
+ */
+ret = bdrv_flush(bs);
+if (ret < 0) {
+fprintf(stderr, "Failed to flush qcow2 header");
+}

I don't think I can agree with this one. If you don't care whether the
new header is actually on disk, don't call bdrv_flush(). But if you do
care, then bdrv_flush() failure means that most likely the new header
has not made it to the disk, but is just sitting in some volatile cache.


And what should be done on bdrv_flush failure? The current solution was 
proposed by Max.





+return 0;
+}
+


[...]


+
+/* This function returns the number of disk sectors covered by a single cluster
+ * of bitmap data. */
+static uint64_t disk_sectors_in_bitmap_cluster(const BDRVQcow2State *s,
+                                               const BdrvDirtyBitmap *bitmap)
+{
+    uint32_t sector_granularity =
+        bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;
+
+    return (uint64_t)sector_granularity * (s->cluster_size << 3);
+}

This has nothing to do with disk sectors, neither of the guest disk nor
of the host disk. It's just using a funny 512 bytes unit. Is there a
good reason for introducing this funny unit in new code?

I'm also not sure what this function calculates, but it's not what the
comment says. The unit of the result is something like sectors * bytes,
and even when normalising it to a single base unit, I've never seen a
use for square bytes so far.


sector_granularity is the number of disk sectors corresponding to one bit
of the dirty bitmap (disk-sectors/bitmap-bit).

cluster_size << 3 is the number of bits in one cluster (bitmap-bits).

So we have:
sector_granularity (disk-sectors/bitmap-bit) * cluster_size << 3
(bitmap-bits) = the number of disk sectors corresponding to one cluster of
bitmap data.
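
To make the units concrete, a worked example with hypothetical numbers
(64 KiB cluster size and 64 KiB bitmap granularity, chosen only for
illustration):

    sector_granularity = 65536 / 512       = 128 disk sectors per bitmap bit
    bits per cluster   = 65536 * 8         = 524288 bitmap bits
    result             = 128 * 524288      = 67108864 disk sectors
                       = 67108864 * 512 B  = 32 GiB of disk data

i.e. one cluster of bitmap data at that granularity describes 32 GiB of the
disk.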




--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps

2017-02-17 Thread Vladimir Sementsov-Ogievskiy

16.02.2017 14:49, Kevin Wolf wrote:

Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben:

Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:

Auto loading bitmaps are bitmaps stored in the disk image, which should
be loaded when the image is opened and become BdrvDirtyBitmaps for the
corresponding drive.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 

Why do we need a new BlockDriver callback and special code for it in
bdrv_open_common()? The callback is only ever called immediately after
.bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
in their .bdrv_open implementation? Even more so because qcow2 is the
only driver that supports this callback.

Actually, don't we have to call this in qcow2_invalidate_cache()?
Currently, I think, after a migration, the autoload bitmaps aren't
loaded.

By moving the qcow2_load_autoloading_dirty_bitmaps() call to
qcow2_open(), this would be fixed.

Kevin


Bitmaps should not be reloaded on any intermediate qcow2 opens, reopens,
etc. They should be loaded once, on bdrv_open, to avoid creating extra
collisions (between the in-memory bitmap and its stored version). That was
the idea.


For bitmaps migration there are separate series; we shouldn't load the
bitmap from file on migration, as its version in the file is outdated.



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise

2017-02-17 Thread Dr. David Alan Gilbert
* Pankaj Gupta (pagu...@redhat.com) wrote:
> 
> Thanks for your comments. I have below query.
> > 
> > On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote:
> > >  To maintain consistency at all the places use qemu_madvise wrapper
> > >  inplace of madvise call.
> > 
> > >  if (length > 0) {
> > > -madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
> > > +qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
> > 
> > This was changed two months ago from qemu_madvise() to madvise(), is
> > there any reason why you want to revert that change? Those two calls are
> > not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an
> > explanation.
> > 
> > > -if (madvise(start, length, MADV_DONTNEED)) {
> > > +if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) {
> > >  error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));
> 
> I checked the history; the only related change is the 'postcopy' one.
> 
> For my linux machine:
> 
> ./config-host.mak
> 
> CONFIG_MADVISE=y
> CONFIG_POSIX_MADVISE=y
> 
> As both these options are set for Linux, every time we call
> 'qemu_madvise', "madvise(addr, len, advice);" will
> be compiled/called. I don't understand why '2f2c8d6b371cfc6689affb0b7e'
> explicitly changed this to use "#ifdef CONFIG_LINUX".
> I think it's better to write a generic function, maybe in a wrapper, than
> to conditionally set something at different places.

No; the problem is that the behaviours are different.
You're right that the current build on Linux defines MADVISE and thus we are
safe because qemu_madvise takes the CONFIG_MADVISE/madvise route - but we
need to be explicit that it's only the madvise() route that's safe, not any
of the other calls implemented by qemu_madvise, because if in the future
someone were to rearrange qemu_madvise to prefer posix_madvise, postcopy
would break in a very subtle way.

IMHO it might even be better to remove the definition of QEMU_MADV_DONTNEED
altogether and introduce a name that isn't ambiguous between the two, since
the POSIX definition is so different.
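
A sketch of what such an unambiguous helper could look like (the name and
the idea of a separate function are made up for illustration; this is not an
existing QEMU API):

    /* Guaranteed page discard.  Only provided where madvise(MADV_DONTNEED)
     * has the Linux semantics; callers that rely on the guarantee (like
     * postcopy) get a hard error elsewhere instead of silently falling back
     * to the weaker posix_madvise() hint. */
    int qemu_madvise_discard(void *addr, size_t len)
    {
    #if defined(CONFIG_MADVISE)
        return madvise(addr, len, MADV_DONTNEED);
    #else
        errno = ENOSYS;
        return -1;
    #endif
    }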

Dave

> int qemu_madvise(void *addr, size_t len, int advice)
> {
>     if (advice == QEMU_MADV_INVALID) {
>         errno = EINVAL;
>         return -1;
>     }
> #if defined(CONFIG_MADVISE)
>     return madvise(addr, len, advice);
> #elif defined(CONFIG_POSIX_MADVISE)
>     return posix_madvise(addr, len, advice);
> #else
>     errno = EINVAL;
>     return -1;
> #endif
> }
> 
> > 
> > And this is the same case.
> > 
> > Berto
> > 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise

2017-02-17 Thread Pankaj Gupta

Thanks for your comments. I have below query.
> 
> On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote:
> >  To maintain consistency at all the places use qemu_madvise wrapper
> >  inplace of madvise call.
> 
> >  if (length > 0) {
> > -madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
> > +qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
> 
> This was changed two months ago from qemu_madvise() to madvise(), is
> there any reason why you want to revert that change? Those two calls are
> not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an
> explanation.
> 
> > -if (madvise(start, length, MADV_DONTNEED)) {
> > +if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) {
> >  error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));

I checked the history; the only related change is the 'postcopy' one.

For my linux machine:

./config-host.mak

CONFIG_MADVISE=y
CONFIG_POSIX_MADVISE=y

As both these options are set for Linux, every time we call 'qemu_madvise',
"madvise(addr, len, advice);" will be compiled/called. I don't understand
why '2f2c8d6b371cfc6689affb0b7e' explicitly changed this to use
"#ifdef CONFIG_LINUX".
I think it's better to write a generic function, maybe in a wrapper, than to
conditionally set something at different places.

int qemu_madvise(void *addr, size_t len, int advice)
{
    if (advice == QEMU_MADV_INVALID) {
        errno = EINVAL;
        return -1;
    }
#if defined(CONFIG_MADVISE)
    return madvise(addr, len, advice);
#elif defined(CONFIG_POSIX_MADVISE)
    return posix_madvise(addr, len, advice);
#else
    errno = EINVAL;
    return -1;
#endif
}

> 
> And this is the same case.
> 
> Berto
> 



Re: [Qemu-block] [Qemu-devel] [PATCH 15/17] iotests: add default node-name

2017-02-17 Thread Vladimir Sementsov-Ogievskiy

16.02.2017 16:48, Fam Zheng wrote:

On Mon, 02/13 12:54, Vladimir Sementsov-Ogievskiy wrote:

When testing migration, the node names auto-generated by qemu differ
between the source and destination qemu and migration fails. After this
patch, the node names auto-generated by the iotests will be the same.

What should be done in libvirt to make sure the node-names match
correctly on both sides?


Hmm, just set node names appropriately?
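
For example (an illustrative command line, not taken from the thread), the
same explicit node-name can be passed on both the source and the
destination:

    -drive file=disk.qcow2,format=qcow2,if=virtio,node-name=node0

As long as both qemu instances use matching node-name values for the
corresponding drives, the auto-generated names never come into play.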




Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
  tests/qemu-iotests/iotests.py | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index f5ca4b8..e110c90 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -168,6 +168,8 @@ class VM(qtest.QEMUQtestMachine):
  options.append('file=%s' % path)
  options.append('format=%s' % format)
  options.append('cache=%s' % cachemode)
+if 'node-name' not in opts:
+options.append('node-name=drivenode%d' % self._num_drives)
  
  if opts:

  options.append(opts)
--
1.8.3.1





--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v15 25/25] qcow2-bitmap: improve check_constraints_on_bitmap

2017-02-17 Thread Vladimir Sementsov-Ogievskiy

16.02.2017 17:21, Kevin Wolf wrote:

Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben:

Add detailed error messages.

Signed-off-by: Vladimir Sementsov-Ogievskiy 

Why not merge this patch into the one that originally introduced the
function?


Just to not create extra work for reviewers



Kevin



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise

2017-02-17 Thread Alberto Garcia
On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote:
>  To maintain consistency at all the places use qemu_madvise wrapper
>  inplace of madvise call.

>  if (length > 0) {
> -madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
> +qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);

This was changed two months ago from qemu_madvise() to madvise(), is
there any reason why you want to revert that change? Those two calls are
not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an
explanation.

> -if (madvise(start, length, MADV_DONTNEED)) {
> +if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) {
>  error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));

And this is the same case.

Berto



Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise

2017-02-17 Thread Dr. David Alan Gilbert
* Kevin Wolf (kw...@redhat.com) wrote:
> Am 17.02.2017 um 09:06 hat Pankaj Gupta geschrieben:
> >  To maintain consistency at all the places use qemu_madvise wrapper
> >  inplace of madvise call.
> > 
> > Signed-off-by: Pankaj Gupta 
> 
> Reviewed-by: Kevin Wolf 
> 
> Juan/Dave, if one of you can give an Acked-by, I can take this through
> my tree.

NACK

That's wrong; qemu_madvise can end up going through posix_madvise and
using POSIX_MADV_DONTNEED, which has different semantics to
madvise(MADV_DONTNEED), and we need the semantics of madvise - i.e. it's
guaranteed to throw away the pages, whereas posix_madvise *may* throw away
the pages if the kernel feels like it.
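
A tiny standalone illustration of that difference (not QEMU code; a sketch
assuming a Linux host and an anonymous private mapping):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            return 1;
        }

        memset(p, 0xaa, len);
        /* Linux madvise(MADV_DONTNEED): the next read of an anonymous
         * private page is guaranteed to see zero-filled memory. */
        madvise(p, len, MADV_DONTNEED);
        printf("after madvise: %#x\n", (unsigned char)p[0]);        /* 0 */

        memset(p, 0xaa, len);
        /* posix_madvise(POSIX_MADV_DONTNEED) is only advisory; the old
         * contents may well still be there afterwards. */
        posix_madvise(p, len, POSIX_MADV_DONTNEED);
        printf("after posix_madvise: %#x\n", (unsigned char)p[0]);  /* often 0xaa */
        return 0;
    }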

Dave

> Kevin
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise

2017-02-17 Thread Kevin Wolf
Am 17.02.2017 um 09:06 hat Pankaj Gupta geschrieben:
>  To maintain consistency at all the places use qemu_madvise wrapper
>  inplace of madvise call.
> 
> Signed-off-by: Pankaj Gupta 

Reviewed-by: Kevin Wolf 

Juan/Dave, if one of you can give an Acked-by, I can take this through
my tree.

Kevin



[Qemu-block] [PATCH] block, migration: Use qemu_madvise inplace of madvise

2017-02-17 Thread Pankaj Gupta
 To maintain consistency at all the places use qemu_madvise wrapper
 inplace of madvise call.

Signed-off-by: Pankaj Gupta 
---
 block/qcow2-cache.c  | 2 +-
 migration/postcopy-ram.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147..4991ca5 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -74,7 +74,7 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
     size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
     size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
     if (length > 0) {
-        madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
+        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
     }
 #endif
 }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index a40dddb..558fec1 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -213,7 +213,7 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
                                size_t length)
 {
     trace_postcopy_ram_discard_range(start, length);
-    if (madvise(start, length, MADV_DONTNEED)) {
+    if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) {
         error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));
         return -1;
     }
-- 
2.7.4