Re: [Qemu-block] [PATCH v4 03/15] block: introduce BDRV_REQ_ALLOCATE flag

2017-08-04 Thread Eric Blake
On 08/01/2017 09:19 AM, Anton Nefedov wrote:
> The flag is supposed to indicate that the region of the disk image has
> to be sufficiently allocated so it reads as zeroes. The call with the flag
> set has to return -ENOTSUP if allocation cannot be done efficiently
> (i.e. without falling back to writing actual buffers)
> 
> Signed-off-by: Anton Nefedov 
> ---
>  include/block/block.h |  6 +-
>  include/block/block_int.h |  2 +-
>  block/io.c| 20 +---
>  3 files changed, 23 insertions(+), 5 deletions(-)

Reviewed-by: Eric Blake 

You might want the commit message to be a bit more verbose...

> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 7fe0125..828da67 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -65,9 +65,13 @@ typedef enum {
>  BDRV_REQ_NO_SERIALISING = 0x8,
>  BDRV_REQ_FUA= 0x10,
>  BDRV_REQ_WRITE_COMPRESSED   = 0x20,
> +/* The BDRV_REQ_ALLOCATE flag is used to indicate that the driver has to
> + * efficiently allocate the space so it reads as zeroes, or return an 
> error.
> + */
> +BDRV_REQ_ALLOCATE   = 0x40,
>  
>  /* Mask of valid flags */
> -BDRV_REQ_MASK   = 0x3f,
> +BDRV_REQ_MASK   = 0x7f,
>  } BdrvRequestFlags;
>  
>  typedef struct BlockSizes {
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 9b94b32..9b64411 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -585,7 +585,7 @@ struct BlockDriverState {
>  /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
>  unsigned int supported_write_flags;
>  /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
> - * BDRV_REQ_MAY_UNMAP) */
> + * BDRV_REQ_MAY_UNMAP, BDRV_REQ_ALLOCATE) */
>  unsigned int supported_zero_flags;

...in addition to adding the new flag here and documenting its semantics
for drivers...

>  
>  /* the following member gives a name to every node on the bs graph. */
> diff --git a/block/io.c b/block/io.c
> index 375fc66..04d495e 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1245,7 +1245,7 @@ static int coroutine_fn 
> bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>  assert(!bs->supported_zero_flags);
>  }
>  
> -if (ret == -ENOTSUP) {
> +if (ret == -ENOTSUP && !(flags & BDRV_REQ_ALLOCATE)) {
>  /* Fall back to bounce buffer if write zeroes is unsupported */
>  BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;

...you also made sure that anywhere the flag is in use you avoid a slow
fallback...

> @@ -1639,6 +1645,14 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild 
> *child, int64_t offset,
>  {
>  trace_bdrv_co_pwrite_zeroes(child->bs, offset, bytes, flags);
>  
> +assert(!(flags & BDRV_REQ_MAY_UNMAP && flags & BDRV_REQ_ALLOCATE));
> +
> +if (flags & BDRV_REQ_ALLOCATE &&
> +!(child->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))
> +{
> +return -ENOTSUP;

...as well as providing a sane default to make the flag always trigger
-ENOTSUP until individual drivers implement something in later patches.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] parallels: drop check that bdrv_truncate() is working

2017-08-04 Thread Denis V. Lunev
On 08/04/2017 10:32 PM, Eric Blake wrote:
> On 08/04/2017 10:10 AM, Denis V. Lunev wrote:
>> This would be actually strange and error prone. If truncate() nowadays
>> will fail, there is something fatally wrong. Let's check for that during
>> the actual work.
>>
>> The only fallback case is when the file is not zero initialized. In this
>> case we should switch to preallocation via fallocate().
> I got confused by the commit message.  Here's my attempt an an
> alternative, to see if I'm understanding the point of this patch:
>
> The code was trying to truncate a file to its current length, as an
> optimization to help decide whether an alternative prealloc mode was
> useful.  But it forgot to check whether bdrv_getlength() succeeded, and
> dealing with that failure just complicates what was supposed to be a
> no-op probe for optimizing later operation.  We will still properly fail
> later when an actual truncation attempt is made and the device can't
> support it.  So when deciding how to set prealloc_mode while opening the
> device, all we really need is to check just the one condition that
> matters - knowing whether the device is zero initialized.
>

This is not an optimization. This is check that truncate is working for
underlying BDS. 2 years ago I though that I have seen some BDSes
without truncate support.

So, with this patch I have dropped this check. If the user has specified
to change preallocation mode, he is responsible to be correct. If truncate
is broken, this will be revealed at the first attempt to expand the file.

>> Signed-off-by: Denis V. Lunev 
>> CC: Markus Armbruster 
>> CC: Kevin Wolf 
>> CC: Max Reitz 
>> CC: Stefan Hajnoczi 
>> ---
>>  block/parallels.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
> The code change makes sense at first glance, but I'm a bit reluctant to
> give R-b, since the commit message threw me off and I'm not familiar
> with the parallels code in the first place.
>
>> diff --git a/block/parallels.c b/block/parallels.c
>> index 6794e53c0b..e1e06d23cc 100644
>> --- a/block/parallels.c
>> +++ b/block/parallels.c
>> @@ -703,9 +703,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  goto fail_options;
>>  }
>>  
>> -if (!(flags & BDRV_O_RESIZE) || !bdrv_has_zero_init(bs->file->bs) ||
>> -bdrv_truncate(bs->file, bdrv_getlength(bs->file->bs),
>> -  PREALLOC_MODE_OFF, NULL) != 0) {
>> +if (!bdrv_has_zero_init(bs->file->bs)) {
>>  s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
>>  }
>>  
>>





Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] parallels: drop check that bdrv_truncate() is working

2017-08-04 Thread Eric Blake
On 08/04/2017 10:10 AM, Denis V. Lunev wrote:
> This would be actually strange and error prone. If truncate() nowadays
> will fail, there is something fatally wrong. Let's check for that during
> the actual work.
> 
> The only fallback case is when the file is not zero initialized. In this
> case we should switch to preallocation via fallocate().

I got confused by the commit message.  Here's my attempt an an
alternative, to see if I'm understanding the point of this patch:

The code was trying to truncate a file to its current length, as an
optimization to help decide whether an alternative prealloc mode was
useful.  But it forgot to check whether bdrv_getlength() succeeded, and
dealing with that failure just complicates what was supposed to be a
no-op probe for optimizing later operation.  We will still properly fail
later when an actual truncation attempt is made and the device can't
support it.  So when deciding how to set prealloc_mode while opening the
device, all we really need is to check just the one condition that
matters - knowing whether the device is zero initialized.

> 
> Signed-off-by: Denis V. Lunev 
> CC: Markus Armbruster 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Stefan Hajnoczi 
> ---
>  block/parallels.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

The code change makes sense at first glance, but I'm a bit reluctant to
give R-b, since the commit message threw me off and I'm not familiar
with the parallels code in the first place.

> 
> diff --git a/block/parallels.c b/block/parallels.c
> index 6794e53c0b..e1e06d23cc 100644
> --- a/block/parallels.c
> +++ b/block/parallels.c
> @@ -703,9 +703,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  goto fail_options;
>  }
>  
> -if (!(flags & BDRV_O_RESIZE) || !bdrv_has_zero_init(bs->file->bs) ||
> -bdrv_truncate(bs->file, bdrv_getlength(bs->file->bs),
> -  PREALLOC_MODE_OFF, NULL) != 0) {
> +if (!bdrv_has_zero_init(bs->file->bs)) {
>  s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
>  }
>  
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 1/3] block: respect error code from bdrv_getlength in handle_aiocb_write_zeroes

2017-08-04 Thread Eric Blake
On 08/04/2017 10:10 AM, Denis V. Lunev wrote:
> Original idea beyond the code in question was the following: we have failed
> to write zeroes with fallocate(FALLOC_FL_ZERO_RANGE) as the simplest
> approach and via fallocate(FALLOC_FL_PUNCH_HOLE)/fallocate(0). We have the
> only chance now: if the request comes beyond end of the file. Thus we
> should calculate file length and respect the error code from that op.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Markus Armbruster 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Stefan Hajnoczi 
> ---
>  block/file-posix.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH] block: drop bdrv_set_key from BlockDriver

2017-08-04 Thread Eric Blake
On 08/04/2017 10:26 AM, Paolo Bonzini wrote:
> This is not used anymore since c01c214b69 ("block: remove all encryption
> handling APIs", 2017-07-11).
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  include/block/block_int.h | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index d4f4ea7584..7571c0aaaf 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -127,7 +127,6 @@ struct BlockDriver {
>Error **errp);
>  void (*bdrv_close)(BlockDriverState *bs);
>  int (*bdrv_create)(const char *filename, QemuOpts *opts, Error **errp);
> -int (*bdrv_set_key)(BlockDriverState *bs, const char *key);
>  int (*bdrv_make_empty)(BlockDriverState *bs);
>  
>  void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 05/17] block/nbd-client: get rid of ssize_t

2017-08-04 Thread Daniel P. Berrange
On Fri, Aug 04, 2017 at 06:14:28PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Use int variable for nbd_co_send_request return value (as
> nbd_co_send_request returns int).

Hmm, nbd_co_send_request() propagates return value of nbd_send_request,
which returns ssize_t.  IOW, I think we need to fix nbd_co_send_request
to also return ssize_t, rather than changing callers to use int.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [Qemu-devel] [PATCH for-2.10] vmdk: Fix error handling/reporting of vmdk_check

2017-08-04 Thread Eric Blake
On 08/04/2017 09:09 AM, Fam Zheng wrote:
> Errors from the callees must be captured and propagated to our caller,
> ensure this for both find_extent() and bdrv_getlength().
> 
> Reported-by: Markus Armbruster 
> Signed-off-by: Fam Zheng 
> ---
>  block/vmdk.c | 26 ++
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 

> +if (ret == VMDK_OK) {
> +int64_t extent_len = bdrv_getlength(extent->file->bs);
> +if (extent_len < 0) {
> +fprintf(stderr,
> +"ERROR: could not get extent file length for sector 
> %"
> +PRId64 "\n", sector_num);
> +ret = extent_len;

Pre-existing - our use of fprintf() is not ideal.  But this patch
doesn't make it worse.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Daniel P. Berrange
On Fri, Aug 04, 2017 at 10:41:54AM -0500, Eric Blake wrote:
> On 08/04/2017 09:06 AM, Daniel P. Berrange wrote:
> > On Fri, Aug 04, 2017 at 04:02:10PM +0200, Kevin Wolf wrote:
> >> Am 04.08.2017 um 12:50 hat Daniel P. Berrange geschrieben:
> >>> Signed-off-by: Daniel P. Berrange 
> >>> ---
> 
> >> Really? We are asserting that they match in bdrv_aligned_preadv():
> >>
> >> assert(!qiov || bytes == qiov->size);
> > 
> > Hmm, why do we pass @bytes at all then ? If they're always the same,
> > how about deleting it and just letting everyone read qiov->size
> > directly.
> 
> Read the assertion again: qiov can be NULL (generally, when writing
> zeroes).  So we can't rely on qiov->size in that scenario.

This is odd.  In the bdrv_aligned_readv() it looks very much like
we'll reference qiov->niov, if bytes != 0, so if qiov was NULL we
would crash.

In bdrv_aligned_writev(), qiov->niov is also refernced if bytes != 0,
*unless*  flags contains BDRV_REQ_ZERO_WRITE, in which case we'll
invoke bdrv_co_do_pwrite_zeroes() instead.

So unless I'm missing something, bdrv_co_preadv|writev cannot be
called with a NULL  qiov, and bdrv_aligned_writev|readv might
need their assertions tightened up.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [Qemu-devel] [PATCH for-2.10] quorum: Handle bdrv_getlength() failures in quorum_co_flush()

2017-08-04 Thread Eric Blake
On 08/04/2017 09:08 AM, Alberto Garcia wrote:
> A bdrv_getlength() call can fail and return a negative value. This
> is not being handled in quorum_co_flush(), which can result in a
> QUORUM_REPORT_BAD event with an arbitrary value on the 'sectors-count'
> field.
> 
> Reported-by: Markus Armbruster 
> Signed-off-by: Alberto Garcia 
> ---
>  block/quorum.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 55ba916655..d77991d680 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -785,8 +785,9 @@ static coroutine_fn int quorum_co_flush(BlockDriverState 
> *bs)
>  for (i = 0; i < s->num_children; i++) {
>  result = bdrv_co_flush(s->children[i]->bs);
>  if (result) {
> +int64_t length = bdrv_getlength(s->children[i]->bs);
>  quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
> -  bdrv_getlength(s->children[i]->bs),
> +  length > 0 ? length : 0,

In the fallback case, is always picking 0 good enough?  Then again, this
is in the error path, so it is unlikely in practice, and I don't see any
better way to handle it.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Eric Blake
On 08/04/2017 09:06 AM, Daniel P. Berrange wrote:
> On Fri, Aug 04, 2017 at 04:02:10PM +0200, Kevin Wolf wrote:
>> Am 04.08.2017 um 12:50 hat Daniel P. Berrange geschrieben:
>>> Signed-off-by: Daniel P. Berrange 
>>> ---

>> Really? We are asserting that they match in bdrv_aligned_preadv():
>>
>> assert(!qiov || bytes == qiov->size);
> 
> Hmm, why do we pass @bytes at all then ? If they're always the same,
> how about deleting it and just letting everyone read qiov->size
> directly.

Read the assertion again: qiov can be NULL (generally, when writing
zeroes).  So we can't rely on qiov->size in that scenario.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] Is blk_getlength() in find_image_format() and img_map() kosher?

2017-08-04 Thread Eric Blake
On 08/04/2017 08:55 AM, Markus Armbruster wrote:
> Have a look at find_image_format():
> 
> if (blk_is_sg(file) || !blk_is_inserted(file) || blk_getlength(file) == 
> 0) {
> *pdrv = _raw;
> return ret;
> }
> 
> blk_getlength() can fail.  Shouldn't we error out then?
> 
> We pretty obviously should in img_map():
> 
> length = blk_getlength(blk);
> while (curr.start + curr.length < length) {
> 
> Since I have to touch @length in the series I'm working on, I'll stick
> in a fix.

/me goes and checks my byte-based block status series

Hmm, that problem is still present even after my refactoring of
img_map().  I can rebase my series on top of your fix, as your fix
belongs in 2.10.

Alternatively, WHY do we allow blk_getlength() to fail?  Are there
really situations where we can open a BDS but not know its length?  I
know that we allow for online resizing detection, but presumably, either
we always know the (prior) size of the device, or we're going to have
problems talking to the device for anything beyond just a size request.
What semantics would change if we could guarantee that blk_getlength()
never failed?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] Is the use of bdrv_getlength() in qcow.c kosher?

2017-08-04 Thread Eric Blake
On 08/04/2017 07:45 AM, Markus Armbruster wrote:
> Markus Armbruster  writes:
> 
>> bdrv_getlength() can fail.  The uses in qcow.c don't check.  Is that
>> safe?
> 
> There's another one in qcow2_co_pwritev_compressed().
> 
> Yet another one in dump_refcounts().
> 
> While we're talking: the one in qcow2_measure() assigns to a variable of
> type ssize_t.  Should be int64_t.

Oh indeed.  I wonder if my recently-added qemu-iotests 190 will catch
that on a 32-bit machine?  If not, it should...

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v4 14/15] iotest 190: test BDRV_REQ_ALLOCATE

2017-08-04 Thread Eric Blake
On 08/01/2017 09:19 AM, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov 
> ---
>  tests/qemu-iotests/190 | 146 
> +
>  tests/qemu-iotests/190.out |  50 
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 197 insertions(+)
>  create mode 100755 tests/qemu-iotests/190
>  create mode 100644 tests/qemu-iotests/190.out

190 already exists on mainline, you'll have to choose an available test
id that hasn't been mentioned on-list yet (yeah, our process for
reserving test ids is lousy. Max has already said he doesn't mind fixing
conflicts if it is just a duplicated test number, for qemu-iotests that
he manages)


> +_cleanup()
> +{
> + _cleanup_test_img
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15

This part won't be needed, if you land after Stefan's patch that adds
per-test directories with the ability to preserve temporary files

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v4 02/15] blkverify: set supported write/zero flags

2017-08-04 Thread Eric Blake
On 08/01/2017 09:18 AM, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov 
> ---
>  block/blkverify.c | 9 +
>  1 file changed, 9 insertions(+)

Basically, blkverify supports a flag if BOTH of its underlying files
also support the flag; if either side can't handle the flag, then we
fall back to emulation for both sides.

With more overhead, we COULD state that we support both bits if at least
one of the two underlying BDS supports the bit, and then emulate support
for the bit on the second BDS where it was lacking, so that at least the
first BDS doesn't suffer from the penalties of the fallbacks.  But that
means duplicating the block layer fallback code in blkverify, which is
already something that we don't necessarily expect high performance from.

For FUA, failure to implement the bit merely means that we have more
device-wide flush calls (instead of per-transaction mini-flushes), but
the end data should be the same.  But for MAY_UNMAP, I'm worried that we
may have situations where a plain BDS will create holes, while running
the same device paired through blkverify will fall back to slower
explicit zeroes.  I'm wondering whether this will bite us, if we have
scenarios where the mere fact of trying to verify block device behavior
changes what behavior we are even verifying.

Thus, while I think the code change _looks_ okay, I'm not sure if it is
correct design-wise, nor whether it is 2.10 material.

> +bs->supported_write_flags = BDRV_REQ_FUA &
> +bs->file->bs->supported_write_flags &
> +s->test_file->bs->supported_write_flags;
> +
> +bs->supported_zero_flags =
> +(BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
> +bs->file->bs->supported_zero_flags &
> +s->test_file->bs->supported_zero_flags;
> +
>  ret = 0;
>  fail:
>  qemu_opts_del(opts);
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Daniel P. Berrange
On Fri, Aug 04, 2017 at 09:28:54AM -0500, Eric Blake wrote:
> On 08/04/2017 09:08 AM, Daniel P. Berrange wrote:
> > Signed-off-by: Daniel P. Berrange 
> > ---
> > 
> >  - Clarify that @bytes matches @qiov total size (Kevin)
> > 
> >  include/block/block_int.h | 31 +++
> >  1 file changed, 31 insertions(+)
> 
> [looks like the nongnu.org infrastructure is having heavy load today, so
> mails are getting through more slowly than usual - leads to lots of
> crossed emails, where I'm seeing replies or direct sends sooner than
> list copies]
> 
> > +/**
> > + * @offset: position in bytes to write at
> > + * @bytes: number of bytes to write
> > + * @qiov: the buffers containing data to write
> > + * @flags: zero or more of bits allowed by 'supported_write_flags'
> 
> maybe s/of //
> 
> > + *
> > + * @offset and @bytes will be a multiple of 'request_alignment',
> > + * but the length of individual @qiov elements does not have to
> > + * be a multiple.
> > + *
> > + * @bytes will always equal the total size of @qiov, and will be
> > + * no larger than 'max_transfer'.
> > + *
> > + * The buffer in @qiov may point directly to guest memory.
> > + */
> >  int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
> >  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
> 
> Do we make guarantees that the driver callback is never reached if the
> image is currently read-only?  If so, is that a guarantee worth documenting?

bdrv_co_pwritev() returns EPERM if bs->read_only, so it looks like you
are right we have a guarantee we can document

> Reviewed-by: Eric Blake 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[Qemu-block] [PATCH 15/17] block/nbd-client: refactor reading reply

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Read the whole reply in one place - in nbd_read_reply_entry.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.h |  1 +
 block/nbd-client.c | 27 +--
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index aa36be8950..0f84ccc073 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -30,6 +30,7 @@ typedef struct NBDClientSession {
 struct {
 Coroutine *co;
 NBDRequest *request;
+QEMUIOVector *qiov;
 } requests[MAX_NBD_REQUESTS];
 NBDReply reply;
 } NBDClientSession;
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 0e12db4be3..61780c5df9 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -94,6 +94,18 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
 break;
 }
 
+if (s->reply.error == 0 &&
+s->requests[i].request->type == NBD_CMD_READ)
+{
+assert(s->requests[i].qiov != NULL);
+ret = nbd_rwv(s->ioc, s->requests[i].qiov->iov,
+  s->requests[i].qiov->niov,
+  s->requests[i].request->len, true, NULL);
+if (ret != s->requests[i].request->len) {
+break;
+}
+}
+
 /* We're woken up by the receiving coroutine itself.  Note that there
  * is no race between yielding and reentering read_reply_co.  This
  * is because:
@@ -138,6 +150,7 @@ static int nbd_co_request(BlockDriverState *bs,
 assert(i < MAX_NBD_REQUESTS);
 request->handle = INDEX_TO_HANDLE(s, i);
 s->requests[i].request = request;
+s->requests[i].qiov = qiov;
 
 if (!s->ioc) {
 qemu_co_mutex_unlock(>send_mutex);
@@ -165,12 +178,6 @@ static int nbd_co_request(BlockDriverState *bs,
 goto out;
 }
 
-if (request->type == NBD_CMD_READ) {
-assert(qiov != NULL);
-} else {
-qiov = NULL;
-}
-
 /* Wait until we're woken up by nbd_read_reply_entry.  */
 qemu_coroutine_yield();
 if (!s->ioc || s->reply.handle == 0) {
@@ -180,14 +187,6 @@ static int nbd_co_request(BlockDriverState *bs,
 
 assert(s->reply.handle == request->handle);
 
-if (qiov && s->reply.error == 0) {
-ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, true, NULL);
-if (ret != request->len) {
-rc = -EIO;
-goto out;
-}
-}
-
 rc = -s->reply.error;
 
 out:
-- 
2.11.1




Re: [Qemu-block] [Qemu-devel] [PATCH v2 6/6] dev-storage: Fix the unusual function name

2017-08-04 Thread Philippe Mathieu-Daudé

On 08/04/2017 07:26 AM, Mao Zhongyi wrote:

The function name of usb_msd_{realize,unrealize}_*,
usb_msd_class_initfn_* are unusual. Rename it to
usb_msd_*_{realize,unrealize}, usb_msd_class_*_initfn.

Cc: Gerd Hoffmann 

Signed-off-by: Mao Zhongyi 


Reviewed-by: Philippe Mathieu-Daudé 


---
  hw/usb/dev-storage.c | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 801f552..2a05cd5 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -596,7 +596,7 @@ static void usb_msd_unrealize_storage(USBDevice *dev, Error 
**errp)
  object_unref(OBJECT(>bus));
  }
  
-static void usb_msd_realize_storage(USBDevice *dev, Error **errp)

+static void usb_msd_storage_realize(USBDevice *dev, Error **errp)
  {
  MSDState *s = USB_STORAGE_DEV(dev);
  BlockBackend *blk = s->conf.blk;
@@ -643,14 +643,14 @@ static void usb_msd_realize_storage(USBDevice *dev, Error 
**errp)
  s->scsi_dev = scsi_dev;
  }
  
-static void usb_msd_unrealize_bot(USBDevice *dev, Error **errp)

+static void usb_msd_bot_unrealize(USBDevice *dev, Error **errp)
  {
  MSDState *s = USB_STORAGE_DEV(dev);
  
  object_unref(OBJECT(>bus));

  }
  
-static void usb_msd_realize_bot(USBDevice *dev, Error **errp)

+static void usb_msd_bot_realize(USBDevice *dev, Error **errp)
  {
  MSDState *s = USB_STORAGE_DEV(dev);
  DeviceState *d = DEVICE(dev);
@@ -764,12 +764,12 @@ static void usb_msd_class_initfn_common(ObjectClass 
*klass, void *data)
  dc->vmsd = _usb_msd;
  }
  
-static void usb_msd_class_initfn_storage(ObjectClass *klass, void *data)

+static void usb_msd_class_storage_initfn(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
  USBDeviceClass *uc = USB_DEVICE_CLASS(klass);
  
-uc->realize = usb_msd_realize_storage;

+uc->realize = usb_msd_storage_realize;
  uc->unrealize = usb_msd_unrealize_storage;
  dc->props = msd_properties;
  }
@@ -828,26 +828,26 @@ static void usb_msd_instance_init(Object *obj)
  object_property_set_int(obj, -1, "bootindex", NULL);
  }
  
-static void usb_msd_class_initfn_bot(ObjectClass *klass, void *data)

+static void usb_msd_class_bot_initfn(ObjectClass *klass, void *data)
  {
  USBDeviceClass *uc = USB_DEVICE_CLASS(klass);
  
-uc->realize = usb_msd_realize_bot;

-uc->unrealize = usb_msd_unrealize_bot;
+uc->realize = usb_msd_bot_realize;
+uc->unrealize = usb_msd_bot_unrealize;
  uc->attached_settable = true;
  }
  
  static const TypeInfo msd_info = {

  .name  = "usb-storage",
  .parent= TYPE_USB_STORAGE,
-.class_init= usb_msd_class_initfn_storage,
+.class_init= usb_msd_class_storage_initfn,
  .instance_init = usb_msd_instance_init,
  };
  
  static const TypeInfo bot_info = {

  .name  = "usb-bot",
  .parent= TYPE_USB_STORAGE,
-.class_init= usb_msd_class_initfn_bot,
+.class_init= usb_msd_class_bot_initfn,
  };
  
  static void usb_msd_register_types(void)






[Qemu-block] [PATCH 08/17] block/nbd-client: rename nbd_recv_coroutines_enter_all

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Rename nbd_recv_coroutines_enter_all to nbd_recv_coroutines_wake_all,
as it most probably just add all recv coroutines into co_queue_wakeup,
not directly enter them.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index c9ade9b517..8ad2264a40 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -34,7 +34,7 @@
 #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs))
 #define INDEX_TO_HANDLE(bs, index)  ((index)  ^ ((uint64_t)(intptr_t)bs))
 
-static void nbd_recv_coroutines_enter_all(NBDClientSession *s)
+static void nbd_recv_coroutines_wake_all(NBDClientSession *s)
 {
 int i;
 
@@ -108,7 +108,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
 }
 
 s->reply.handle = 0;
-nbd_recv_coroutines_enter_all(s);
+nbd_recv_coroutines_wake_all(s);
 s->read_reply_co = NULL;
 }
 
-- 
2.11.1




[Qemu-block] [PATCH 14/17] block/nbd-client: exit reply-reading coroutine on incorrect handle

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Check reply-handle == request-handle in the same place, where
recv coroutine number is calculated from reply->handle and it's
correctness checked - in nbd_read_reply_entry.

Also finish nbd_read_reply_entry in case of reply-handle !=
request-handle in the same way as in case of incorrect reply-handle.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.h | 1 +
 block/nbd-client.c | 9 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 48e2559df6..aa36be8950 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -29,6 +29,7 @@ typedef struct NBDClientSession {
 
 struct {
 Coroutine *co;
+NBDRequest *request;
 } requests[MAX_NBD_REQUESTS];
 NBDReply reply;
 } NBDClientSession;
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 5eb126c399..0e12db4be3 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -88,7 +88,9 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
  * one coroutine is called until the reply finishes.
  */
 i = HANDLE_TO_INDEX(s, s->reply.handle);
-if (i >= MAX_NBD_REQUESTS || !s->requests[i].co) {
+if (i >= MAX_NBD_REQUESTS || !s->requests[i].co ||
+s->reply.handle != s->requests[i].request->handle)
+{
 break;
 }
 
@@ -135,6 +137,7 @@ static int nbd_co_request(BlockDriverState *bs,
 g_assert(qemu_in_coroutine());
 assert(i < MAX_NBD_REQUESTS);
 request->handle = INDEX_TO_HANDLE(s, i);
+s->requests[i].request = request;
 
 if (!s->ioc) {
 qemu_co_mutex_unlock(>send_mutex);
@@ -170,11 +173,13 @@ static int nbd_co_request(BlockDriverState *bs,
 
 /* Wait until we're woken up by nbd_read_reply_entry.  */
 qemu_coroutine_yield();
-if (s->reply.handle != request->handle || !s->ioc) {
+if (!s->ioc || s->reply.handle == 0) {
 rc = -EIO;
 goto out;
 }
 
+assert(s->reply.handle == request->handle);
+
 if (qiov && s->reply.error == 0) {
 ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, true, NULL);
 if (ret != request->len) {
-- 
2.11.1




[Qemu-block] [PATCH 01/17] nbd/client: fix nbd_opt_go

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Do not send NBD_OPT_ABORT to the broken server. After sending
NBD_REP_ACK on NBD_OPT_GO server is most probably in transmission
phase, when option sending is finished.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/client.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 0a17de80b5..f1c16b588f 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -399,12 +399,10 @@ static int nbd_opt_go(QIOChannel *ioc, const char 
*wantname,
phase, but make sure it sent flags */
 if (len) {
 error_setg(errp, "server sent invalid NBD_REP_ACK");
-nbd_send_opt_abort(ioc);
 return -1;
 }
 if (!info->flags) {
 error_setg(errp, "broken server omitted NBD_INFO_EXPORT");
-nbd_send_opt_abort(ioc);
 return -1;
 }
 trace_nbd_opt_go_success();
-- 
2.11.1




[Qemu-block] [PATCH 12/17] block/nbd-client: refactor nbd_co_request

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Reduce nesting, get rid of extra variable.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index b84cab4079..d6145c7db0 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -118,7 +118,6 @@ static int nbd_co_request(BlockDriverState *bs,
 {
 NBDClientSession *s = nbd_get_client_session(bs);
 int rc, ret, i;
-NBDReply reply;
 
 qemu_co_mutex_lock(>send_mutex);
 while (s->in_flight == MAX_NBD_REQUESTS) {
@@ -171,20 +170,20 @@ static int nbd_co_request(BlockDriverState *bs,
 
 /* Wait until we're woken up by nbd_read_reply_entry.  */
 qemu_coroutine_yield();
-reply = s->reply;
-if (reply.handle != request->handle ||
-!s->ioc) {
-reply.error = EIO;
-} else {
-if (qiov && reply.error == 0) {
-ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, true,
-  NULL);
-if (ret != request->len) {
-reply.error = EIO;
-}
+if (s->reply.handle != request->handle || !s->ioc) {
+rc = -EIO;
+goto out;
+}
+
+if (qiov && s->reply.error == 0) {
+ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, true, NULL);
+if (ret != request->len) {
+rc = -EIO;
+goto out;
 }
 }
-rc = -reply.error;
+
+rc = -s->reply.error;
 
 out:
 /* Tell the read handler to read another header.  */
-- 
2.11.1




[Qemu-block] [PATCH 03/17] nbd/client: refactor nbd_receive_reply

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Refactor nbd_receive_reply to return 1 on success, 0 on eof, when no
data was read and <0 for other cases, because returned size of read
data is not actually used.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h |  2 +-
 nbd/client.c| 12 +---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 9c3d0a5868..f7450608b4 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -164,7 +164,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
  Error **errp);
 ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request);
-ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp);
+int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 
diff --git a/nbd/client.c b/nbd/client.c
index 4556056daa..a1758a1931 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -914,11 +914,16 @@ ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest 
*request)
 return nbd_write(ioc, buf, sizeof(buf), NULL);
 }
 
-ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
+/* nbd_receive_reply
+ * Returns 1 on success
+ * 0 on eof, when no data was read from @ioc (errp is not set)
+ * < 0 on fail
+ */
+int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
 {
 uint8_t buf[NBD_REPLY_SIZE];
 uint32_t magic;
-ssize_t ret;
+int ret;
 
 ret = nbd_read_eof(ioc, buf, sizeof(buf), errp);
 if (ret <= 0) {
@@ -948,6 +953,7 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, 
Error **errp)
 error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
 return -EINVAL;
 }
-return sizeof(buf);
+
+return 1;
 }
 
-- 
2.11.1




Re: [Qemu-block] [PATCH v2] qemu-img: Clarify about relative backing file options

2017-08-04 Thread Jeff Cody
On Fri, Aug 04, 2017 at 10:36:58PM +0800, Fam Zheng wrote:
> It's not too surprising when a user specifies the backing file relative
> to the current working directory instead of the top layer image. This
> causes error when they differ. Though the error message has enough
> information to infer the fact about the misunderstanding, it is better
> if we document this explicitly, so that users don't have to learn from
> mistakes.
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> v2: Improve wording. [Eric]
> ---
>  qemu-img.texi | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/qemu-img.texi b/qemu-img.texi
> index 72dabd6b3e..90c7eab4a8 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -244,6 +244,9 @@ only the differences from @var{backing_file}. No size 
> needs to be specified in
>  this case. @var{backing_file} will never be modified unless you use the
>  @code{commit} monitor command (or qemu-img commit).
>  
> +If a relative path name is given, the backing file is looked up relative to
> +the directory containing @var{filename}.
> +
>  Note that a given backing file will be opened to check that it is valid. Use
>  the @code{-u} option to enable unsafe backing file mode, which means that the
>  image will be created even if the associated backing file cannot be opened. A
> @@ -343,6 +346,9 @@ created as a copy on write image of the specified base 
> image; the
>  @var{backing_file} should have the same content as the input's base image,
>  however the path, image format, etc may differ.
>  
> +If a relative path name is given, the backing file is looked up relative to
> +the directory containing @var{output_filename}.
> +
>  If the @code{-n} option is specified, the target volume creation will be
>  skipped. This is useful for formats such as @code{rbd} if the target
>  volume has already been created with site specific options that cannot
> @@ -490,6 +496,9 @@ The backing file is changed to @var{backing_file} and (if 
> the image format of
>  string), then the image is rebased onto no backing file (i.e. it will exist
>  independently of any backing file).
>  
> +If a relative path name is given, the backing file is looked up relative to
> +the directory containing @var{filename}.
> +
>  @var{cache} specifies the cache mode to be used for @var{filename}, whereas
>  @var{src_cache} specifies the cache mode for reading backing files.
>  
> -- 
> 2.13.3
> 
> 

Reviewed-by: Jeff Cody 




[Qemu-block] [PATCH 05/17] block/nbd-client: get rid of ssize_t

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Use int variable for nbd_co_send_request return value (as
nbd_co_send_request returns int).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 25dd28406b..dc19894a7c 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -214,7 +214,7 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t 
offset,
 .len = bytes,
 };
 NBDReply reply;
-ssize_t ret;
+int ret;
 
 assert(bytes <= NBD_MAX_BUFFER_SIZE);
 assert(!flags);
@@ -239,7 +239,7 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t 
offset,
 .len = bytes,
 };
 NBDReply reply;
-ssize_t ret;
+int ret;
 
 if (flags & BDRV_REQ_FUA) {
 assert(client->info.flags & NBD_FLAG_SEND_FUA);
@@ -261,7 +261,7 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t 
offset,
 int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
 int bytes, BdrvRequestFlags flags)
 {
-ssize_t ret;
+int ret;
 NBDClientSession *client = nbd_get_client_session(bs);
 NBDRequest request = {
 .type = NBD_CMD_WRITE_ZEROES,
@@ -297,7 +297,7 @@ int nbd_client_co_flush(BlockDriverState *bs)
 NBDClientSession *client = nbd_get_client_session(bs);
 NBDRequest request = { .type = NBD_CMD_FLUSH };
 NBDReply reply;
-ssize_t ret;
+int ret;
 
 if (!(client->info.flags & NBD_FLAG_SEND_FLUSH)) {
 return 0;
@@ -325,7 +325,7 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t 
offset, int bytes)
 .len = bytes,
 };
 NBDReply reply;
-ssize_t ret;
+int ret;
 
 if (!(client->info.flags & NBD_FLAG_SEND_TRIM)) {
 return 0;
-- 
2.11.1




[Qemu-block] [PATCH 16/17] block/nbd-client: drop reply field from NBDClientSession

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Drop 'reply' from NBDClientSession. It's usage is not very transparent:

1. it is used to deliver error to receiving coroutine, and receiving
   coroutine must save or handle it somehow and then zero out
   it in NBDClientSession.
2. it is used to inform receiving coroutines that nbd_read_reply_entry
   is out for some reason (error or disconnect)

To simplify this scheme:
- drop NBDClientSession.reply
- introduce NBDClientSession.requests[...].ret for (1)
- introduce NBDClientSession.eio_to_all for (2)

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.h |  3 ++-
 block/nbd-client.c | 25 ++---
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 0f84ccc073..0b0aa67342 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -31,8 +31,9 @@ typedef struct NBDClientSession {
 Coroutine *co;
 NBDRequest *request;
 QEMUIOVector *qiov;
+int ret;
 } requests[MAX_NBD_REQUESTS];
-NBDReply reply;
+bool eio_to_all;
 } NBDClientSession;
 
 NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 61780c5df9..7c151b3dd3 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -72,10 +72,10 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
 uint64_t i;
 int ret;
 Error *local_err = NULL;
+NBDReply reply;
 
 for (;;) {
-assert(s->reply.handle == 0);
-ret = nbd_receive_reply(s->ioc, >reply, _err);
+ret = nbd_receive_reply(s->ioc, , _err);
 if (ret < 0) {
 error_report_err(local_err);
 }
@@ -87,16 +87,14 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
  * handler acts as a synchronization point and ensures that only
  * one coroutine is called until the reply finishes.
  */
-i = HANDLE_TO_INDEX(s, s->reply.handle);
+i = HANDLE_TO_INDEX(s, reply.handle);
 if (i >= MAX_NBD_REQUESTS || !s->requests[i].co ||
-s->reply.handle != s->requests[i].request->handle)
+reply.handle != s->requests[i].request->handle)
 {
 break;
 }
 
-if (s->reply.error == 0 &&
-s->requests[i].request->type == NBD_CMD_READ)
-{
+if (reply.error == 0 && s->requests[i].request->type == NBD_CMD_READ) {
 assert(s->requests[i].qiov != NULL);
 ret = nbd_rwv(s->ioc, s->requests[i].qiov->iov,
   s->requests[i].qiov->niov,
@@ -106,6 +104,8 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
 }
 }
 
+s->requests[i].ret = -reply.error;
+
 /* We're woken up by the receiving coroutine itself.  Note that there
  * is no race between yielding and reentering read_reply_co.  This
  * is because:
@@ -121,7 +121,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
 qemu_coroutine_yield();
 }
 
-s->reply.handle = 0;
+s->eio_to_all = true;
 nbd_recv_coroutines_wake_all(s);
 s->read_reply_co = NULL;
 }
@@ -180,19 +180,14 @@ static int nbd_co_request(BlockDriverState *bs,
 
 /* Wait until we're woken up by nbd_read_reply_entry.  */
 qemu_coroutine_yield();
-if (!s->ioc || s->reply.handle == 0) {
+if (!s->ioc || s->eio_to_all) {
 rc = -EIO;
 goto out;
 }
 
-assert(s->reply.handle == request->handle);
-
-rc = -s->reply.error;
+rc = s->requests[i].ret;
 
 out:
-/* Tell the read handler to read another header.  */
-s->reply.handle = 0;
-
 s->requests[i].co = NULL;
 
 /* Kick the read_reply_co to get the next reply.  */
-- 
2.11.1




[Qemu-block] [PATCH 00/17] nbd client refactoring and fixing

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
A bit more refactoring and fixing before BLOCK_STATUS series.
I've tried to make individual patches simple enough, so there are
a lot of them.

Vladimir Sementsov-Ogievskiy (17):
  nbd/client: fix nbd_opt_go
  nbd/client: refactor nbd_read_eof
  nbd/client: refactor nbd_receive_reply
  nbd/client: fix nbd_send_request to return int
  block/nbd-client: get rid of ssize_t
  block/nbd-client: fix nbd_read_reply_entry
  block/nbd-client: refactor request send/receive
  block/nbd-client: rename nbd_recv_coroutines_enter_all
  block/nbd-client: move nbd_co_receive_reply content into
nbd_co_request
  block/nbd-client: move nbd_coroutine_end content into nbd_co_request
  block/nbd-client: fix nbd_co_request: set s->reply.handle to 0 on
error
  block/nbd-client: refactor nbd_co_request
  block/nbd-client: refactor NBDClientSession.recv_coroutine
  block/nbd-client: exit reply-reading coroutine on incorrect handle
  block/nbd-client: refactor reading reply
  block/nbd-client: drop reply field from NBDClientSession
  block/nbd-client: always return EIO on and after the first io channel
error

 block/nbd-client.h |   9 ++-
 include/block/nbd.h|   4 +-
 nbd/nbd-internal.h |  34 ++---
 block/nbd-client.c | 173 ++---
 nbd/client.c   |  21 +++---
 tests/qemu-iotests/083.out |   4 +-
 6 files changed, 115 insertions(+), 130 deletions(-)

-- 
2.11.1




[Qemu-block] [PATCH 06/17] block/nbd-client: fix nbd_read_reply_entry

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Set reply.handle to 0 on error path to prevent normal path of
nbd_co_receive_reply.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index dc19894a7c..0c88d84de6 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -107,6 +107,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
 qemu_coroutine_yield();
 }
 
+s->reply.handle = 0;
 nbd_recv_coroutines_enter_all(s);
 s->read_reply_co = NULL;
 }
-- 
2.11.1




[Qemu-block] [PATCH 13/17] block/nbd-client: refactor NBDClientSession.recv_coroutine

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Move from recv_coroutine[i] to requests[i].co. This is needed for
further refactoring, new fields will be added to created structure.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.h |  4 +++-
 block/nbd-client.c | 20 ++--
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index df80771357..48e2559df6 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -27,7 +27,9 @@ typedef struct NBDClientSession {
 Coroutine *read_reply_co;
 int in_flight;
 
-Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
+struct {
+Coroutine *co;
+} requests[MAX_NBD_REQUESTS];
 NBDReply reply;
 } NBDClientSession;
 
diff --git a/block/nbd-client.c b/block/nbd-client.c
index d6145c7db0..5eb126c399 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -39,8 +39,8 @@ static void nbd_recv_coroutines_wake_all(NBDClientSession *s)
 int i;
 
 for (i = 0; i < MAX_NBD_REQUESTS; i++) {
-if (s->recv_coroutine[i]) {
-aio_co_wake(s->recv_coroutine[i]);
+if (s->requests[i].co) {
+aio_co_wake(s->requests[i].co);
 }
 }
 }
@@ -88,22 +88,22 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
  * one coroutine is called until the reply finishes.
  */
 i = HANDLE_TO_INDEX(s, s->reply.handle);
-if (i >= MAX_NBD_REQUESTS || !s->recv_coroutine[i]) {
+if (i >= MAX_NBD_REQUESTS || !s->requests[i].co) {
 break;
 }
 
-/* We're woken up by the recv_coroutine itself.  Note that there
+/* We're woken up by the receiving coroutine itself.  Note that there
  * is no race between yielding and reentering read_reply_co.  This
  * is because:
  *
- * - if recv_coroutine[i] runs on the same AioContext, it is only
+ * - if requests[i].co runs on the same AioContext, it is only
  *   entered after we yield
  *
- * - if recv_coroutine[i] runs on a different AioContext, reentering
+ * - if requests[i].co runs on a different AioContext, reentering
  *   read_reply_co happens through a bottom half, which can only
  *   run after we yield.
  */
-aio_co_wake(s->recv_coroutine[i]);
+aio_co_wake(s->requests[i].co);
 qemu_coroutine_yield();
 }
 
@@ -126,8 +126,8 @@ static int nbd_co_request(BlockDriverState *bs,
 s->in_flight++;
 
 for (i = 0; i < MAX_NBD_REQUESTS; i++) {
-if (s->recv_coroutine[i] == NULL) {
-s->recv_coroutine[i] = qemu_coroutine_self();
+if (s->requests[i].co == NULL) {
+s->requests[i].co = qemu_coroutine_self();
 break;
 }
 }
@@ -189,7 +189,7 @@ out:
 /* Tell the read handler to read another header.  */
 s->reply.handle = 0;
 
-s->recv_coroutine[i] = NULL;
+s->requests[i].co = NULL;
 
 /* Kick the read_reply_co to get the next reply.  */
 if (s->read_reply_co) {
-- 
2.11.1




Re: [Qemu-block] [PATCH] block/null: Remove 'filename' option

2017-08-04 Thread Jeff Cody
On Fri, Aug 04, 2017 at 04:43:54PM +0200, Kevin Wolf wrote:
> This option was only added to allow 'null-co://' and 'null-aio://' as
> filenames, its value never served any actual purpose and was ignored.
> Nevertheless it was accepted as '-drive driver=null,filename=foo'.
> 
> The correct way to enable the protocol prefixes (and that without adding
> a useless -drive option) is implementing .bdrv_parse_filename. This is
> what this patch does.
> 
> Technically, this is an incompatible change, but the null block driver
> is only used for benchmarking, testing and debugging, and an option
> without effect isn't likely to be used by anyone anyway, so no bad
> effects are to be expected.
> 
> Reported-by: Markus Armbruster 
> Signed-off-by: Kevin Wolf 
> ---
>  block/null.c | 31 ++-
>  1 file changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/block/null.c b/block/null.c
> index 876f90965b..dd9c13f9ba 100644
> --- a/block/null.c
> +++ b/block/null.c
> @@ -30,11 +30,6 @@ static QemuOptsList runtime_opts = {
>  .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
>  .desc = {
>  {
> -.name = "filename",
> -.type = QEMU_OPT_STRING,
> -.help = "",
> -},
> -{
>  .name = BLOCK_OPT_SIZE,
>  .type = QEMU_OPT_SIZE,
>  .help = "size of the null block",
> @@ -54,6 +49,30 @@ static QemuOptsList runtime_opts = {
>  },
>  };
>  
> +static void null_co_parse_filename(const char *filename, QDict *options,
> +   Error **errp)
> +{
> +/* This functions only exists so that a null-co:// filename is accepted
> + * with the null-co driver. */
> +if (strcmp(filename, "null-co://")) {
> +error_setg(errp, "The only allowed filename for this driver is "
> + "'null-co://'");
> +return;
> +}
> +}
> +
> +static void null_aio_parse_filename(const char *filename, QDict *options,
> +Error **errp)
> +{
> +/* This functions only exists so that a null-aio:// filename is accepted
> + * with the null-aio driver. */
> +if (strcmp(filename, "null-aio://")) {
> +error_setg(errp, "The only allowed filename for this driver is "
> + "'null-aio://'");
> +return;
> +}
> +}
> +
>  static int null_file_open(BlockDriverState *bs, QDict *options, int flags,
>Error **errp)
>  {
> @@ -242,6 +261,7 @@ static BlockDriver bdrv_null_co = {
>  .instance_size  = sizeof(BDRVNullState),
>  
>  .bdrv_file_open = null_file_open,
> +.bdrv_parse_filename= null_co_parse_filename,
>  .bdrv_close = null_close,
>  .bdrv_getlength = null_getlength,
>  
> @@ -261,6 +281,7 @@ static BlockDriver bdrv_null_aio = {
>  .instance_size  = sizeof(BDRVNullState),
>  
>  .bdrv_file_open = null_file_open,
> +.bdrv_parse_filename= null_aio_parse_filename,
>  .bdrv_close = null_close,
>  .bdrv_getlength = null_getlength,
>  
> -- 
> 2.13.3
> 
> 

Reviewed-by: Jeff Cody 




[Qemu-block] [PATCH 0/3] respect bdrv_getlength() error code

2017-08-04 Thread Denis V. Lunev
These cases were reported by Markus Armbruster 
Patches add error checking of the bdrv_getlength() call or remove
the call of that function.

Signed-off-by: Denis V. Lunev 
CC: Markus Armbruster 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Stefan Hajnoczi 




[Qemu-block] [PATCH 2/3] parallels: respect error code of bdrv_getlength() in allocate_clusters()

2017-08-04 Thread Denis V. Lunev
If we can not get the file length, the state of BDS is broken completely.
Return error to the caller.

Signed-off-by: Denis V. Lunev 
CC: Markus Armbruster 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 5bbdfabb7a..6794e53c0b 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -192,7 +192,7 @@ static int64_t allocate_clusters(BlockDriverState *bs, 
int64_t sector_num,
  int nb_sectors, int *pnum)
 {
 BDRVParallelsState *s = bs->opaque;
-int64_t pos, space, idx, to_allocate, i;
+int64_t pos, space, idx, to_allocate, i, len;
 
 pos = block_status(s, sector_num, nb_sectors, pnum);
 if (pos > 0) {
@@ -214,7 +214,11 @@ static int64_t allocate_clusters(BlockDriverState *bs, 
int64_t sector_num,
 assert(idx < s->bat_size && idx + to_allocate <= s->bat_size);
 
 space = to_allocate * s->tracks;
-if (s->data_end + space > bdrv_getlength(bs->file->bs) >> 
BDRV_SECTOR_BITS) {
+len = bdrv_getlength(bs->file->bs);
+if (len < 0) {
+return len;
+}
+if (s->data_end + space > (len >> BDRV_SECTOR_BITS)) {
 int ret;
 space += s->prealloc_size;
 if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE) {
-- 
2.11.0




[Qemu-block] [PATCH] block: drop bdrv_set_key from BlockDriver

2017-08-04 Thread Paolo Bonzini
This is not used anymore since c01c214b69 ("block: remove all encryption
handling APIs", 2017-07-11).

Signed-off-by: Paolo Bonzini 
---
 include/block/block_int.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index d4f4ea7584..7571c0aaaf 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -127,7 +127,6 @@ struct BlockDriver {
   Error **errp);
 void (*bdrv_close)(BlockDriverState *bs);
 int (*bdrv_create)(const char *filename, QemuOpts *opts, Error **errp);
-int (*bdrv_set_key)(BlockDriverState *bs, const char *key);
 int (*bdrv_make_empty)(BlockDriverState *bs);
 
 void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
-- 
2.13.3




[Qemu-block] [PATCH 11/17] block/nbd-client: fix nbd_co_request: set s->reply.handle to 0 on error

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
We set s->reply.handle to 0 on one error path and don't set on another.
For consistancy and to avoid assert in nbd_read_reply_entry let's
set s->reply.handle to 0 in case of wrong handle too.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index d6965d24db..b84cab4079 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -183,13 +183,13 @@ static int nbd_co_request(BlockDriverState *bs,
 reply.error = EIO;
 }
 }
-
-/* Tell the read handler to read another header.  */
-s->reply.handle = 0;
 }
 rc = -reply.error;
 
 out:
+/* Tell the read handler to read another header.  */
+s->reply.handle = 0;
+
 s->recv_coroutine[i] = NULL;
 
 /* Kick the read_reply_co to get the next reply.  */
-- 
2.11.1




[Qemu-block] [PATCH 02/17] nbd/client: refactor nbd_read_eof

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Refactor nbd_read_eof to return 1 on success, 0 on eof, when no
data was read and <0 for other cases, because returned size of
read data is not actually used.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/nbd-internal.h | 34 +-
 nbd/client.c   |  5 -
 tests/qemu-iotests/083.out |  4 ++--
 3 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 396ddb4d3e..3fb0b6098a 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -77,21 +77,37 @@
 #define NBD_ESHUTDOWN  108
 
 /* nbd_read_eof
- * Tries to read @size bytes from @ioc. Returns number of bytes actually read.
- * May return a value >= 0 and < size only on EOF, i.e. when iteratively called
- * qio_channel_readv() returns 0. So, there is no need to call nbd_read_eof
- * iteratively.
+ * Tries to read @size bytes from @ioc.
+ * Returns 1 on success
+ * 0 on eof, when no data was read (errp is not set)
+ * -EINVAL on eof, when some data < @size was read until eof
+ * < 0 on read fail
  */
-static inline ssize_t nbd_read_eof(QIOChannel *ioc, void *buffer, size_t size,
-   Error **errp)
+static inline int nbd_read_eof(QIOChannel *ioc, void *buffer, size_t size,
+   Error **errp)
 {
 struct iovec iov = { .iov_base = buffer, .iov_len = size };
+ssize_t ret;
+
 /* Sockets are kept in blocking mode in the negotiation phase.  After
  * that, a non-readable socket simply means that another thread stole
  * our request/reply.  Synchronization is done with recv_coroutine, so
  * that this is coroutine-safe.
  */
-return nbd_rwv(ioc, , 1, size, true, errp);
+
+assert(size > 0);
+
+ret = nbd_rwv(ioc, , 1, size, true, errp);
+if (ret <= 0) {
+return ret;
+}
+
+if (ret != size) {
+error_setg(errp, "End of file");
+return -EINVAL;
+}
+
+return 1;
 }
 
 /* nbd_read
@@ -100,9 +116,9 @@ static inline ssize_t nbd_read_eof(QIOChannel *ioc, void 
*buffer, size_t size,
 static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size,
Error **errp)
 {
-ssize_t ret = nbd_read_eof(ioc, buffer, size, errp);
+int ret = nbd_read_eof(ioc, buffer, size, errp);
 
-if (ret >= 0 && ret != size) {
+if (ret == 0) {
 ret = -EINVAL;
 error_setg(errp, "End of file");
 }
diff --git a/nbd/client.c b/nbd/client.c
index f1c16b588f..4556056daa 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -925,11 +925,6 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply 
*reply, Error **errp)
 return ret;
 }
 
-if (ret != sizeof(buf)) {
-error_setg(errp, "read failed");
-return -EINVAL;
-}
-
 /* Reply
[ 0 ..  3]magic   (NBD_REPLY_MAGIC)
[ 4 ..  7]error   (0 == no error)
diff --git a/tests/qemu-iotests/083.out b/tests/qemu-iotests/083.out
index a24c6bfece..d3bea1b2f5 100644
--- a/tests/qemu-iotests/083.out
+++ b/tests/qemu-iotests/083.out
@@ -69,12 +69,12 @@ read failed: Input/output error
 
 === Check disconnect 4 reply ===
 
-read failed
+End of file
 read failed: Input/output error
 
 === Check disconnect 8 reply ===
 
-read failed
+End of file
 read failed: Input/output error
 
 === Check disconnect before data ===
-- 
2.11.1




[Qemu-block] [PATCH 07/17] block/nbd-client: refactor request send/receive

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Move nbd_co_receive_reply and nbd_coroutine_end calls into
nbd_co_send_request and rename the latter to just nbd_co_request.

This removes code duplications in nbd_client_co_{pwrite,pread,...}
functions. Also this is needed for further refactoring.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.c | 89 --
 1 file changed, 33 insertions(+), 56 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 0c88d84de6..c9ade9b517 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -112,12 +112,20 @@ static coroutine_fn void nbd_read_reply_entry(void 
*opaque)
 s->read_reply_co = NULL;
 }
 
-static int nbd_co_send_request(BlockDriverState *bs,
-   NBDRequest *request,
-   QEMUIOVector *qiov)
+static void nbd_co_receive_reply(NBDClientSession *s,
+ NBDRequest *request,
+ NBDReply *reply,
+ QEMUIOVector *qiov);
+static void nbd_coroutine_end(BlockDriverState *bs,
+  NBDRequest *request);
+
+static int nbd_co_request(BlockDriverState *bs,
+  NBDRequest *request,
+  QEMUIOVector *qiov)
 {
 NBDClientSession *s = nbd_get_client_session(bs);
 int rc, ret, i;
+NBDReply reply;
 
 qemu_co_mutex_lock(>send_mutex);
 while (s->in_flight == MAX_NBD_REQUESTS) {
@@ -141,7 +149,8 @@ static int nbd_co_send_request(BlockDriverState *bs,
 return -EPIPE;
 }
 
-if (qiov) {
+if (request->type == NBD_CMD_WRITE) {
+assert(qiov != NULL);
 qio_channel_set_cork(s->ioc, true);
 rc = nbd_send_request(s->ioc, request);
 if (rc >= 0) {
@@ -156,6 +165,21 @@ static int nbd_co_send_request(BlockDriverState *bs,
 rc = nbd_send_request(s->ioc, request);
 }
 qemu_co_mutex_unlock(>send_mutex);
+
+if (rc < 0) {
+goto out;
+}
+
+if (request->type == NBD_CMD_READ) {
+assert(qiov != NULL);
+} else {
+qiov = NULL;
+}
+nbd_co_receive_reply(s, request, , qiov);
+rc = -reply.error;
+
+out:
+nbd_coroutine_end(bs, request);
 return rc;
 }
 
@@ -208,26 +232,16 @@ static void nbd_coroutine_end(BlockDriverState *bs,
 int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
  uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
-NBDClientSession *client = nbd_get_client_session(bs);
 NBDRequest request = {
 .type = NBD_CMD_READ,
 .from = offset,
 .len = bytes,
 };
-NBDReply reply;
-int ret;
 
 assert(bytes <= NBD_MAX_BUFFER_SIZE);
 assert(!flags);
 
-ret = nbd_co_send_request(bs, , NULL);
-if (ret < 0) {
-reply.error = -ret;
-} else {
-nbd_co_receive_reply(client, , , qiov);
-}
-nbd_coroutine_end(bs, );
-return -reply.error;
+return nbd_co_request(bs, , qiov);
 }
 
 int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
@@ -239,8 +253,6 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t 
offset,
 .from = offset,
 .len = bytes,
 };
-NBDReply reply;
-int ret;
 
 if (flags & BDRV_REQ_FUA) {
 assert(client->info.flags & NBD_FLAG_SEND_FUA);
@@ -249,27 +261,18 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t 
offset,
 
 assert(bytes <= NBD_MAX_BUFFER_SIZE);
 
-ret = nbd_co_send_request(bs, , qiov);
-if (ret < 0) {
-reply.error = -ret;
-} else {
-nbd_co_receive_reply(client, , , NULL);
-}
-nbd_coroutine_end(bs, );
-return -reply.error;
+return nbd_co_request(bs, , qiov);
 }
 
 int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
 int bytes, BdrvRequestFlags flags)
 {
-int ret;
 NBDClientSession *client = nbd_get_client_session(bs);
 NBDRequest request = {
 .type = NBD_CMD_WRITE_ZEROES,
 .from = offset,
 .len = bytes,
 };
-NBDReply reply;
 
 if (!(client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
 return -ENOTSUP;
@@ -283,22 +286,13 @@ int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, 
int64_t offset,
 request.flags |= NBD_CMD_FLAG_NO_HOLE;
 }
 
-ret = nbd_co_send_request(bs, , NULL);
-if (ret < 0) {
-reply.error = -ret;
-} else {
-nbd_co_receive_reply(client, , , NULL);
-}
-nbd_coroutine_end(bs, );
-return -reply.error;
+return nbd_co_request(bs, , NULL);
 }
 
 int nbd_client_co_flush(BlockDriverState *bs)
 {
 NBDClientSession *client = nbd_get_client_session(bs);
 NBDRequest request = { .type = NBD_CMD_FLUSH };
-NBDReply reply;
-int ret;
 
 if (!(client->info.flags & NBD_FLAG_SEND_FLUSH)) {
 return 0;
@@ -307,14 +301,7 @@ int 

[Qemu-block] [PATCH 04/17] nbd/client: fix nbd_send_request to return int

2017-08-04 Thread Vladimir Sementsov-Ogievskiy
Fix nbd_send_request to return int, as it returns a return value
of nbd_write (which is int), and the only user of nbd_send_request's
return value (nbd_co_send_request) consider it as int too.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h | 2 +-
 nbd/client.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index f7450608b4..040cdd2e60 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -163,7 +163,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
   Error **errp);
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
  Error **errp);
-ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request);
+int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
 int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
diff --git a/nbd/client.c b/nbd/client.c
index a1758a1931..00cba45853 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -896,7 +896,7 @@ int nbd_disconnect(int fd)
 }
 #endif
 
-ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request)
+int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
 {
 uint8_t buf[NBD_REQUEST_SIZE];
 
-- 
2.11.1




Re: [Qemu-block] [PATCH] block/null: Remove 'filename' option

2017-08-04 Thread Eric Blake
On 08/04/2017 09:43 AM, Kevin Wolf wrote:
> This option was only added to allow 'null-co://' and 'null-aio://' as
> filenames, its value never served any actual purpose and was ignored.
> Nevertheless it was accepted as '-drive driver=null,filename=foo'.
> 
> The correct way to enable the protocol prefixes (and that without adding
> a useless -drive option) is implementing .bdrv_parse_filename. This is
> what this patch does.
> 
> Technically, this is an incompatible change, but the null block driver
> is only used for benchmarking, testing and debugging, and an option
> without effect isn't likely to be used by anyone anyway, so no bad
> effects are to be expected.

Agreed with the analysis. Still, better to get it into 2.10 rather than
going yet another release with the option available.

> 
> Reported-by: Markus Armbruster 
> Signed-off-by: Kevin Wolf 
> ---
>  block/null.c | 31 ++-
>  1 file changed, 26 insertions(+), 5 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] Is the use of bdrv_getlength() in handle_aiocb_write_zeroes() kosher?

2017-08-04 Thread Markus Armbruster
"Denis V. Lunev"  writes:

> On 08/04/2017 03:16 PM, Markus Armbruster wrote:
>> Denis, you added this in commit d50d822:
>>
>> #ifdef CONFIG_FALLOCATE
>> if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
>> int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, 
>> aiocb->aio_nbytes);
>> if (ret == 0 || ret != -ENOTSUP) {
>> return ret;
>> }
>> s->has_fallocate = false;
>> }
>> #endif
>>
>> bdrv_getlength() can fail.  Does it do the right thing then?  For what
>> it's worth, the comparison of its value is signed.
> fallocate() with 0 flags can work only beyond end of file
> or on top of the hole. Thus the check is made to validate
> that we are beyond EOF.
>
> Technically fallocate should fail if that condition will be
> violated. But you right, we can add sanity check here.
> This would not harm.
>
> Should I send it?

I figure an explicit check for bdrv_getlength() failure would make the
code easier to understand.  In short: yes, please!



[Qemu-block] [PATCH v2] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Daniel P. Berrange
Signed-off-by: Daniel P. Berrange 
---

Changed in v2:

 - Document 'flags' parameter too (Eric)

 include/block/block_int.h | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index d4f4ea7584..4fe3081742 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -147,12 +147,43 @@ struct BlockDriver {
 
 int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+
+/**
+ * @offset: position in bytes to read at
+ * @bytes: number of bytes to read
+ * @qiov: the buffers to fill with read data
+ * @flags: currently unused, always 0
+ *
+ * @offset and @bytes will be a multiple of 'request_alignment',
+ * but the length of individual @qiov elements does not have to
+ * be a multiple.
+ *
+ * @bytes may be less than the total sizeof @iov, and will be
+ * no larger than 'max_transfer'.
+ *
+ * The buffer in @qiov may point directly to guest memory.
+ */
 int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
 int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn (*bdrv_co_writev_flags)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int flags);
+/**
+ * @offset: position in bytes to write at
+ * @bytes: number of bytes to write
+ * @qiov: the buffers containing data to write
+ * @flags: zero or more of bits allowed by 'supported_write_flags'
+ *
+ * @offset and @bytes will be a multiple of 'request_alignment',
+ * but the length of individual @qiov elements does not have to
+ * be a multiple.
+ *
+ * @bytes may be less than the total sizeof @iov, and will be
+ * no larger than 'max_transfer'.
+ *
+ * The buffer in @qiov may point directly to guest memory.
+ */
 int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
 
-- 
2.13.3




Re: [Qemu-block] [PATCH v4 for-2.10 01/15] mirror: inherit supported write/zero flags

2017-08-04 Thread Eric Blake
On 08/01/2017 09:18 AM, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov 
> ---
>  block/mirror.c | 5 +
>  1 file changed, 5 insertions(+)

I think this one counts as a bug fix worthy of inclusion in 2.10.

Reviewed-by: Eric Blake 

> 
> diff --git a/block/mirror.c b/block/mirror.c
> index d46dace..7e539f1 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1056,6 +1056,11 @@ static void 
> bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
>  bdrv_refresh_filename(bs->backing->bs);
>  pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
>  bs->backing->bs->filename);
> +bs->supported_write_flags = BDRV_REQ_FUA &
> +bs->backing->bs->supported_write_flags;
> +bs->supported_zero_flags =
> +(BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
> +bs->backing->bs->supported_zero_flags;
>  }
>  
>  static void bdrv_mirror_top_close(BlockDriverState *bs)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH 3/3] parallels: drop check that bdrv_truncate() is working

2017-08-04 Thread Denis V. Lunev
This would be actually strange and error prone. If truncate() nowadays
will fail, there is something fatally wrong. Let's check for that during
the actual work.

The only fallback case is when the file is not zero initialized. In this
case we should switch to preallocation via fallocate().

Signed-off-by: Denis V. Lunev 
CC: Markus Armbruster 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Stefan Hajnoczi 
---
 block/parallels.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 6794e53c0b..e1e06d23cc 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -703,9 +703,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail_options;
 }
 
-if (!(flags & BDRV_O_RESIZE) || !bdrv_has_zero_init(bs->file->bs) ||
-bdrv_truncate(bs->file, bdrv_getlength(bs->file->bs),
-  PREALLOC_MODE_OFF, NULL) != 0) {
+if (!bdrv_has_zero_init(bs->file->bs)) {
 s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
 }
 
-- 
2.11.0




[Qemu-block] [PATCH 1/3] block: respect error code from bdrv_getlength in handle_aiocb_write_zeroes

2017-08-04 Thread Denis V. Lunev
Original idea beyond the code in question was the following: we have failed
to write zeroes with fallocate(FALLOC_FL_ZERO_RANGE) as the simplest
approach and via fallocate(FALLOC_FL_PUNCH_HOLE)/fallocate(0). We have the
only chance now: if the request comes beyond end of the file. Thus we
should calculate file length and respect the error code from that op.

Signed-off-by: Denis V. Lunev 
CC: Markus Armbruster 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Stefan Hajnoczi 
---
 block/file-posix.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index cfbb236f6f..f4de022ae0 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1339,6 +1339,9 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 #if defined(CONFIG_FALLOCATE) || defined(CONFIG_XFS)
 BDRVRawState *s = aiocb->bs->opaque;
 #endif
+#ifdef CONFIG_FALLOCATE
+int64_t len;
+#endif
 
 if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
 return handle_aiocb_write_zeroes_block(aiocb);
@@ -1381,7 +1384,10 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData 
*aiocb)
 #endif
 
 #ifdef CONFIG_FALLOCATE
-if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
+/* Last resort: we are trying to extend the file with zeroed data. This
+ * can be done via fallocate(fd, 0) */
+len = bdrv_getlength(aiocb->bs);
+if (s->has_fallocate && len >= 0 && aiocb->aio_offset >= len) {
 int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
 if (ret == 0 || ret != -ENOTSUP) {
 return ret;
-- 
2.11.0




Re: [Qemu-block] Is the use of bdrv_getlength() in vhdx*.c kosher?

2017-08-04 Thread Jeff Cody
On Fri, Aug 04, 2017 at 02:49:42PM +0200, Markus Armbruster wrote:
> bdrv_getlength() can fail.  There are several calls in vhdx*.c that
> don't seem to check.  Bug or not?


Most definitely a bug.  Shall I queue up some patches, or do you already
have some?



[Qemu-block] [PATCH for-2.10] vmdk: Fix error handling/reporting of vmdk_check

2017-08-04 Thread Fam Zheng
Errors from the callees must be captured and propagated to our caller,
ensure this for both find_extent() and bdrv_getlength().

Reported-by: Markus Armbruster 
Signed-off-by: Fam Zheng 
---
 block/vmdk.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 0fc97391a6..c665bcc977 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2236,6 +2236,7 @@ static int vmdk_check(BlockDriverState *bs, 
BdrvCheckResult *result,
 fprintf(stderr,
 "ERROR: could not find extent for sector %" PRId64 "\n",
 sector_num);
+ret = -EINVAL;
 break;
 }
 ret = get_cluster_offset(bs, extent, NULL,
@@ -2247,19 +2248,28 @@ static int vmdk_check(BlockDriverState *bs, 
BdrvCheckResult *result,
 PRId64 "\n", sector_num);
 break;
 }
-if (ret == VMDK_OK &&
-cluster_offset >= bdrv_getlength(extent->file->bs))
-{
-fprintf(stderr,
-"ERROR: cluster offset for sector %"
-PRId64 " points after EOF\n", sector_num);
-break;
+if (ret == VMDK_OK) {
+int64_t extent_len = bdrv_getlength(extent->file->bs);
+if (extent_len < 0) {
+fprintf(stderr,
+"ERROR: could not get extent file length for sector %"
+PRId64 "\n", sector_num);
+ret = extent_len;
+break;
+}
+if (cluster_offset >= extent_len) {
+fprintf(stderr,
+"ERROR: cluster offset for sector %"
+PRId64 " points after EOF\n", sector_num);
+ret = -EINVAL;
+break;
+}
 }
 sector_num += extent->cluster_sectors;
 }
 
 result->corruptions++;
-return 0;
+return ret;
 }
 
 static ImageInfoSpecific *vmdk_get_specific_info(BlockDriverState *bs)
-- 
2.13.3




[Qemu-block] [PATCH v3] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Daniel P. Berrange
Signed-off-by: Daniel P. Berrange 
---

 - Clarify that @bytes matches @qiov total size (Kevin)

 include/block/block_int.h | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index d4f4ea7584..4f7f20e56b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -147,12 +147,43 @@ struct BlockDriver {
 
 int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+
+/**
+ * @offset: position in bytes to read at
+ * @bytes: number of bytes to read
+ * @qiov: the buffers to fill with read data
+ * @flags: currently unused, always 0
+ *
+ * @offset and @bytes will be a multiple of 'request_alignment',
+ * but the length of individual @qiov elements does not have to
+ * be a multiple.
+ *
+ * @bytes will always equal the total size of @qiov, and will be
+ * no larger than 'max_transfer'.
+ *
+ * The buffer in @qiov may point directly to guest memory.
+ */
 int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
 int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn (*bdrv_co_writev_flags)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int flags);
+/**
+ * @offset: position in bytes to write at
+ * @bytes: number of bytes to write
+ * @qiov: the buffers containing data to write
+ * @flags: zero or more of bits allowed by 'supported_write_flags'
+ *
+ * @offset and @bytes will be a multiple of 'request_alignment',
+ * but the length of individual @qiov elements does not have to
+ * be a multiple.
+ *
+ * @bytes will always equal the total size of @qiov, and will be
+ * no larger than 'max_transfer'.
+ *
+ * The buffer in @qiov may point directly to guest memory.
+ */
 int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
 
-- 
2.13.3




Re: [Qemu-block] [PATCH for-2.10] block: move trace probes into bdrv_co_preadv|pwritev

2017-08-04 Thread Eric Blake
On 08/04/2017 05:50 AM, Daniel P. Berrange wrote:
> There are trace probes in bdrv_co_readv|writev, however, the
> block drivers are being gradually moved over to using the
> bdrv_co_preadv|pwritev functions instead. As a result some
> block drivers miss the current probes. Move the probes
> into bdrv_co_preadv|pwritev instead, so that they are triggered
> by more (all?) I/O code paths.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/io.c | 8 
>  block/trace-events | 4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)

Arguably, the traces are broken, so this is a bug fix worthy of 2.10.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH] quorum: Handle bdrv_getlength() failures in quorum_co_flush()

2017-08-04 Thread Alberto Garcia
A bdrv_getlength() call can fail and return a negative value. This
is not being handled in quorum_co_flush(), which can result in a
QUORUM_REPORT_BAD event with an arbitrary value on the 'sectors-count'
field.

Reported-by: Markus Armbruster 
Signed-off-by: Alberto Garcia 
---
 block/quorum.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index 55ba916655..d77991d680 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -785,8 +785,9 @@ static coroutine_fn int quorum_co_flush(BlockDriverState 
*bs)
 for (i = 0; i < s->num_children; i++) {
 result = bdrv_co_flush(s->children[i]->bs);
 if (result) {
+int64_t length = bdrv_getlength(s->children[i]->bs);
 quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
-  bdrv_getlength(s->children[i]->bs),
+  length > 0 ? length : 0,
   s->children[i]->bs->node_name, result);
 result_value.l = result;
 quorum_count_vote(_votes, _value, i);
-- 
2.11.0




Re: [Qemu-block] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Daniel P. Berrange
On Fri, Aug 04, 2017 at 04:02:10PM +0200, Kevin Wolf wrote:
> Am 04.08.2017 um 12:50 hat Daniel P. Berrange geschrieben:
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  include/block/block_int.h | 29 +
> >  1 file changed, 29 insertions(+)
> > 
> > diff --git a/include/block/block_int.h b/include/block/block_int.h
> > index d4f4ea7584..deb81a58bd 100644
> > --- a/include/block/block_int.h
> > +++ b/include/block/block_int.h
> > @@ -147,12 +147,41 @@ struct BlockDriver {
> >  
> >  int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs,
> >  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
> > +
> > +/**
> > + * @offset: position in bytes to read at
> > + * @bytes: number of bytes to read
> > + * @qiov: the buffers to fill with read data
> > + *
> > + * @offset and @bytes will be a multiple of 'request_alignment',
> > + * but the length of individual @qiov elements does not have to
> > + * be a multiple.
> > + *
> > + * @bytes may be less than the total sizeof @iov, and will be
> > + * no larger than 'max_transfer'.
> 
> Really? We are asserting that they match in bdrv_aligned_preadv():
> 
> assert(!qiov || bytes == qiov->size);

Hmm, why do we pass @bytes at all then ? If they're always the same,
how about deleting it and just letting everyone read qiov->size
directly.

> Also, s/sizeof @iov/size of @qiov/



> 
> > + *
> > + * The buffer in @qiov may point directly to guest memory.
> > + */
> >  int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
> >  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
> >  int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
> >  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
> >  int coroutine_fn (*bdrv_co_writev_flags)(BlockDriverState *bs,
> >  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int flags);
> > +/**
> > + * @offset: position in bytes to write at
> > + * @bytes: number of bytes to write
> > + * @qiov: the buffers containing data to write
> > + *
> > + * @offset and @bytes will be a multiple of 'request_alignment',
> > + * but the length of individual @qiov elements does not have to
> > + * be a multiple.
> > + *
> > + * @bytes may be less than the total sizeof @iov, and will be
> > + * no larger than 'max_transfer'.
> 
> The same assertion exists in bdrv_aligned_pwritev() (and the same typo
> in your comment).
> 
> > + * The buffer in @qiov may point directly to guest memory.
> > + */
> >  int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
> >  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
> 
> Kevin

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[Qemu-block] Unchecked blk_getlength() in device models and board code

2017-08-04 Thread Markus Armbruster
blk_getlength() can fail.  I figure the following need fixing:

hw/arm/musicpal.c: musicpal_init()
hw/block/nand.c: nand_realize()
hw/block/virtio-blk.c: virtio_blk_update_config()
hw/ppc/ppc405_boards.c: taihu_405ep_init()



Re: [Qemu-block] Is the use of bdrv_getlength() in quorum_co_flush() kosher?

2017-08-04 Thread Alberto Garcia
On Fri 04 Aug 2017 02:48:03 PM CEST, Markus Armbruster wrote:
> Have a look at quorum_co_flush():
>
> quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
>   bdrv_getlength(s->children[i]->bs),
>   s->children[i]->bs->node_name, result);
>
> bdrv_getlength() can fail.  Does it do the right thing then?

If it fails then it returns -errno, but then quorum_report_bad() turns
into uint64_t and assumes it's valid.

Since that number is then rounded up to the next multiple of
BDRV_SECTOR_SIZE in order to calculate end_sector, I think that what
happens in practice is that the user sees a QUORUM_REPORT_BAD event with
sectors-count = 0 (in most cases) or with a very high value in
sectors-count (if errno > BDRV_SECTOR_SIZE).

The result of bdrv_getlength() is only used to report the number of
affected sectors in the QUORUM_REPORT_BAD event, so there are no other
consequences.

Anyway I think it's a good idea not to make assumptions, detect the
error and pass 0 instead.

--- a/block/quorum.c
+++ b/block/quorum.c
@@ -785,8 +785,9 @@ static coroutine_fn int
quorum_co_flush(BlockDriverState *bs)
 for (i = 0; i < s->num_children; i++) {
 result = bdrv_co_flush(s->children[i]->bs);
 if (result) {
+int64_t length = bdrv_getlength(s->children[i]->bs);
 quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
-  bdrv_getlength(s->children[i]->bs),
+  length > 0 ? length : 0,
   s->children[i]->bs->node_name, result);
 result_value.l = result;
 quorum_count_vote(_votes, _value, i);

Berto



Re: [Qemu-block] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Kevin Wolf
Am 04.08.2017 um 12:50 hat Daniel P. Berrange geschrieben:
> Signed-off-by: Daniel P. Berrange 
> ---
>  include/block/block_int.h | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index d4f4ea7584..deb81a58bd 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -147,12 +147,41 @@ struct BlockDriver {
>  
>  int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs,
>  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
> +
> +/**
> + * @offset: position in bytes to read at
> + * @bytes: number of bytes to read
> + * @qiov: the buffers to fill with read data
> + *
> + * @offset and @bytes will be a multiple of 'request_alignment',
> + * but the length of individual @qiov elements does not have to
> + * be a multiple.
> + *
> + * @bytes may be less than the total sizeof @iov, and will be
> + * no larger than 'max_transfer'.

Really? We are asserting that they match in bdrv_aligned_preadv():

assert(!qiov || bytes == qiov->size);

Also, s/sizeof @iov/size of @qiov/

> + *
> + * The buffer in @qiov may point directly to guest memory.
> + */
>  int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
>  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
>  int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
>  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
>  int coroutine_fn (*bdrv_co_writev_flags)(BlockDriverState *bs,
>  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int flags);
> +/**
> + * @offset: position in bytes to write at
> + * @bytes: number of bytes to write
> + * @qiov: the buffers containing data to write
> + *
> + * @offset and @bytes will be a multiple of 'request_alignment',
> + * but the length of individual @qiov elements does not have to
> + * be a multiple.
> + *
> + * @bytes may be less than the total sizeof @iov, and will be
> + * no larger than 'max_transfer'.

The same assertion exists in bdrv_aligned_pwritev() (and the same typo
in your comment).

> + * The buffer in @qiov may point directly to guest memory.
> + */
>  int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
>  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);

Kevin



Re: [Qemu-block] [PATCH for 2.10] block: use 1 MB bounce buffers for crypto instead of 16KB

2017-08-04 Thread Daniel P. Berrange
On Fri, Aug 04, 2017 at 07:08:54AM -0500, Eric Blake wrote:
> On 08/04/2017 05:51 AM, Daniel P. Berrange wrote:
> > Using 16KB bounce buffers creates a significant performance
> > penalty for I/O to encrypted volumes on storage with high
> > I/O latency (rotating rust & network drives), because it
> > triggers lots of fairly small I/O operations.
> > 
> > On tests with rotating rust, and cache=none|directsync,
> > write speed increased from 2MiB/s to 32MiB/s, on a par
> > with that achieved by the in-kernel luks driver.
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  block/crypto.c | 12 +---
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> > 
> > diff --git a/block/crypto.c b/block/crypto.c
> > index 58ef6f2f52..207941db9a 100644
> > --- a/block/crypto.c
> > +++ b/block/crypto.c
> > @@ -379,7 +379,7 @@ static void block_crypto_close(BlockDriverState *bs)
> >  }
> >  
> >  
> > -#define BLOCK_CRYPTO_MAX_SECTORS 32
> > +#define BLOCK_CRYPTO_MAX_SECTORS 2048
> >  
> >  static coroutine_fn int
> >  block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
> > @@ -396,9 +396,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t 
> > sector_num,
> >  
> >  qemu_iovec_init(_qiov, qiov->niov);
> >  
> > -/* Bounce buffer so we have a linear mem region for
> > - * entire sector. XXX optimize so we avoid bounce
> > - * buffer in case that qiov->niov == 1
> > +/* Bounce buffer because we're not permitted to touch
> > + * contents of qiov - it points to guest memory.
> 
> The comment updates are accurate (and in line with your other patch for
> improving documentation of the callback semantics), but slightly
> unrelated to the fix at hand. However, I have no problem keeping it in
> the patch.
> 
> (To make sure I understand the importance of the bounce buffer: On
> reads, we can't store into the buffer until we have decrypted, so that
> the guest can't transiently spy on the encrypted form; on writes, we
> must read from the buffer at most once before encrypting, so that the
> guest can't change the buffer under our feet while we are encrypting.)

For writes it is even more critical - if we encrypted in place, and the
guest tried to serve a later read from its cache, it'd return cipher
text instead of plain text.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[Qemu-block] Is the use of bdrv_getlength() in qcow.c kosher?

2017-08-04 Thread Markus Armbruster
bdrv_getlength() can fail.  The uses in qcow.c don't check.  Is that
safe?



Re: [Qemu-block] [PATCH v3 3/7] block: tidy ThrottleGroupMember initializations

2017-08-04 Thread Alberto Garcia
On Mon 31 Jul 2017 11:54:39 AM CEST, Manos Pitsidianakis wrote:
> Move the CoMutex and CoQueue inits inside throttle_group_register_tgm()
> which is called whenever a ThrottleGroupMember is initialized. There's
> no need for them to be separate.
>
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Manos Pitsidianakis 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] Is the use of bdrv_getlength() in qcow.c kosher?

2017-08-04 Thread Markus Armbruster
Markus Armbruster  writes:

> bdrv_getlength() can fail.  The uses in qcow.c don't check.  Is that
> safe?

There's another one in qcow2_co_pwritev_compressed().

Yet another one in dump_refcounts().

While we're talking: the one in qcow2_measure() assigns to a variable of
type ssize_t.  Should be int64_t.



[Qemu-block] Is the use of bdrv_getlength() in vmdk_check() kosher?

2017-08-04 Thread Markus Armbruster
Have a look at vmdk_check():

if (ret == VMDK_OK &&
cluster_offset >= bdrv_getlength(extent->file->bs))

bdrv_getlength() can fail.  Does it do the right thing then?  For what
it's worth, the comparison of its value is unsigned.



[Qemu-block] Is the use of bdrv_getlength() in quorum_co_flush() kosher?

2017-08-04 Thread Markus Armbruster
Have a look at quorum_co_flush():

quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
  bdrv_getlength(s->children[i]->bs),
  s->children[i]->bs->node_name, result);

bdrv_getlength() can fail.  Does it do the right thing then?



Re: [Qemu-block] [PATCH for 2.10] block: use 1 MB bounce buffers for crypto instead of 16KB

2017-08-04 Thread Stefan Hajnoczi
On Fri, Aug 04, 2017 at 11:51:36AM +0100, Daniel P. Berrange wrote:
> Using 16KB bounce buffers creates a significant performance
> penalty for I/O to encrypted volumes on storage with high
> I/O latency (rotating rust & network drives), because it
> triggers lots of fairly small I/O operations.
> 
> On tests with rotating rust, and cache=none|directsync,
> write speed increased from 2MiB/s to 32MiB/s, on a par
> with that achieved by the in-kernel luks driver.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/crypto.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/block/crypto.c b/block/crypto.c
> index 58ef6f2f52..207941db9a 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
> @@ -379,7 +379,7 @@ static void block_crypto_close(BlockDriverState *bs)
>  }
>  
>  
> -#define BLOCK_CRYPTO_MAX_SECTORS 32
> +#define BLOCK_CRYPTO_MAX_SECTORS 2048
>  
>  static coroutine_fn int
>  block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
> @@ -396,9 +396,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t 
> sector_num,
>  
>  qemu_iovec_init(_qiov, qiov->niov);
>  
> -/* Bounce buffer so we have a linear mem region for
> - * entire sector. XXX optimize so we avoid bounce
> - * buffer in case that qiov->niov == 1
> +/* Bounce buffer because we're not permitted to touch
> + * contents of qiov - it points to guest memory.
>   */
>  cipher_data =
>  qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,

In the *read* case you can modify the data buffers in-place.  But the
guest might see intermediate states in its buffers - not sure whether
this could pose a security problem.

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Eric Blake
On 08/04/2017 05:50 AM, Daniel P. Berrange wrote:
> Signed-off-by: Daniel P. Berrange 
> ---
>  include/block/block_int.h | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index d4f4ea7584..deb81a58bd 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -147,12 +147,41 @@ struct BlockDriver {
>  
>  int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs,
>  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
> +
> +/**
> + * @offset: position in bytes to read at
> + * @bytes: number of bytes to read
> + * @qiov: the buffers to fill with read data
> + *
> + * @offset and @bytes will be a multiple of 'request_alignment',
> + * but the length of individual @qiov elements does not have to
> + * be a multiple.
> + *
> + * @bytes may be less than the total sizeof @iov, and will be
> + * no larger than 'max_transfer'.
> + *
> + * The buffer in @qiov may point directly to guest memory.
> + */
>  int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
>  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);

Documenting flags would also be nice (at the moment, flags is always 0)

>  int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
>  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
>  int coroutine_fn (*bdrv_co_writev_flags)(BlockDriverState *bs,
>  int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int flags);
> +/**
> + * @offset: position in bytes to write at
> + * @bytes: number of bytes to write
> + * @qiov: the buffers containing data to write
> + *
> + * @offset and @bytes will be a multiple of 'request_alignment',
> + * but the length of individual @qiov elements does not have to
> + * be a multiple.
> + *
> + * @bytes may be less than the total sizeof @iov, and will be
> + * no larger than 'max_transfer'.
> + *
> + * The buffer in @qiov may point directly to guest memory.
> + */
>  int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
>  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);

Likewise for documentation, but here, flags will never exceed
bs->supported_write_flags.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH for-2.10 2/5] block: Allow reopen rw without BDRV_O_ALLOW_RDWR

2017-08-04 Thread Eric Blake
On 08/04/2017 05:20 AM, Kevin Wolf wrote:
>>>
>>> Hmm, I wonder.  https://bugzilla.redhat.com/show_bug.cgi?id=1465320
>>> details a failure when starting qemu with a read-write NBD disk, then
>>> taking several snapshots (nbd <- snap1 <- snap2 <- snap3), then where
>>> intermediate commit (snap2 into nbd) works but live commit (snap3 into
>>> nbd) fails with a message that nbd does not support reopening.  I'm
>>> presuming that your series may help to address that; I'll give it a spin
>>> and see what happens.
>>
>> Nope, even with your patches, I'm still getting:
>>
>> {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar2'}}
>> {"return": {}}
>> {"timestamp": {"seconds": 1501811285, "microseconds": 439748}, "event":
>> "BLOCK_JOB_COMPLETED", "data": {"device": "drive-image1", "len":
>> 2097152, "offset": 2097152, "speed": 0, "type": "commit"}}
>>
>> {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar3'}}
>> {"error": {"class": "GenericError", "desc": "Block format 'nbd' used by
>> node '#block048' does not support reopening files"}}
> 
> That's simply NBD not implementing .bdrv_reopen_*. In other words, not a
> bug, but just a missing feature.

But why does intermediate commit work, while live commit fails?  Both
require that the image being committed into be read-write, and the NBD
image was opened read-write (even if it can be treated as read-only for
the duration of time that it is a backing image).  I understand missing
.bdrv_reopen making it impossible to change read-only to read-write, but
when missing it means you can never change read-write down to the
tighter read-only originally, then what is making live commit different
from intermediate in not being able to handle it?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2] qemu-img: Clarify about relative backing file options

2017-08-04 Thread Eric Blake
On 08/04/2017 09:36 AM, Fam Zheng wrote:
> It's not too surprising when a user specifies the backing file relative
> to the current working directory instead of the top layer image. This
> causes error when they differ. Though the error message has enough
> information to infer the fact about the misunderstanding, it is better
> if we document this explicitly, so that users don't have to learn from
> mistakes.
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> v2: Improve wording. [Eric]
> ---
>  qemu-img.texi | 9 +
>  1 file changed, 9 insertions(+)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH] block/null: Remove 'filename' option

2017-08-04 Thread Kevin Wolf
This option was only added to allow 'null-co://' and 'null-aio://' as
filenames, its value never served any actual purpose and was ignored.
Nevertheless it was accepted as '-drive driver=null,filename=foo'.

The correct way to enable the protocol prefixes (and that without adding
a useless -drive option) is implementing .bdrv_parse_filename. This is
what this patch does.

Technically, this is an incompatible change, but the null block driver
is only used for benchmarking, testing and debugging, and an option
without effect isn't likely to be used by anyone anyway, so no bad
effects are to be expected.

Reported-by: Markus Armbruster 
Signed-off-by: Kevin Wolf 
---
 block/null.c | 31 ++-
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/block/null.c b/block/null.c
index 876f90965b..dd9c13f9ba 100644
--- a/block/null.c
+++ b/block/null.c
@@ -30,11 +30,6 @@ static QemuOptsList runtime_opts = {
 .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
 .desc = {
 {
-.name = "filename",
-.type = QEMU_OPT_STRING,
-.help = "",
-},
-{
 .name = BLOCK_OPT_SIZE,
 .type = QEMU_OPT_SIZE,
 .help = "size of the null block",
@@ -54,6 +49,30 @@ static QemuOptsList runtime_opts = {
 },
 };
 
+static void null_co_parse_filename(const char *filename, QDict *options,
+   Error **errp)
+{
+/* This functions only exists so that a null-co:// filename is accepted
+ * with the null-co driver. */
+if (strcmp(filename, "null-co://")) {
+error_setg(errp, "The only allowed filename for this driver is "
+ "'null-co://'");
+return;
+}
+}
+
+static void null_aio_parse_filename(const char *filename, QDict *options,
+Error **errp)
+{
+/* This functions only exists so that a null-aio:// filename is accepted
+ * with the null-aio driver. */
+if (strcmp(filename, "null-aio://")) {
+error_setg(errp, "The only allowed filename for this driver is "
+ "'null-aio://'");
+return;
+}
+}
+
 static int null_file_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -242,6 +261,7 @@ static BlockDriver bdrv_null_co = {
 .instance_size  = sizeof(BDRVNullState),
 
 .bdrv_file_open = null_file_open,
+.bdrv_parse_filename= null_co_parse_filename,
 .bdrv_close = null_close,
 .bdrv_getlength = null_getlength,
 
@@ -261,6 +281,7 @@ static BlockDriver bdrv_null_aio = {
 .instance_size  = sizeof(BDRVNullState),
 
 .bdrv_file_open = null_file_open,
+.bdrv_parse_filename= null_aio_parse_filename,
 .bdrv_close = null_close,
 .bdrv_getlength = null_getlength,
 
-- 
2.13.3




[Qemu-block] [PATCH v2] qemu-img: Clarify about relative backing file options

2017-08-04 Thread Fam Zheng
It's not too surprising when a user specifies the backing file relative
to the current working directory instead of the top layer image. This
causes error when they differ. Though the error message has enough
information to infer the fact about the misunderstanding, it is better
if we document this explicitly, so that users don't have to learn from
mistakes.

Signed-off-by: Fam Zheng 

---
v2: Improve wording. [Eric]
---
 qemu-img.texi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/qemu-img.texi b/qemu-img.texi
index 72dabd6b3e..90c7eab4a8 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -244,6 +244,9 @@ only the differences from @var{backing_file}. No size needs 
to be specified in
 this case. @var{backing_file} will never be modified unless you use the
 @code{commit} monitor command (or qemu-img commit).
 
+If a relative path name is given, the backing file is looked up relative to
+the directory containing @var{filename}.
+
 Note that a given backing file will be opened to check that it is valid. Use
 the @code{-u} option to enable unsafe backing file mode, which means that the
 image will be created even if the associated backing file cannot be opened. A
@@ -343,6 +346,9 @@ created as a copy on write image of the specified base 
image; the
 @var{backing_file} should have the same content as the input's base image,
 however the path, image format, etc may differ.
 
+If a relative path name is given, the backing file is looked up relative to
+the directory containing @var{output_filename}.
+
 If the @code{-n} option is specified, the target volume creation will be
 skipped. This is useful for formats such as @code{rbd} if the target
 volume has already been created with site specific options that cannot
@@ -490,6 +496,9 @@ The backing file is changed to @var{backing_file} and (if 
the image format of
 string), then the image is rebased onto no backing file (i.e. it will exist
 independently of any backing file).
 
+If a relative path name is given, the backing file is looked up relative to
+the directory containing @var{filename}.
+
 @var{cache} specifies the cache mode to be used for @var{filename}, whereas
 @var{src_cache} specifies the cache mode for reading backing files.
 
-- 
2.13.3




Re: [Qemu-block] [PATCH] qemu-img: Clarify about relative backing file options

2017-08-04 Thread Fam Zheng
On Fri, 08/04 09:23, Eric Blake wrote:
> > +If a relative path name is given, the backing file is looked up against
> > +@var{filename}, rather than the current working directory.
> 
> Maybe better as:
> the backing file is looked up relative to the directory containing
> @var{filename}

Sounds good, will update.



Re: [Qemu-block] [PATCH v3] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Eric Blake
On 08/04/2017 09:08 AM, Daniel P. Berrange wrote:
> Signed-off-by: Daniel P. Berrange 
> ---
> 
>  - Clarify that @bytes matches @qiov total size (Kevin)
> 
>  include/block/block_int.h | 31 +++
>  1 file changed, 31 insertions(+)

[looks like the nongnu.org infrastructure is having heavy load today, so
mails are getting through more slowly than usual - leads to lots of
crossed emails, where I'm seeing replies or direct sends sooner than
list copies]

> +/**
> + * @offset: position in bytes to write at
> + * @bytes: number of bytes to write
> + * @qiov: the buffers containing data to write
> + * @flags: zero or more of bits allowed by 'supported_write_flags'

maybe s/of //

> + *
> + * @offset and @bytes will be a multiple of 'request_alignment',
> + * but the length of individual @qiov elements does not have to
> + * be a multiple.
> + *
> + * @bytes will always equal the total size of @qiov, and will be
> + * no larger than 'max_transfer'.
> + *
> + * The buffer in @qiov may point directly to guest memory.
> + */
>  int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
>  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);

Do we make guarantees that the driver callback is never reached if the
image is currently read-only?  If so, is that a guarantee worth documenting?

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 1/7] block: move ThrottleGroup membership to ThrottleGroupMember

2017-08-04 Thread Alberto Garcia
On Mon 31 Jul 2017 11:54:37 AM CEST, Manos Pitsidianakis wrote:
> This commit eliminates the 1:1 relationship between BlockBackend and
> throttle group state.  Users will be able to create multiple throttle
> nodes, each with its own throttle group state, in the future.  The
> throttle group state cannot be per-BlockBackend anymore, it must be
> per-throttle node. This is done by gathering ThrottleGroup membership
> details from BlockBackendPublic into ThrottleGroupMember and refactoring
> existing code to use the structure.
>
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Manos Pitsidianakis 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] Is the use of bdrv_getlength() in qcow.c kosher?

2017-08-04 Thread Markus Armbruster
Markus Armbruster  writes:

> bdrv_getlength() can fail.  The uses in qcow.c don't check.  Is that
> safe?

There's another one in vpc_open().



Re: [Qemu-block] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Stefan Hajnoczi
On Fri, Aug 04, 2017 at 11:50:59AM +0100, Daniel P. Berrange wrote:
> Signed-off-by: Daniel P. Berrange 
> ---
>  include/block/block_int.h | 29 +
>  1 file changed, 29 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


[Qemu-block] Is the use of bdrv_getlength() in vhdx*.c kosher?

2017-08-04 Thread Markus Armbruster
bdrv_getlength() can fail.  There are several calls in vhdx*.c that
don't seem to check.  Bug or not?



Re: [Qemu-block] [PATCH v2 5/6] hw/block: Use errp directly rather than local_err

2017-08-04 Thread Stefan Hajnoczi
On Fri, Aug 04, 2017 at 06:26:42PM +0800, Mao Zhongyi wrote:
> Pass the error message to errp directly rather than the local
> variable local_err and propagate it to errp via error_propagate().
> 
> Cc: John Snow 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: Keith Busch 
> Cc: Stefan Hajnoczi 
> Cc: "Michael S. Tsirkin" 
> Cc: Paolo Bonzini 
> Cc: Gerd Hoffmann 
> Cc: Markus Armbruster 
> 
> Signed-off-by: Mao Zhongyi 
> ---
>  hw/block/fdc.c| 17 ++---
>  hw/block/nvme.c   |  8 +++-
>  hw/block/virtio-blk.c | 17 +
>  hw/ide/qdev.c | 12 
>  hw/scsi/scsi-disk.c   | 13 -
>  hw/usb/dev-storage.c  |  9 +++--
>  6 files changed, 25 insertions(+), 51 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH] block: move trace probes into bdrv_co_preadv|pwritev

2017-08-04 Thread Stefan Hajnoczi
On Fri, Aug 04, 2017 at 11:50:36AM +0100, Daniel P. Berrange wrote:
> There are trace probes in bdrv_co_readv|writev, however, the
> block drivers are being gradually moved over to using the
> bdrv_co_preadv|pwritev functions instead. As a result some
> block drivers miss the current probes. Move the probes
> into bdrv_co_preadv|pwritev instead, so that they are triggered
> by more (all?) I/O code paths.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/io.c | 8 
>  block/trace-events | 4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)

Thanks, applied to my tracing tree:
https://github.com/stefanha/qemu/commits/tracing

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v2 4/6] hw/block: Fix the return type

2017-08-04 Thread Stefan Hajnoczi
On Fri, Aug 04, 2017 at 06:26:41PM +0800, Mao Zhongyi wrote:
> When the function no success value to transmit, it usually make the
> function return void. It has turned out not to be a success, because
> it means that the extra local_err variable and error_propagate() will
> be needed. It leads to cumbersome code, therefore, transmit success/
> failure in the return value is worth.
> 
> So fix the return type of blkconf_apply_backend_options(),
> blkconf_geometry() and virtio_blk_data_plane_create() to avoid it.
> 
> Cc: John Snow 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: Stefan Hajnoczi 
> 
> Signed-off-by: Mao Zhongyi 
> ---
>  hw/block/block.c| 15 +--
>  hw/block/dataplane/virtio-blk.c | 12 +++-
>  hw/block/dataplane/virtio-blk.h |  2 +-
>  include/hw/block/block.h|  4 ++--
>  4 files changed, 19 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-block] Is the use of bdrv_getlength() in parallels.c kosher?

2017-08-04 Thread Denis V. Lunev
On 08/04/2017 03:31 PM, Markus Armbruster wrote:
> Same question for allocate_clusters() in parallels.c, commit 5a41e1f,
> modified in commit ddd2ef2:
>
> if (s->data_end + space > bdrv_getlength(bs->file->bs) >> 
> BDRV_SECTOR_BITS) {
>
> bdrv_getlength() can fail.  Does it do the right thing then?  For what
> it's worth, the comparison of its value is signed.
>
> There's another one in parallels_open():
>
> if (!(flags & BDRV_O_RESIZE) || !bdrv_has_zero_init(bs->file->bs) ||
> bdrv_truncate(bs->file, bdrv_getlength(bs->file->bs),
>   PREALLOC_MODE_OFF, NULL) != 0) {
> s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
> }
the same as in the previous letter. Saniti checks will not harm.
I'll make a patch.

Thank you for the suggestion.

Den



Re: [Qemu-block] Is the use of bdrv_getlength() in handle_aiocb_write_zeroes() kosher?

2017-08-04 Thread Denis V. Lunev
On 08/04/2017 03:16 PM, Markus Armbruster wrote:
> Denis, you added this in commit d50d822:
>
> #ifdef CONFIG_FALLOCATE
> if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
> int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, 
> aiocb->aio_nbytes);
> if (ret == 0 || ret != -ENOTSUP) {
> return ret;
> }
> s->has_fallocate = false;
> }
> #endif
>
> bdrv_getlength() can fail.  Does it do the right thing then?  For what
> it's worth, the comparison of its value is signed.
fallocate() with 0 flags can work only beyond end of file
or on top of the hole. Thus the check is made to validate
that we are beyond EOF.

Technically fallocate should fail if that condition will be
violated. But you right, we can add sanity check here.
This would not harm.

Should I send it?

Den




[Qemu-block] [PATCH] qemu-img: Clarify about relative backing file options

2017-08-04 Thread Fam Zheng
It's not too surprising when a user specifies the backing file relative
to the current working directory instead of the top layer image. This
causes error when they differ. Though the error message has enough
information to infer the fact about the misunderstanding, it is better
if we document this explicitly, so that users don't have to learn from
mistakes.

Signed-off-by: Fam Zheng 
---
 qemu-img.texi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/qemu-img.texi b/qemu-img.texi
index 72dabd6b3e..cf079f90e2 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -244,6 +244,9 @@ only the differences from @var{backing_file}. No size needs 
to be specified in
 this case. @var{backing_file} will never be modified unless you use the
 @code{commit} monitor command (or qemu-img commit).
 
+If a relative path name is given, the backing file is looked up against
+@var{filename}, rather than the current working directory.
+
 Note that a given backing file will be opened to check that it is valid. Use
 the @code{-u} option to enable unsafe backing file mode, which means that the
 image will be created even if the associated backing file cannot be opened. A
@@ -343,6 +346,9 @@ created as a copy on write image of the specified base 
image; the
 @var{backing_file} should have the same content as the input's base image,
 however the path, image format, etc may differ.
 
+If a relative path name is given, the backing file is looked up against
+@var{output_filename}, rather than the current working directory.
+
 If the @code{-n} option is specified, the target volume creation will be
 skipped. This is useful for formats such as @code{rbd} if the target
 volume has already been created with site specific options that cannot
@@ -490,6 +496,9 @@ The backing file is changed to @var{backing_file} and (if 
the image format of
 string), then the image is rebased onto no backing file (i.e. it will exist
 independently of any backing file).
 
+If a relative path name is given, the backing file is looked up against
+@var{filename}, rather than the current working directory.
+
 @var{cache} specifies the cache mode to be used for @var{filename}, whereas
 @var{src_cache} specifies the cache mode for reading backing files.
 
-- 
2.13.3




Re: [Qemu-block] Is the use of bdrv_getlength() in vmdk_check() kosher?

2017-08-04 Thread Fam Zheng
On Fri, 08/04 15:28, Markus Armbruster wrote:
> Have a look at vmdk_check():
> 
> if (ret == VMDK_OK &&
> cluster_offset >= bdrv_getlength(extent->file->bs))
> 
> bdrv_getlength() can fail.  Does it do the right thing then?  For what
> it's worth, the comparison of its value is unsigned.

Sure, error handling should be done with bdrv_getlength() return value. I'll
send a patch.

Fam



[Qemu-block] Is the use of bdrv_getlength() in handle_aiocb_write_zeroes() kosher?

2017-08-04 Thread Markus Armbruster
Denis, you added this in commit d50d822:

#ifdef CONFIG_FALLOCATE
if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
if (ret == 0 || ret != -ENOTSUP) {
return ret;
}
s->has_fallocate = false;
}
#endif

bdrv_getlength() can fail.  Does it do the right thing then?  For what
it's worth, the comparison of its value is signed.



[Qemu-block] Is the use of bdrv_getlength() in parallels.c kosher? (was: Is the use of bdrv_getlength() in handle_aiocb_write_zeroes() kosher?)

2017-08-04 Thread Markus Armbruster
Same question for allocate_clusters() in parallels.c, commit 5a41e1f,
modified in commit ddd2ef2:

if (s->data_end + space > bdrv_getlength(bs->file->bs) >> BDRV_SECTOR_BITS) 
{

bdrv_getlength() can fail.  Does it do the right thing then?  For what
it's worth, the comparison of its value is signed.

There's another one in parallels_open():

if (!(flags & BDRV_O_RESIZE) || !bdrv_has_zero_init(bs->file->bs) ||
bdrv_truncate(bs->file, bdrv_getlength(bs->file->bs),
  PREALLOC_MODE_OFF, NULL) != 0) {
s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
}



[Qemu-block] Is blk_getlength() in find_image_format() and img_map() kosher?

2017-08-04 Thread Markus Armbruster
Have a look at find_image_format():

if (blk_is_sg(file) || !blk_is_inserted(file) || blk_getlength(file) == 0) {
*pdrv = _raw;
return ret;
}

blk_getlength() can fail.  Shouldn't we error out then?

We pretty obviously should in img_map():

length = blk_getlength(blk);
while (curr.start + curr.length < length) {

Since I have to touch @length in the series I'm working on, I'll stick
in a fix.



Re: [Qemu-block] [PATCH for 2.10] block: use 1 MB bounce buffers for crypto instead of 16KB

2017-08-04 Thread Daniel P. Berrange
On Fri, Aug 04, 2017 at 01:48:01PM +0100, Stefan Hajnoczi wrote:
> On Fri, Aug 04, 2017 at 11:51:36AM +0100, Daniel P. Berrange wrote:
> > Using 16KB bounce buffers creates a significant performance
> > penalty for I/O to encrypted volumes on storage with high
> > I/O latency (rotating rust & network drives), because it
> > triggers lots of fairly small I/O operations.
> > 
> > On tests with rotating rust, and cache=none|directsync,
> > write speed increased from 2MiB/s to 32MiB/s, on a par
> > with that achieved by the in-kernel luks driver.
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  block/crypto.c | 12 +---
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> > 
> > diff --git a/block/crypto.c b/block/crypto.c
> > index 58ef6f2f52..207941db9a 100644
> > --- a/block/crypto.c
> > +++ b/block/crypto.c
> > @@ -379,7 +379,7 @@ static void block_crypto_close(BlockDriverState *bs)
> >  }
> >  
> >  
> > -#define BLOCK_CRYPTO_MAX_SECTORS 32
> > +#define BLOCK_CRYPTO_MAX_SECTORS 2048
> >  
> >  static coroutine_fn int
> >  block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
> > @@ -396,9 +396,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t 
> > sector_num,
> >  
> >  qemu_iovec_init(_qiov, qiov->niov);
> >  
> > -/* Bounce buffer so we have a linear mem region for
> > - * entire sector. XXX optimize so we avoid bounce
> > - * buffer in case that qiov->niov == 1
> > +/* Bounce buffer because we're not permitted to touch
> > + * contents of qiov - it points to guest memory.
> >   */
> >  cipher_data =
> >  qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 
> > 512,
> 
> In the *read* case you can modify the data buffers in-place.  But the
> guest might see intermediate states in its buffers - not sure whether
> this could pose a security problem.

Whether its a risk or not depends on the choice of crypto parameters, as
exposing ciphertext to the guest might make watermarking attacks easier
to perform. Probably not a problem in practice, but I prefer to err on
the side of caution since I can't be sure it is safe.

> Reviewed-by: Stefan Hajnoczi 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[Qemu-block] [PATCH for 2.10] block: use 1 MB bounce buffers for crypto instead of 16KB

2017-08-04 Thread Daniel P. Berrange
Using 16KB bounce buffers creates a significant performance
penalty for I/O to encrypted volumes on storage with high
I/O latency (rotating rust & network drives), because it
triggers lots of fairly small I/O operations.

On tests with rotating rust, and cache=none|directsync,
write speed increased from 2MiB/s to 32MiB/s, on a par
with that achieved by the in-kernel luks driver.

Signed-off-by: Daniel P. Berrange 
---
 block/crypto.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 58ef6f2f52..207941db9a 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -379,7 +379,7 @@ static void block_crypto_close(BlockDriverState *bs)
 }
 
 
-#define BLOCK_CRYPTO_MAX_SECTORS 32
+#define BLOCK_CRYPTO_MAX_SECTORS 2048
 
 static coroutine_fn int
 block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
@@ -396,9 +396,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t 
sector_num,
 
 qemu_iovec_init(_qiov, qiov->niov);
 
-/* Bounce buffer so we have a linear mem region for
- * entire sector. XXX optimize so we avoid bounce
- * buffer in case that qiov->niov == 1
+/* Bounce buffer because we're not permitted to touch
+ * contents of qiov - it points to guest memory.
  */
 cipher_data =
 qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,
@@ -464,9 +463,8 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t 
sector_num,
 
 qemu_iovec_init(_qiov, qiov->niov);
 
-/* Bounce buffer so we have a linear mem region for
- * entire sector. XXX optimize so we avoid bounce
- * buffer in case that qiov->niov == 1
+/* Bounce buffer because we're not permitted to touch
+ * contents of qiov - it points to guest memory.
  */
 cipher_data =
 qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,
-- 
2.13.3




[Qemu-block] [PATCH] block: document semanatics of bdrv_co_preadv|pwritev

2017-08-04 Thread Daniel P. Berrange
Signed-off-by: Daniel P. Berrange 
---
 include/block/block_int.h | 29 +
 1 file changed, 29 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index d4f4ea7584..deb81a58bd 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -147,12 +147,41 @@ struct BlockDriver {
 
 int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+
+/**
+ * @offset: position in bytes to read at
+ * @bytes: number of bytes to read
+ * @qiov: the buffers to fill with read data
+ *
+ * @offset and @bytes will be a multiple of 'request_alignment',
+ * but the length of individual @qiov elements does not have to
+ * be a multiple.
+ *
+ * @bytes may be less than the total sizeof @iov, and will be
+ * no larger than 'max_transfer'.
+ *
+ * The buffer in @qiov may point directly to guest memory.
+ */
 int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
 int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn (*bdrv_co_writev_flags)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int flags);
+/**
+ * @offset: position in bytes to write at
+ * @bytes: number of bytes to write
+ * @qiov: the buffers containing data to write
+ *
+ * @offset and @bytes will be a multiple of 'request_alignment',
+ * but the length of individual @qiov elements does not have to
+ * be a multiple.
+ *
+ * @bytes may be less than the total sizeof @iov, and will be
+ * no larger than 'max_transfer'.
+ *
+ * The buffer in @qiov may point directly to guest memory.
+ */
 int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
 
-- 
2.13.3




[Qemu-block] [PATCH] block: move trace probes into bdrv_co_preadv|pwritev

2017-08-04 Thread Daniel P. Berrange
There are trace probes in bdrv_co_readv|writev, however, the
block drivers are being gradually moved over to using the
bdrv_co_preadv|pwritev functions instead. As a result some
block drivers miss the current probes. Move the probes
into bdrv_co_preadv|pwritev instead, so that they are triggered
by more (all?) I/O code paths.

Signed-off-by: Daniel P. Berrange 
---
 block/io.c | 8 
 block/trace-events | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/io.c b/block/io.c
index d9dc822173..26003814eb 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1135,6 +1135,8 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
 bool use_local_qiov = false;
 int ret;
 
+trace_bdrv_co_preadv(child->bs, offset, bytes, flags);
+
 if (!drv) {
 return -ENOMEDIUM;
 }
@@ -1207,8 +1209,6 @@ static int coroutine_fn bdrv_co_do_readv(BdrvChild *child,
 int coroutine_fn bdrv_co_readv(BdrvChild *child, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov)
 {
-trace_bdrv_co_readv(child->bs, sector_num, nb_sectors);
-
 return bdrv_co_do_readv(child, sector_num, nb_sectors, qiov, 0);
 }
 
@@ -1526,6 +1526,8 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
 bool use_local_qiov = false;
 int ret;
 
+trace_bdrv_co_pwritev(child->bs, offset, bytes, flags);
+
 if (!bs->drv) {
 return -ENOMEDIUM;
 }
@@ -1660,8 +1662,6 @@ static int coroutine_fn bdrv_co_do_writev(BdrvChild 
*child,
 int coroutine_fn bdrv_co_writev(BdrvChild *child, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov)
 {
-trace_bdrv_co_writev(child->bs, sector_num, nb_sectors);
-
 return bdrv_co_do_writev(child, sector_num, nb_sectors, qiov, 0);
 }
 
diff --git a/block/trace-events b/block/trace-events
index 071a8d77ba..25dd5a3026 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -9,8 +9,8 @@ blk_co_preadv(void *blk, void *bs, int64_t offset, unsigned int 
bytes, int flags
 blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int 
flags) "blk %p bs %p offset %"PRId64" bytes %u flags 0x%x"
 
 # block/io.c
-bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num 
%"PRId64" nb_sectors %d"
-bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num 
%"PRId64" nb_sectors %d"
+bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) 
"bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
+bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) 
"bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
 bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p 
offset %"PRId64" count %d flags 0x%x"
 bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t 
cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u 
cluster_offset %"PRId64" cluster_bytes %u"
 
-- 
2.13.3




[Qemu-block] [PATCH v2 5/6] hw/block: Use errp directly rather than local_err

2017-08-04 Thread Mao Zhongyi
Pass the error message to errp directly rather than the local
variable local_err and propagate it to errp via error_propagate().

Cc: John Snow 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Keith Busch 
Cc: Stefan Hajnoczi 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Gerd Hoffmann 
Cc: Markus Armbruster 

Signed-off-by: Mao Zhongyi 
---
 hw/block/fdc.c| 17 ++---
 hw/block/nvme.c   |  8 +++-
 hw/block/virtio-blk.c | 17 +
 hw/ide/qdev.c | 12 
 hw/scsi/scsi-disk.c   | 13 -
 hw/usb/dev-storage.c  |  9 +++--
 6 files changed, 25 insertions(+), 51 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 02f4a46..192a4aa 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -473,16 +473,13 @@ static void fd_revalidate(FDrive *drv)
 static void fd_change_cb(void *opaque, bool load, Error **errp)
 {
 FDrive *drive = opaque;
-Error *local_err = NULL;
 
 if (!load) {
 blk_set_perm(drive->blk, 0, BLK_PERM_ALL, _abort);
 } else {
-blkconf_apply_backend_options(drive->conf,
-  blk_is_read_only(drive->blk), false,
-  _err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!blkconf_apply_backend_options(drive->conf,
+   blk_is_read_only(drive->blk),
+   false, errp)) {
 return;
 }
 }
@@ -522,7 +519,6 @@ static void floppy_drive_realize(DeviceState *qdev, Error 
**errp)
 FloppyDrive *dev = FLOPPY_DRIVE(qdev);
 FloppyBus *bus = FLOPPY_BUS(qdev->parent_bus);
 FDrive *drive;
-Error *local_err = NULL;
 int ret;
 
 if (dev->unit == -1) {
@@ -568,10 +564,9 @@ static void floppy_drive_realize(DeviceState *qdev, Error 
**errp)
 dev->conf.rerror = BLOCKDEV_ON_ERROR_AUTO;
 dev->conf.werror = BLOCKDEV_ON_ERROR_AUTO;
 
-blkconf_apply_backend_options(>conf, blk_is_read_only(dev->conf.blk),
-  false, _err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!blkconf_apply_backend_options(>conf,
+   blk_is_read_only(dev->conf.blk),
+   false, errp)) {
 return;
 }
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 43c60ab..1856053 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -928,7 +928,6 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 int i;
 int64_t bs_size;
 uint8_t *pci_conf;
-Error *local_err = NULL;
 
 if (!n->conf.blk) {
 error_setg(errp, "drive property not set");
@@ -947,10 +946,9 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 return;
 }
 blkconf_blocksizes(>conf);
-blkconf_apply_backend_options(>conf, blk_is_read_only(n->conf.blk),
-  false, _err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!blkconf_apply_backend_options(>conf,
+   blk_is_read_only(n->conf.blk),
+   false, errp)) {
 return;
 }
 
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index b750bd8..2940337 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -911,7 +911,6 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOBlock *s = VIRTIO_BLK(dev);
 VirtIOBlkConf *conf = >conf;
-Error *err = NULL;
 unsigned i;
 
 if (!conf->conf.blk) {
@@ -928,17 +927,13 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 }
 
 blkconf_serial(>conf, >serial);
-blkconf_apply_backend_options(>conf,
-  blk_is_read_only(conf->conf.blk), true,
-  );
-if (err) {
-error_propagate(errp, err);
+if (!blkconf_apply_backend_options(>conf,
+   blk_is_read_only(conf->conf.blk),
+   true, errp)) {
 return;
 }
 s->original_wce = blk_enable_write_cache(conf->conf.blk);
-blkconf_geometry(>conf, NULL, 65535, 255, 255, );
-if (err) {
-error_propagate(errp, err);
+if (!blkconf_geometry(>conf, NULL, 65535, 255, 255, errp)) {
 return;
 }
 blkconf_blocksizes(>conf);
@@ -953,9 +948,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 for (i = 0; i < conf->num_queues; i++) {
 virtio_add_queue(vdev, 128, virtio_blk_handle_output);
 }
-virtio_blk_data_plane_create(vdev, conf, >dataplane, );
-   

[Qemu-block] [PATCH v2 0/6] Convert to realize and improve error handling

2017-08-04 Thread Mao Zhongyi
This series mainly implements the conversions of ide, floppy and nvme
device to realize. Add some error handling messages and remove the local
variable local_err, use errp to propagate the error directly. Also
fix the unusual function name.

v2:
  -use bool as the return type instead of int. [Markus Armbruster & Stefan 
Hajnoczi]

Cc: John Snow 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Keith Busch 
Cc: Stefan Hajnoczi 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Gerd Hoffmann 
Cc: Markus Armbruster 

Mao Zhongyi (6):
  hw/ide: Convert DeviceClass init to realize
  hw/block/fdc: Convert to realize
  hw/block/nvme: Convert to realize
  hw/block: Fix the return type
  hw/block: Use errp directly rather than local_err
  dev-storage: Fix the unusual function name

 hw/block/block.c| 15 ---
 hw/block/dataplane/virtio-blk.c | 12 +++---
 hw/block/dataplane/virtio-blk.h |  2 +-
 hw/block/fdc.c  | 48 +
 hw/block/nvme.c | 24 +--
 hw/block/virtio-blk.c   | 17 +++-
 hw/ide/core.c   |  7 +--
 hw/ide/qdev.c   | 94 +++--
 hw/scsi/scsi-disk.c | 13 ++
 hw/usb/dev-storage.c| 29 ++---
 include/hw/block/block.h|  4 +-
 include/hw/ide/internal.h   |  5 ++-
 12 files changed, 125 insertions(+), 145 deletions(-)

-- 
2.9.4






[Qemu-block] [PATCH v2 6/6] dev-storage: Fix the unusual function name

2017-08-04 Thread Mao Zhongyi
The function name of usb_msd_{realize,unrealize}_*,
usb_msd_class_initfn_* are unusual. Rename it to
usb_msd_*_{realize,unrealize}, usb_msd_class_*_initfn.

Cc: Gerd Hoffmann 

Signed-off-by: Mao Zhongyi 
---
 hw/usb/dev-storage.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 801f552..2a05cd5 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -596,7 +596,7 @@ static void usb_msd_unrealize_storage(USBDevice *dev, Error 
**errp)
 object_unref(OBJECT(>bus));
 }
 
-static void usb_msd_realize_storage(USBDevice *dev, Error **errp)
+static void usb_msd_storage_realize(USBDevice *dev, Error **errp)
 {
 MSDState *s = USB_STORAGE_DEV(dev);
 BlockBackend *blk = s->conf.blk;
@@ -643,14 +643,14 @@ static void usb_msd_realize_storage(USBDevice *dev, Error 
**errp)
 s->scsi_dev = scsi_dev;
 }
 
-static void usb_msd_unrealize_bot(USBDevice *dev, Error **errp)
+static void usb_msd_bot_unrealize(USBDevice *dev, Error **errp)
 {
 MSDState *s = USB_STORAGE_DEV(dev);
 
 object_unref(OBJECT(>bus));
 }
 
-static void usb_msd_realize_bot(USBDevice *dev, Error **errp)
+static void usb_msd_bot_realize(USBDevice *dev, Error **errp)
 {
 MSDState *s = USB_STORAGE_DEV(dev);
 DeviceState *d = DEVICE(dev);
@@ -764,12 +764,12 @@ static void usb_msd_class_initfn_common(ObjectClass 
*klass, void *data)
 dc->vmsd = _usb_msd;
 }
 
-static void usb_msd_class_initfn_storage(ObjectClass *klass, void *data)
+static void usb_msd_class_storage_initfn(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 USBDeviceClass *uc = USB_DEVICE_CLASS(klass);
 
-uc->realize = usb_msd_realize_storage;
+uc->realize = usb_msd_storage_realize;
 uc->unrealize = usb_msd_unrealize_storage;
 dc->props = msd_properties;
 }
@@ -828,26 +828,26 @@ static void usb_msd_instance_init(Object *obj)
 object_property_set_int(obj, -1, "bootindex", NULL);
 }
 
-static void usb_msd_class_initfn_bot(ObjectClass *klass, void *data)
+static void usb_msd_class_bot_initfn(ObjectClass *klass, void *data)
 {
 USBDeviceClass *uc = USB_DEVICE_CLASS(klass);
 
-uc->realize = usb_msd_realize_bot;
-uc->unrealize = usb_msd_unrealize_bot;
+uc->realize = usb_msd_bot_realize;
+uc->unrealize = usb_msd_bot_unrealize;
 uc->attached_settable = true;
 }
 
 static const TypeInfo msd_info = {
 .name  = "usb-storage",
 .parent= TYPE_USB_STORAGE,
-.class_init= usb_msd_class_initfn_storage,
+.class_init= usb_msd_class_storage_initfn,
 .instance_init = usb_msd_instance_init,
 };
 
 static const TypeInfo bot_info = {
 .name  = "usb-bot",
 .parent= TYPE_USB_STORAGE,
-.class_init= usb_msd_class_initfn_bot,
+.class_init= usb_msd_class_bot_initfn,
 };
 
 static void usb_msd_register_types(void)
-- 
2.9.4






[Qemu-block] [PATCH v2 4/6] hw/block: Fix the return type

2017-08-04 Thread Mao Zhongyi
When the function no success value to transmit, it usually make the
function return void. It has turned out not to be a success, because
it means that the extra local_err variable and error_propagate() will
be needed. It leads to cumbersome code, therefore, transmit success/
failure in the return value is worth.

So fix the return type of blkconf_apply_backend_options(),
blkconf_geometry() and virtio_blk_data_plane_create() to avoid it.

Cc: John Snow 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Stefan Hajnoczi 

Signed-off-by: Mao Zhongyi 
---
 hw/block/block.c| 15 +--
 hw/block/dataplane/virtio-blk.c | 12 +++-
 hw/block/dataplane/virtio-blk.h |  2 +-
 include/hw/block/block.h|  4 ++--
 4 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/hw/block/block.c b/hw/block/block.c
index 27878d0..b0269c8 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -51,7 +51,7 @@ void blkconf_blocksizes(BlockConf *conf)
 }
 }
 
-void blkconf_apply_backend_options(BlockConf *conf, bool readonly,
+bool blkconf_apply_backend_options(BlockConf *conf, bool readonly,
bool resizable, Error **errp)
 {
 BlockBackend *blk = conf->blk;
@@ -76,7 +76,7 @@ void blkconf_apply_backend_options(BlockConf *conf, bool 
readonly,
 
 ret = blk_set_perm(blk, perm, shared_perm, errp);
 if (ret < 0) {
-return;
+return false;
 }
 
 switch (conf->wce) {
@@ -99,9 +99,11 @@ void blkconf_apply_backend_options(BlockConf *conf, bool 
readonly,
 
 blk_set_enable_write_cache(blk, wce);
 blk_set_on_error(blk, rerror, werror);
+
+return true;
 }
 
-void blkconf_geometry(BlockConf *conf, int *ptrans,
+bool blkconf_geometry(BlockConf *conf, int *ptrans,
   unsigned cyls_max, unsigned heads_max, unsigned secs_max,
   Error **errp)
 {
@@ -129,15 +131,16 @@ void blkconf_geometry(BlockConf *conf, int *ptrans,
 if (conf->cyls || conf->heads || conf->secs) {
 if (conf->cyls < 1 || conf->cyls > cyls_max) {
 error_setg(errp, "cyls must be between 1 and %u", cyls_max);
-return;
+return false;
 }
 if (conf->heads < 1 || conf->heads > heads_max) {
 error_setg(errp, "heads must be between 1 and %u", heads_max);
-return;
+return false;
 }
 if (conf->secs < 1 || conf->secs > secs_max) {
 error_setg(errp, "secs must be between 1 and %u", secs_max);
-return;
+return false;
 }
 }
+return true;
 }
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 5556f0e..f6fc639 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -76,7 +76,7 @@ static void notify_guest_bh(void *opaque)
 }
 
 /* Context: QEMU global mutex held */
-void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
+bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
   VirtIOBlockDataPlane **dataplane,
   Error **errp)
 {
@@ -91,11 +91,11 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 error_setg(errp,
"device is incompatible with iothread "
"(transport does not support notifiers)");
-return;
+return false;
 }
 if (!virtio_device_ioeventfd_enabled(vdev)) {
 error_setg(errp, "ioeventfd is required for iothread");
-return;
+return false;
 }
 
 /* If dataplane is (re-)enabled while the guest is running there could
@@ -103,12 +103,12 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
  */
 if (blk_op_is_blocked(conf->conf.blk, BLOCK_OP_TYPE_DATAPLANE, errp)) {
 error_prepend(errp, "cannot start virtio-blk dataplane: ");
-return;
+return false;
 }
 }
 /* Don't try if transport does not support notifiers. */
 if (!virtio_device_ioeventfd_enabled(vdev)) {
-return;
+return false;
 }
 
 s = g_new0(VirtIOBlockDataPlane, 1);
@@ -126,6 +126,8 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
 *dataplane = s;
+
+return true;
 }
 
 /* Context: QEMU global mutex held */
diff --git a/hw/block/dataplane/virtio-blk.h b/hw/block/dataplane/virtio-blk.h
index db3f47b..5e18bb9 100644
--- a/hw/block/dataplane/virtio-blk.h
+++ b/hw/block/dataplane/virtio-blk.h
@@ -19,7 +19,7 @@
 
 typedef struct VirtIOBlockDataPlane VirtIOBlockDataPlane;
 
-void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
+bool 

[Qemu-block] [PATCH v2 2/6] hw/block/fdc: Convert to realize

2017-08-04 Thread Mao Zhongyi
Convert floppy_drive_init() to realize and rename it to
floppy_drive_realize().

Cc: John Snow 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Markus Armbruster 

Signed-off-by: Mao Zhongyi 
---
 hw/block/fdc.c | 33 -
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 4011290..02f4a46 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -517,7 +517,7 @@ static Property floppy_drive_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
-static int floppy_drive_init(DeviceState *qdev)
+static void floppy_drive_realize(DeviceState *qdev, Error **errp)
 {
 FloppyDrive *dev = FLOPPY_DRIVE(qdev);
 FloppyBus *bus = FLOPPY_BUS(qdev->parent_bus);
@@ -535,15 +535,15 @@ static int floppy_drive_init(DeviceState *qdev)
 }
 
 if (dev->unit >= MAX_FD) {
-error_report("Can't create floppy unit %d, bus supports only %d units",
- dev->unit, MAX_FD);
-return -1;
+error_setg(errp, "Can't create floppy unit %d, bus supports "
+   "only %d units", dev->unit, MAX_FD);
+return;
 }
 
 drive = get_drv(bus->fdc, dev->unit);
 if (drive->blk) {
-error_report("Floppy unit %d is in use", dev->unit);
-return -1;
+error_setg(errp, "Floppy unit %d is in use", dev->unit);
+return;
 }
 
 if (!dev->conf.blk) {
@@ -557,8 +557,9 @@ static int floppy_drive_init(DeviceState *qdev)
 if (dev->conf.logical_block_size != 512 ||
 dev->conf.physical_block_size != 512)
 {
-error_report("Physical and logical block size must be 512 for floppy");
-return -1;
+error_setg(errp, "Physical and logical block size must "
+   "be 512 for floppy");
+return;
 }
 
 /* rerror/werror aren't supported by fdc and therefore not even registered
@@ -570,20 +571,20 @@ static int floppy_drive_init(DeviceState *qdev)
 blkconf_apply_backend_options(>conf, blk_is_read_only(dev->conf.blk),
   false, _err);
 if (local_err) {
-error_report_err(local_err);
-return -1;
+error_propagate(errp, local_err);
+return;
 }
 
 /* 'enospc' is the default for -drive, 'report' is what blk_new() gives us
  * for empty drives. */
 if (blk_get_on_error(dev->conf.blk, 0) != BLOCKDEV_ON_ERROR_ENOSPC &&
 blk_get_on_error(dev->conf.blk, 0) != BLOCKDEV_ON_ERROR_REPORT) {
-error_report("fdc doesn't support drive option werror");
-return -1;
+error_setg(errp, "fdc doesn't support drive option werror");
+return;
 }
 if (blk_get_on_error(dev->conf.blk, 1) != BLOCKDEV_ON_ERROR_REPORT) {
-error_report("fdc doesn't support drive option rerror");
-return -1;
+error_setg(errp, "fdc doesn't support drive option rerror");
+return;
 }
 
 drive->conf = >conf;
@@ -599,14 +600,12 @@ static int floppy_drive_init(DeviceState *qdev)
 dev->type = drive->drive;
 
 fd_revalidate(drive);
-
-return 0;
 }
 
 static void floppy_drive_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *k = DEVICE_CLASS(klass);
-k->init = floppy_drive_init;
+k->realize = floppy_drive_realize;
 set_bit(DEVICE_CATEGORY_STORAGE, k->categories);
 k->bus_type = TYPE_FLOPPY_BUS;
 k->props = floppy_drive_properties;
-- 
2.9.4






Re: [Qemu-block] [Qemu-devel] [PATCH for-2.10 2/5] block: Allow reopen rw without BDRV_O_ALLOW_RDWR

2017-08-04 Thread Kevin Wolf
Am 04.08.2017 um 03:49 hat Eric Blake geschrieben:
> On 08/03/2017 10:21 AM, Eric Blake wrote:
> > On 08/03/2017 10:02 AM, Kevin Wolf wrote:
> >> BDRV_O_ALLOW_RDWR is a flag that tells whether qemu can internally
> >> reopen a node read-write temporarily because the user requested
> >> read-write for the top-level image, but qemu decided that read-only is
> >> enough for this node (a backing file).
> >>
> >> bdrv_reopen() is different, it is also used for cases where the user
> >> changed their mind and wants to update the options. There is no reason
> >> to forbid making a node read-write in that case.
> > 
> > Hmm, I wonder.  https://bugzilla.redhat.com/show_bug.cgi?id=1465320
> > details a failure when starting qemu with a read-write NBD disk, then
> > taking several snapshots (nbd <- snap1 <- snap2 <- snap3), then where
> > intermediate commit (snap2 into nbd) works but live commit (snap3 into
> > nbd) fails with a message that nbd does not support reopening.  I'm
> > presuming that your series may help to address that; I'll give it a spin
> > and see what happens.
> 
> Nope, even with your patches, I'm still getting:
> 
> {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar2'}}
> {"return": {}}
> {"timestamp": {"seconds": 1501811285, "microseconds": 439748}, "event":
> "BLOCK_JOB_COMPLETED", "data": {"device": "drive-image1", "len":
> 2097152, "offset": 2097152, "speed": 0, "type": "commit"}}
> 
> {'execute':'block-commit','arguments':{'device':'drive-image1','top':'bar3'}}
> {"error": {"class": "GenericError", "desc": "Block format 'nbd' used by
> node '#block048' does not support reopening files"}}

That's simply NBD not implementing .bdrv_reopen_*. In other words, not a
bug, but just a missing feature.

Kevin


signature.asc
Description: PGP signature