Re: [Qemu-block] [PATCH v6 2/2] qemu-img: Document --force-share / -U

2018-02-01 Thread Stefan Hajnoczi
On Wed, Jan 31, 2018 at 03:22:49PM +0100, Kevin Wolf wrote:
> Am 31.01.2018 um 15:12 hat Stefan Hajnoczi geschrieben:
> > There should be a separate paragraph in docs/qemu-block-drivers.texi
> > explaining that share-rw=on can be used safely with format=raw if the
> > guests are configured to safely access a shared disk.  It should also
> > mention that share-rw=on is unsafe for image formats.
> 
> share-rw=on is a -device option and only about the guest, not about the
> backend. It is never unsafe if the guest can cope with external writers.
> It just doesn't prevent qemu from locking the image file for image
> formats.

Thanks for the explanation.  Documentation on share-rw=on|off would be
nice.

Maybe something like:

  By default the guest has exclusive write access to its disk image.
  This is enforced by QEMU via file locking.  If the guest can safely
  share the disk image with other writers the -device ...,share-rw=on
  parameter can be used.  This is only safe if the guest is running
  software, such as a cluster file system, that coordinates disk
  accesses to avoid corruption.

  Note that share-rw=on only declares the guest's ability to share the
  disk.  Some QEMU features, such as image file formats, require
  exclusive write access to the disk image and this is unaffected by the
  share-rw=on option.
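An illustrative command line may make the proposed wording concrete. The image path and the virtio-blk device model below are assumptions for the example, not taken from the thread:

```shell
# Guest gets write access to a raw image that other writers may also touch.
# Safe only if the guests coordinate access, e.g. via a cluster file system.
qemu-system-x86_64 \
    -drive if=none,id=disk0,file=/path/to/shared.img,format=raw \
    -device virtio-blk-pci,drive=disk0,share-rw=on

# On the host side, qemu-img skips the image-locking check with -U
# (--force-share), which is what this patch documents:
qemu-img info -U /path/to/shared.img
```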

For anyone else reading this, here is the code that protects qcow2 and
other image formats:

  void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
                                 const BdrvChildRole *role,
                                 BlockReopenQueue *reopen_queue,
                                 uint64_t perm, uint64_t shared,
                                 uint64_t *nperm, uint64_t *nshared)
  {
      ...

      /* bs->file always needs to be consistent because of the metadata. We
       * can never allow other users to resize or write to it. */
      if (!(flags & BDRV_O_NO_IO)) {
          perm |= BLK_PERM_CONSISTENT_READ;
      }
      shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH] block: fix write with zero flag set and iovector provided

2018-02-01 Thread Anton Nefedov



On 1/2/2018 5:29 PM, Alberto Garcia wrote:

On Thu 01 Feb 2018 03:16:31 PM CET, Anton Nefedov wrote:

The normal bdrv_co_pwritev() use is either
   - BDRV_REQ_ZERO_WRITE reset and iovector provided
   - BDRV_REQ_ZERO_WRITE set and iovector == NULL

while
   - the flag reset and iovector == NULL is an assertion failure
 in bdrv_co_do_zero_pwritev()


Where is that assertion?

Berto



beginning of bdrv_co_do_zero_pwritev():

assert(flags & BDRV_REQ_ZERO_WRITE);

and bdrv_co_do_zero_pwritev() was only called with qiov==NULL.

Now this case will instead segfault at some point.
Don't know if it needs a separate assertion.

/Anton



Re: [Qemu-block] [PATCH v3 18/39] qcow2: Refactor get_cluster_table()

2018-02-01 Thread Alberto Garcia
On Wed 31 Jan 2018 09:11:48 PM CET, Max Reitz wrote:
> On 2018-01-26 15:59, Alberto Garcia wrote:
>> After the previous patch we're now always using l2_load() in
>> get_cluster_table() regardless of whether a new L2 table has to be
>> allocated or not.
>> 
>> This patch refactors that part of the code to use one single l2_load()
>> call.
>> 
>> Signed-off-by: Alberto Garcia 
>> ---
>>  block/qcow2-cluster.c | 21 +++++++--------------
>>  1 file changed, 7 insertions(+), 14 deletions(-)
>> 
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 2a53c1dc5f..0c0cab85e8 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -695,15 +695,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
>>          return -EIO;
>>      }
>>  
>> -    /* seek the l2 table of the given l2 offset */
>> -
>> -    if (s->l1_table[l1_index] & QCOW_OFLAG_COPIED) {
>> -        /* load the l2 table in memory */
>> -        ret = l2_load(bs, offset, l2_offset, &l2_table);
>> -        if (ret < 0) {
>> -            return ret;
>> -        }
>> -    } else {
>> +    if (!(s->l1_table[l1_index] & QCOW_OFLAG_COPIED)) {
>>          /* First allocate a new L2 table (and do COW if needed) */
>>          ret = l2_allocate(bs, l1_index);
>>          if (ret < 0) {
>> @@ -719,11 +711,12 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
>>          /* Get the offset of the newly-allocated l2 table */
>>          l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
>>          assert(offset_into_cluster(s, l2_offset) == 0);
>> -        /* Load the l2 table in memory */
>> -        ret = l2_load(bs, offset, l2_offset, &l2_table);
>> -        if (ret < 0) {
>> -            return ret;
>> -        }
>> +    }
>> +
>> +    /* load the l2 table in memory */
>> +    ret = l2_load(bs, offset, l2_offset, &l2_table);
>> +    if (ret < 0) {
>> +        return ret;
>>      }
>
> I'd pull the l2_offset assignment (and alignment check) down below the
> QCOW_OFLAG_COPIED check, saving us another two lines (!!1!) which we
> could then spend on an
>
> assert(s->l1_table[l1_index] & QCOW_OFLAG_COPIED);

I'm not sure if I'm following you. We need to set l2_offset before and
after allocating a new L2 table.

Before, because we need the offset of the old table (if there was one)
in order to decrease its refcount.

And after, because we need the offset of the new table in order to load
it.

Berto



Re: [Qemu-block] [PATCH v2 5/6] block: Handle null backing link

2018-02-01 Thread Alberto Garcia
On Sat 20 Jan 2018 04:44:11 PM CET, Max Reitz wrote:
> Instead of converting all "backing": null instances into "backing": "",
> handle a null value directly in bdrv_open_inherit().
>
> This enables explicitly null backing links for json:{} filenames.
>
> Signed-off-by: Max Reitz 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [Qemu-devel] [PATCH] block: fix write with zero flag set and iovector provided

2018-02-01 Thread Eric Blake
On 02/01/2018 08:16 AM, Anton Nefedov wrote:
> The normal bdrv_co_pwritev() use is either
>   - BDRV_REQ_ZERO_WRITE reset and iovector provided

s/reset/clear/

>   - BDRV_REQ_ZERO_WRITE set and iovector == NULL
> 
> while
>   - the flag reset and iovector == NULL is an assertion failure

again

> in bdrv_co_do_zero_pwritev()
>   - the flag set and iovector provided is in fact allowed
> (the flag prevails and zeroes are written)
> 
> However the alignment logic does not support the latter case so the padding
> areas get overwritten with zeroes.
> 
> Solution could be to forbid such case or just use bdrv_co_do_zero_pwritev()
> alignment for it which also makes the code a bit more obvious anyway.
> 
> Signed-off-by: Anton Nefedov 
> ---
>  block/io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 7ea4023..cf63fd0 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1701,7 +1701,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>       */
>      tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_WRITE);
>  
> -    if (!qiov) {
> +    if (flags & BDRV_REQ_ZERO_WRITE) {
>          ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, &req);

So now, the flag rules, but we assert that !qiov (so it would only break
a caller that passed the flag but used qiov, which you argued shouldn't
exist).

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH] block: fix write with zero flag set and iovector provided

2018-02-01 Thread Alberto Garcia
On Thu 01 Feb 2018 03:16:31 PM CET, Anton Nefedov wrote:
> The normal bdrv_co_pwritev() use is either
>   - BDRV_REQ_ZERO_WRITE reset and iovector provided
>   - BDRV_REQ_ZERO_WRITE set and iovector == NULL
>
> while
>   - the flag reset and iovector == NULL is an assertion failure
> in bdrv_co_do_zero_pwritev()

Where is that assertion?

Berto



Re: [Qemu-block] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising

2018-02-01 Thread Alberto Garcia
On Wed 31 Jan 2018 06:11:27 PM CET, Anton Nefedov wrote:
> On 31/1/2018 6:11 PM, Alberto Garcia wrote:
>> On Thu 18 Jan 2018 06:49:02 PM CET, Anton Nefedov wrote:
>> 
>>> -static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest 
>>> *self)
>>> +static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest 
>>> *self,
>>> +   bool nowait)
>> 
>> It's a bit confusing to have a function called wait_foo() with a
>> parameter that says "don't wait"...
>> 
>> How about
>> 
>>   check_serialising_requests(BdrvTrackedRequest *self, bool wait)
>> 
>
> I think it might be more important to emphasize in the name that the
> function _might_ wait.
>
> i.e. it feels worse to read
>check_serialising_requests(req, true);
> when one needs to follow the function to find out that it might yield.
>
> Personally I'd vote for
>
>  static int check_or_wait_serialising_requests(
>          BdrvTrackedRequest *self, bool wait) {}
>
> and maybe even:
>
>  static int check_serialising_requests(BdrvTrackedRequest *self) {
>      return check_or_wait_serialising_requests(self, false);
>  }
>
>  static int wait_serialising_requests(BdrvTrackedRequest *self) {
>      return check_or_wait_serialising_requests(self, true);
>  }

You're right. Either approach works for me though, also keeping the
current solution, wait_serialising_requests(req, true).

Berto



[Qemu-block] [PATCH] block: fix write with zero flag set and iovector provided

2018-02-01 Thread Anton Nefedov
The normal bdrv_co_pwritev() use is either
  - BDRV_REQ_ZERO_WRITE reset and iovector provided
  - BDRV_REQ_ZERO_WRITE set and iovector == NULL

while
  - the flag reset and iovector == NULL is an assertion failure
in bdrv_co_do_zero_pwritev()
  - the flag set and iovector provided is in fact allowed
(the flag prevails and zeroes are written)

However the alignment logic does not support the latter case so the padding
areas get overwritten with zeroes.

Solution could be to forbid such case or just use bdrv_co_do_zero_pwritev()
alignment for it which also makes the code a bit more obvious anyway.

Signed-off-by: Anton Nefedov 
---
 block/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 7ea4023..cf63fd0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1701,7 +1701,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
      */
     tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_WRITE);
 
-    if (!qiov) {
+    if (flags & BDRV_REQ_ZERO_WRITE) {
         ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, &req);
         goto out;
     }
-- 
2.7.4




Re: [Qemu-block] [PATCH v2 6/6] block: Deprecate "backing": ""

2018-02-01 Thread Alberto Garcia
On Sat 20 Jan 2018 04:44:12 PM CET, Max Reitz wrote:
> We have a clear replacement, so let's deprecate it.
>
> Signed-off-by: Max Reitz 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH] vl: pause vcpus before stopping iothreads

2018-02-01 Thread Stefan Hajnoczi
On Wed, Jan 31, 2018 at 03:31:27PM +0100, Kevin Wolf wrote:
> Am 31.01.2018 um 14:56 hat Stefan Hajnoczi geschrieben:
> > On Tue, Jan 30, 2018 at 05:54:56PM +0100, Kevin Wolf wrote:
> > > Am 30.01.2018 um 16:38 hat Stefan Hajnoczi geschrieben:
> > > > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
> > > > before main() quits") introduced iothread_stop_all() to avoid the
> > > > following virtio-scsi assertion failure:
> > > > 
> > > >   assert(blk_get_aio_context(d->conf.blk) == s->ctx);
> > > > 
> > > > Back then the assertion failed because when bdrv_close_all() made
> > > > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
> > > > instead of s->ctx.
> > > > 
> > > > The same assertion can still fail today when vcpus submit new I/O
> > > > requests after iothread_stop_all() has moved the BDS to the global
> > > > AioContext.
> > > > 
> > > > This patch hardens the iothread_stop_all() approach by pausing vcpus
> > > > before calling iothread_stop_all().
> > > > 
> > > > Note that the assertion failure is a race condition.  It is not possible
> > > > to reproduce it reliably.
> > > > 
> > > > Signed-off-by: Stefan Hajnoczi 
> > > 
> > > Does pausing the vcpus actually make sure that the iothread isn't active
> > > any more, or do we still have a small window where the vcpu is already
> > > stopped, but the iothread is still processing requests?
> > > 
> > > Essentially, I think the bdrv_set_aio_context() in iothread_stop_all()
> > > does either not have any effect, or if it does have an effect, it's
> > > wrong. You can't just force an in-use BDS into a different AioContext
> > > when the user that set the AioContext is still there.
> > > 
> > > At the very least, do we need a blk_drain_all() before stopping the
> > > iothreads?
> > 
> > bdrv_set_aio_context() contains aio_disable_external() +
> > bdrv_parent_drained_begin() + bdrv_drain(bs).  This should complete all
> > requests, even those sitting in a descriptor ring that hasn't been
> > processed yet.
> 
> Ah, yes. Not very obvious, so I wouldn't mind a comment, but you can
> have my R-b either way then:
> 
> Reviewed-by: Kevin Wolf 

Thanks for the review!  I have sent v2 with a comment for you to review.

Stefan




[Qemu-block] [PATCH v2] vl: pause vcpus before stopping iothreads

2018-02-01 Thread Stefan Hajnoczi
Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
before main() quits") introduced iothread_stop_all() to avoid the
following virtio-scsi assertion failure:

  assert(blk_get_aio_context(d->conf.blk) == s->ctx);

Back then the assertion failed because when bdrv_close_all() made
d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
instead of s->ctx.

The same assertion can still fail today when vcpus submit new I/O
requests after iothread_stop_all() has moved the BDS to the global
AioContext.

This patch hardens the iothread_stop_all() approach by pausing vcpus
before calling iothread_stop_all().

Note that the assertion failure is a race condition.  It is not possible
to reproduce it reliably.

Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Add comment explaining rationale for this ordering and why it's safe.
   [Kevin]

 vl.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index e517a8d995..b72a9fc6df 100644
--- a/vl.c
+++ b/vl.c
@@ -4766,10 +4766,18 @@ int main(int argc, char **argv, char **envp)
 
     main_loop();
     replay_disable_events();
+
+    /* The ordering of the following is delicate.  Stop vcpus to prevent new
+     * I/O requests being queued by the guest.  Then stop IOThreads (this
+     * includes a drain operation and completes all request processing).  At
+     * this point emulated devices are still associated with their IOThreads
+     * (if any) but no longer have any work to do.  Only then can we close
+     * block devices safely because we know there is no more I/O coming.
+     */
+    pause_all_vcpus();
     iothread_stop_all();
-
-    pause_all_vcpus();
     bdrv_close_all();
+
     res_free();
 
     /* vhost-user must be cleaned up before chardevs.  */
-- 
2.14.3




Re: [Qemu-block] [PATCH v3 17/39] qcow2: Update l2_allocate() to support L2 slices

2018-02-01 Thread Alberto Garcia
On Wed 31 Jan 2018 09:07:48 PM CET, Max Reitz wrote:
> On 2018-01-26 15:59, Alberto Garcia wrote:
>> This patch updates l2_allocate() to support the qcow2 cache returning
>> L2 slices instead of full L2 tables.
>> 
>> The old code simply gets an L2 table from the cache and initializes it
>> with zeroes or with the contents of an existing table. With a cache
>> that returns slices instead of tables the idea remains the same, but
>> the code must now iterate over all the slices that are contained in an
>> L2 table.
>> 
>> Since now we're operating with slices the function can no longer
>> return the newly-allocated table, so it's up to the caller to retrieve
>> the appropriate L2 slice after calling l2_allocate() (note that with
>> this patch the caller is still loading full L2 tables, but we'll deal
>> with that in a separate patch).
>> 
>> Signed-off-by: Alberto Garcia 
>> ---
>>  block/qcow2-cluster.c | 56 ++++++++++++++++++++++++++++++++++----------------------
>>  1 file changed, 34 insertions(+), 22 deletions(-)
>> 
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 57349928a9..2a53c1dc5f 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -264,11 +264,12 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
>>   *
>>   */
>>  
>> -static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>> +static int l2_allocate(BlockDriverState *bs, int l1_index)
>>  {
>>      BDRVQcow2State *s = bs->opaque;
>>      uint64_t old_l2_offset;
>> -    uint64_t *l2_table = NULL;
>> +    uint64_t *l2_slice = NULL;
>> +    unsigned slice, slice_size2, n_slices;
>
> I'd personally prefer size_t, but oh well.

I don't see any reason not to use int / unsigned. The size of a slice is
always <= cluster_size (an int), and both slice and n_slices are smaller
than that.

> However, I'm wondering whether this is the best approach.  The old L2
> table is probably not going to be used after this function, so we're
> basically polluting the cache here.  That was bad enough so far, but
> now that actually means wasting multiple cache entries on it.
>
> Sure, the code is simpler this way.  But maybe it would still be
> better to manually copy the data over from the old offset...  (As long
> as it's not much more complicated.)

You mean bypassing the cache altogether?

    qcow2_cache_flush(bs, s->l2_table_cache);
    new_table = g_malloc(s->cluster_size);
    if (old_l2_offset & L1E_OFFSET_MASK) {
        bdrv_pread(bs->file, old_l2_offset, new_table, s->cluster_size);
    } else {
        memset(new_table, 0, s->cluster_size);
    }
    bdrv_pwrite(bs->file, new_l2_offset, new_table, s->cluster_size);
    g_free(new_table);

??

Berto



Re: [Qemu-block] [Qemu-devel] [PATCH] block: fix write with zero flag set and iovector provided

2018-02-01 Thread Eric Blake
On 02/01/2018 08:36 AM, Eric Blake wrote:
> On 02/01/2018 08:16 AM, Anton Nefedov wrote:
>> The normal bdrv_co_pwritev() use is either
>>   - BDRV_REQ_ZERO_WRITE reset and iovector provided
> 
> s/reset/clear/
> 
>>   - BDRV_REQ_ZERO_WRITE set and iovector == NULL
>>
>> while
>>   - the flag reset and iovector == NULL is an assertion failure
> 
> again
> 
>> in bdrv_co_do_zero_pwritev()
>>   - the flag set and iovector provided is in fact allowed
>> (the flag prevails and zeroes are written)
>>
>> However the alignment logic does not support the latter case so the padding
>> areas get overwritten with zeroes.
>>
>> Solution could be to forbid such case or just use bdrv_co_do_zero_pwritev()
>> alignment for it which also makes the code a bit more obvious anyway.
>>
>> Signed-off-by: Anton Nefedov 
>> ---
>>  block/io.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index 7ea4023..cf63fd0 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1701,7 +1701,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>>       */
>>      tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_WRITE);
>>  
>> -    if (!qiov) {
>> +    if (flags & BDRV_REQ_ZERO_WRITE) {
>>          ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, &req);
> 
> So now, the flag rules, but we assert that !qiov (so it would only break
> a caller that passed the flag but used qiov, which you argued shouldn't
> exist).

Sorry, I hit send too soon.  I'm asking if we should have assert(!qiov)
right before calling bdrv_co_do_zero_pwritev (it would break a caller
that passed the flag and qiov, but you were arguing that such callers
previously misbehaved, so we don't want such callers).

But adding such an assertion may trigger failures that we'd have to fix,
while leaving things without the assertion conservatively seems okay.

> 
> Reviewed-by: Eric Blake 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH v3 0/7] Call check and invalidate_cache from coroutine context

2018-02-01 Thread Paolo Bonzini
On 18/01/2018 07:43, Paolo Bonzini wrote:
> Check and invalidate_cache share some parts of the implementation
> with the regular I/O path.  This is sometimes complicated because the
> I/O path wants to use a CoMutex but that is not possible outside coroutine
> context.  By moving things to coroutine context, we can remove special
> cases.  In fact, invalidate_cache is already called from coroutine context
> because incoming migration is placed in a coroutine.
> 
> While at it, I'm including two patches from Stefan to rename the
> bdrv_create callback to bdrv_co_create, because it is already called
> from coroutine context.  The name is now bdrv_co_create_opts, with
> bdrv_co_create reserved for the QAPI-based version that Kevin is
> working on.
> 
> qcow2 still has cache flushing in non-coroutine context, coming from
> qcow2_reopen_prepare->qcow2_update_options_prepare and
> qcow2_close->qcow2_inactivate.
> 
> Paolo

Ping?

Thanks,

Paolo

> v1->v2: fix file-win32
> 
> Paolo Bonzini (5):
>   qcow2: make qcow2_do_open a coroutine_fn
>   qed: make bdrv_qed_do_open a coroutine_fn
>   block: convert bdrv_invalidate_cache callback to coroutine_fn
>   qcow2: introduce qcow2_write_caches and qcow2_flush_caches
>   block: convert bdrv_check callback to coroutine_fn
> 
> Stefan Hajnoczi (2):
>   block: rename .bdrv_create() to .bdrv_co_create_opts()
>   qcow2: make qcow2_co_create2() a coroutine_fn
> 
>  block.c   |  88 ++
>  block/crypto.c|   8 ++--
>  block/file-posix.c|  15 +++---
>  block/file-win32.c|   5 +-
>  block/gluster.c   |  12 ++---
>  block/iscsi.c |  13 ++---
>  block/nfs.c   |  11 +++--
>  block/parallels.c |  23 +
>  block/qcow.c  |   5 +-
>  block/qcow2-refcount.c|  28 +++
>  block/qcow2.c | 118 +++---
>  block/qcow2.h |   2 +
>  block/qed-check.c |   1 +
>  block/qed-table.c |  26 --
>  block/qed.c   |  72 +---
>  block/raw-format.c|   5 +-
>  block/rbd.c   |  12 +++--
>  block/sheepdog.c  |  10 ++--
>  block/ssh.c   |   5 +-
>  block/vdi.c   |  11 +++--
>  block/vhdx.c  |  12 +++--
>  block/vmdk.c  |  12 +++--
>  block/vpc.c   |   5 +-
>  include/block/block_int.h |  11 +++--
>  24 files changed, 355 insertions(+), 155 deletions(-)
> 




Re: [Qemu-block] [Qemu-devel] [PATCH 4/7] qed: make bdrv_qed_do_open a coroutine_fn

2018-02-01 Thread Paolo Bonzini
On 18/01/2018 15:19, Eric Blake wrote:
> On 01/18/2018 06:43 AM, Paolo Bonzini wrote:
>> It is called from qcow2_invalidate_cache in coroutine context (incoming
> Why is a qcow2 function calling qed code?  Definitely a copy-paste bug,
> but I'm not sure of the right fix.

This reference was of course meant to be qed_invalidate_cache.

Thanks,

Paolo

>> migration runs in a coroutine), so it's cleaner if metadata is always
>> loaded from a coroutine.




Re: [Qemu-block] [PATCH] iotests: Fix CID for VMDK afl image

2018-02-01 Thread Max Reitz
On 2018-02-01 01:55, Fam Zheng wrote:
> On Thu, Feb 1, 2018 at 2:58 AM, Max Reitz  wrote:
>> On 2018-01-30 07:25, Fam Zheng wrote:
>>> This reverts commit 76bf133c4 which updated the reference output, and
>>> fixed the reference image, because the code path we want to exercise is
>>> actually the invalid image size.
>>>
>>> The descriptor block in the image, which includes the CID to verify, has 
>>> been
>>> invalid since the reference image was added. Since commit 9877860e7bd we 
>>> report
>>> this error earlier than the "file too large", so 059.out mismatches.
>>>
>>> The binary change is generated along the operations of:
>>>
>>>   $ bunzip2 afl9.vmdk.bz2
>>>   $ qemu-img create -f vmdk fix.vmdk 1G
>>>   $ dd if=afl9.vmdk.bz2 of=fix.vmdk bs=512 count=1 conv=notrunc
>>>   $ mv fix.vmdk afl9.vmdk
>>>   $ bzip2 afl9.vmdk
>>>
>>> Signed-off-by: Fam Zheng 
>>>
>>> ---
>>>
>>> v2: Fix commit message "qcow2 -> vmdk". [Kevin]
>>> Revert 76bf133c4.
>>
>> Hm, now this fails again on my 32 bit build. :-(
>>
>> The issue there is that you get a "Cannot allocate memory" when trying
>> to open the file.  My current fix was 2291712c39111a732 which simply
>> converted that to "Invalid argument", but now it's getting a bit more
>> complicated...  Should I just continue to play the game and check the
>> output for "Cannot allocate memory" and print exactly what the reference
>> output is expecting...?
> 
> Ahhh. OK, then, with a big comment.
> 
> I'd say let's just _notrun on 32 bit.

Sounds OK, but how would we test that? uname? Or just _notrun when we
see the ENOMEM message?

Max





Re: [Qemu-block] [PATCH v3 18/39] qcow2: Refactor get_cluster_table()

2018-02-01 Thread Max Reitz
On 2018-02-01 11:40, Alberto Garcia wrote:
> On Wed 31 Jan 2018 09:11:48 PM CET, Max Reitz wrote:
>> On 2018-01-26 15:59, Alberto Garcia wrote:
>>> After the previous patch we're now always using l2_load() in
>>> get_cluster_table() regardless of whether a new L2 table has to be
>>> allocated or not.
>>>
>>> This patch refactors that part of the code to use one single l2_load()
>>> call.
>>>
>>> Signed-off-by: Alberto Garcia 
>>> ---
>>>  block/qcow2-cluster.c | 21 +++--
>>>  1 file changed, 7 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 2a53c1dc5f..0c0cab85e8 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -695,15 +695,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
>>>          return -EIO;
>>>      }
>>>  
>>> -    /* seek the l2 table of the given l2 offset */
>>> -
>>> -    if (s->l1_table[l1_index] & QCOW_OFLAG_COPIED) {
>>> -        /* load the l2 table in memory */
>>> -        ret = l2_load(bs, offset, l2_offset, &l2_table);
>>> -        if (ret < 0) {
>>> -            return ret;
>>> -        }
>>> -    } else {
>>> +    if (!(s->l1_table[l1_index] & QCOW_OFLAG_COPIED)) {
>>>          /* First allocate a new L2 table (and do COW if needed) */
>>>          ret = l2_allocate(bs, l1_index);
>>>          if (ret < 0) {
>>> @@ -719,11 +711,12 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
>>>          /* Get the offset of the newly-allocated l2 table */
>>>          l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
>>>          assert(offset_into_cluster(s, l2_offset) == 0);
>>> -        /* Load the l2 table in memory */
>>> -        ret = l2_load(bs, offset, l2_offset, &l2_table);
>>> -        if (ret < 0) {
>>> -            return ret;
>>> -        }
>>> +    }
>>> +
>>> +    /* load the l2 table in memory */
>>> +    ret = l2_load(bs, offset, l2_offset, &l2_table);
>>> +    if (ret < 0) {
>>> +        return ret;
>>>      }
>>
>> I'd pull the l2_offset assignment (and alignment check) down below the
>> QCOW_OFLAG_COPIED check, saving us another two lines (!!1!) which we
>> could then spend on an
>>
>> assert(s->l1_table[l1_index] & QCOW_OFLAG_COPIED);
> 
> I'm not sure if I'm following you. We need to set l2_offset before and
> after allocating a new L2 table.
> 
> Before, because we need the offset of the old table (if there was one)
> in order to decrease its refcount.
> 
> And after, because we need the offset of the new table in order to load
> it.

Ah, right, yeah... Well, too bad.  Then I probably can't ask for the
assert() that badly.

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 22/39] qcow2: Update handle_copied() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> handle_copied() loads an L2 table and limits the number of checked
> clusters to the amount that fits inside that table. Since we'll be
> loading L2 slices instead of full tables we need to update that limit.
> 
> Apart from that, this function doesn't need any additional changes, so
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag

2018-02-01 Thread Max Reitz
On 2018-02-01 14:34, Anton Nefedov wrote:
> On 31/1/2018 8:31 PM, Max Reitz wrote:
>> On 2018-01-30 13:34, Anton Nefedov wrote:
>>> Offtop: does REQ_ZERO_WRITE override REQ_WRITE_COMPRESSED in this
>>> function? at least with !REQ_MAY_UNMAP it looks wrong
>>
>> Looks like zero detection will indeed override compression.  I think
>> that was intended, but I don't even have an opinion either way.
>>
>> Of course, it wouldn't be so nice if you tried to compress something and
>> then, because the zero write failed, you actually write uncompressed
>> zeroes...  But since zero detection is an optional feature, it might be
>> your own fault if you enable it when you want compression anyway, and if
>> you write to some format/protocol combination that doesn't allow zero
>> writes.
>>
>> Max
>>
> 
> Is it detection only? I guess in case the caller sets
> (REQ_ZERO_WRITE | REQ_COMPRESSED) it also fires here.

True, but that's the caller's "fault".  One of those things has to take
precedence.

And I've just noticed that when bdrv_co_do_pwrite_zeroes() falls back to
a bounce buffer, it passes the original flags (without
BDRV_REQ_ZERO_WRITE) to bdrv_driver_pwritev(), so compression will take
effect then.  So the only result is that we first try a zero write, and
if that fails, we do a compressed write.  That sounds reasonable to me.

(I mean, we could do it the other way around, but I firstly I don't
think it matters much and secondly it's probably better this way because
I'd suspect zero writes to be more efficient than compressing the bounce
buffer; at least that's how it is for qcow2.)

Max

> Looks like noone does that though, but for example backup might;
> it supports compression, and it also does a zero detection itself and
> calls write_zeroes() when it can.
> It sets REQ_MAY_UNMAP which should indeed override a compression, but
> unmap might be not supported.
> 
> /Anton






Re: [Qemu-block] [PATCH v3 21/39] qcow2: Update qcow2_alloc_cluster_link_l2() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> There's a loop in this function that iterates over the L2 entries in a
> table, so now we need to assert that it remains within the limits of
> an L2 slice.
> 
> Apart from that, this function doesn't need any additional changes, so
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)

Hm, well, strictly speaking this patch should not be at this point in
this series -- e.g. handle_alloc() so far only limits its nb_clusters to
the L2 size, not the L2 slice size.

But that's nit picking because the slice size equals the L2 size anyway
(for now), so

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 27/39] qcow2: Update qcow2_update_snapshot_refcount() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> qcow2_update_snapshot_refcount() increases the refcount of all
> clusters of a given snapshot. In order to do that it needs to load all
> its L2 tables and iterate over their entries. Since we'll be loading
> L2 slices instead of full tables we need to add an extra loop that
> iterates over all slices of each L2 table.
> 
> This function doesn't need any additional changes so apart from that
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-refcount.c | 31 +++++++++++++++++--------------
>  1 file changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index dfa28301c4..60b521cb89 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1183,17 +1183,20 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>                                     int64_t l1_table_offset, int l1_size, int addend)
>  {
>      BDRVQcow2State *s = bs->opaque;
> -    uint64_t *l1_table, *l2_table, l2_offset, entry, l1_size2, refcount;
> +    uint64_t *l1_table, *l2_slice, l2_offset, entry, l1_size2, refcount;
>      bool l1_allocated = false;
>      int64_t old_entry, old_l2_offset;
> +    unsigned slice, slice_size2, n_slices;

Hm, well. Hm.

>  int i, j, l1_modified = 0, nb_csectors;
>  int ret;
>  
>  assert(addend >= -1 && addend <= 1);
>  
> -l2_table = NULL;
> +l2_slice = NULL;
>  l1_table = NULL;
>  l1_size2 = l1_size * sizeof(uint64_t);
> +slice_size2 = s->l2_slice_size * sizeof(uint64_t);
> +n_slices = s->cluster_size / slice_size2;
>  
>  s->cache_discards = true;
>  

[...]

> @@ -1273,12 +1276,13 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>  case QCOW2_CLUSTER_NORMAL:
>  case QCOW2_CLUSTER_ZERO_ALLOC:
>  if (offset_into_cluster(s, offset)) {
> +int l2_index = slice * s->l2_slice_size + j;
>  qcow2_signal_corruption(
>  bs, true, -1, -1, "Cluster "
>  "allocation offset %#" PRIx64
>  " unaligned (L2 offset: %#"
>  PRIx64 ", L2 index: %#x)",
> -offset, l2_offset, j);
> +offset, l2_offset, l2_index);

This makes it a bit weird that in other patches l2_index is now
generally the L2 slice index...

Oh well.

Reviewed-by: Max Reitz 

>  ret = -EIO;
>  goto fail;
>  }
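For readers following along, the index arithmetic in the hunk above can be
sketched outside QEMU. This is only an illustration, not QEMU code; the
numbers used below (64 KiB clusters, 4 KiB slices) are hypothetical:

```python
ENTRY_SIZE = 8  # each L2 entry is a 64-bit (8-byte) value

def slice_geometry(cluster_size, slice_size2):
    """Return (entries per slice, number of slices per L2 table).

    Mirrors the computation quoted above:
        slice_size2 = s->l2_slice_size * sizeof(uint64_t);
        n_slices = s->cluster_size / slice_size2;
    """
    l2_slice_size = slice_size2 // ENTRY_SIZE
    n_slices = cluster_size // slice_size2
    return l2_slice_size, n_slices

def l2_index(slice_no, j, l2_slice_size):
    """Recover the table-wide L2 index from a (slice, in-slice index)
    pair, as in the qcow2_signal_corruption() hunk above."""
    return slice_no * l2_slice_size + j
```

With the default of one slice per table, slice_geometry() degenerates to
(cluster_size / 8, 1), which is why the rename is purely cosmetic for now.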





Re: [Qemu-block] [PATCH v3 30/39] qcow2: Update expand_zero_clusters_in_l1() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> expand_zero_clusters_in_l1() expands zero clusters as a necessary step
> to downgrade qcow2 images to a version that doesn't support metadata
> zero clusters. This function takes an L1 table (which may or may not
> be active) and iterates over all its L2 tables looking for zero
> clusters.
> 
> Since we'll be loading L2 slices instead of full tables we need to add
> an extra loop that iterates over all slices of each L2 table, and we
> should also use the slice size when allocating the buffer used when
> the L1 table is not active.
> 
> This function doesn't need any additional changes so apart from that
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Finally, and since we have to touch the bdrv_read() / bdrv_write()
> calls anyway, this patch takes the opportunity to replace them with
> the byte-based bdrv_pread() / bdrv_pwrite().
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 52 ---
>  1 file changed, 29 insertions(+), 23 deletions(-)

Aside from the void * casts (and slice_size2 *cough* *cough*):

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 32/39] qcow2: Rename l2_table in qcow2_alloc_compressed_cluster_offset()

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function doesn't need any changes to support L2 slices, but since
> it's now dealing with slices instead of full tables, the l2_table
> variable is renamed for clarity.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 35/39] qcow2: Rename l2_table in count_cow_clusters()

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> This function doesn't need any changes to support L2 slices, but since
> it's now dealing with slices instead of full tables, the l2_table
> variable is renamed for clarity.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 38/39] iotests: Test downgrading an image using a small L2 slice size

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> expand_zero_clusters_in_l1() is used when downgrading qcow2 images
> from v3 to v2 (compat=0.10). This is one of the functions that needed
> more changes to support L2 slices, so this patch extends iotest 061 to
> test downgrading a qcow2 image using a smaller slice size.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/061 | 16 
>  tests/qemu-iotests/061.out | 61 ++
>  2 files changed, 77 insertions(+)

Well, it would be better if we could just specify this option for all of
the qcow2 tests, but I can see that that's not really feasible...

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 13/39] qcow2: Add l2_slice_size field to BDRVQcow2State

2018-02-01 Thread Max Reitz
On 2018-02-01 10:51, Alberto Garcia wrote:
> On Wed 31 Jan 2018 08:48:08 PM CET, Max Reitz wrote:
>> On 2018-01-26 15:59, Alberto Garcia wrote:
>>> The BDRVQcow2State structure contains an l2_size field, which stores
>>> the number of 64-bit entries in an L2 table.
>>>
>>> For efficiency reasons we want to be able to load slices instead of
>>> full L2 tables, so we need to know how many entries an L2 slice can
>>> hold.
>>>
>>> An L2 slice is the portion of an L2 table that is loaded by the qcow2
>>> cache. At the moment that cache can only load complete tables,
>>> therefore an L2 slice has the same size as an L2 table (one cluster)
>>> and l2_size == l2_slice_size.
>>>
>>> Later we'll allow smaller slices, but until then we have to use this
>>> new l2_slice_size field to make the rest of the code ready for that.
>>>
>>> Signed-off-by: Alberto Garcia 
>>> Reviewed-by: Eric Blake 
>>> ---
>>>  block/qcow2.c | 3 +++
>>>  block/qcow2.h | 1 +
>>>  2 files changed, 4 insertions(+)
>>
>> Am I missing something or does this patch miss setting l2_slice_size
>> in qcow2_do_open()?
> 
> qcow2_do_open() calls qcow2_update_options() which is what reads
> l2-cache-entry-size and sets s->l2_slice_size.

So I was missing something, good.

Reviewed-by: Max Reitz 





Re: [Qemu-block] [Qemu-devel] [PATCH v3 17/39] qcow2: Update l2_allocate() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-02-01 16:43, Alberto Garcia wrote:
> On Thu 01 Feb 2018 04:23:09 PM CET, Anton Nefedov wrote:
 However, I'm wondering whether this is the best approach.  The old
 L2 table is probably not going to be used after this function, so
 we're basically polluting the cache here.  That was bad enough so
 far, but now that actually means wasting multiple cache entries on
 it.

 Sure, the code is simpler this way.  But maybe it would still be
 better to manually copy the data over from the old offset...  (As
 long as it's not much more complicated.)
>>>
>>> You mean bypassing the cache altogether?
>>>
>>>  qcow2_cache_flush(bs, s->l2_table_cache);
>>>  new_table = g_malloc(s->cluster_size);
>>>  if (old_l2_offset & L1E_OFFSET_MASK) {
>>>  bdrv_pread(bs->file, old_l2_offset, new_table, s->cluster_size);
>>>  } else {
>>>  memset(new_table, 0, s->cluster_size);
>>>  }
>>>  bdrv_pwrite(bs->file, new_l2_offset, new_table, s->cluster_size);
>>>  g_free(new_table);
>>>
>>> ??
>>
>> (I know it's a draft so you probably just skipped that but just in
>> case) It seems ok to bypass the cache read - perhaps even a flush is
>> not necessary: old_l2_offset must be read-only and flushed at this
>> point; I believe new_l2_offset might be cached too, so it needs to be
>> updated.
> 
> One problem I see with this is that while we wouldn't pollute the cache
> we'd always be reading the table twice from disk in all cases:
> 
>  1) Read old table
>  2) Write new table
>  3) Read new table (after l2_allocate(), using the cache this time)
> 
> We can of course improve it by reading the old table from disk but
> directly in the cache -so we'd spare step (3)-, but we'd still have to
> read at least once from disk.
> 
> With the old code (especially if slice_size == cluster_size) we don't
> need to read anything if the L2 table is already cached:
> 
>  1) Get empty table from the cache
>  2) memcpy() the old data
>  3) Get new table from the cache (after l2_allocate()).

Well, then scratch the bdrv_pwrite() for the new table and keep using
the cache for that (because that actually sounds useful).

On second thought, though, it's rather probable the old L2 table is
already in the cache...  Before the guest does a write to a location, it
is reasonable to assume it has read from there before.

So I guess we could think about adding a parameter to qcow2_cache_put()
or something to reset the LRU counter because we probably won't need
that entry anymore.  But not something for this series, of course.

Max





Re: [Qemu-block] [PATCH v3 28/39] qcow2: Read refcount before L2 table in expand_zero_clusters_in_l1()

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> At the moment it doesn't really make a difference whether we call
> qcow2_get_refcount() before or after reading the L2 table, but if we
> want to support L2 slices we'll need to read the refcount first.
> 
> This patch simply changes the order of those two operations to prepare
> for that. The patch with the actual semantic changes will be easier to
> read because of this.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 37/39] iotests: Test valid values of l2-cache-entry-size

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> The l2-cache-entry-size setting can only contain values that are
> powers of two between 512 and the cluster size.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/103 | 17 +
>  tests/qemu-iotests/103.out |  3 +++
>  2 files changed, 20 insertions(+)
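For reference, the validity rule being tested can be sketched as a
standalone check. This is an illustration of the stated constraint, not
the code from block/qcow2.c:

```python
def valid_l2_cache_entry_size(entry_size, cluster_size):
    """True iff entry_size is a power of two in [512, cluster_size],
    the range l2-cache-entry-size must fall into."""
    is_pow2 = entry_size > 0 and (entry_size & (entry_size - 1)) == 0
    return is_pow2 and 512 <= entry_size <= cluster_size
```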

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH] virtio-blk: check for NULL BlockDriverState

2018-02-01 Thread Stefan Hajnoczi
On Tue, Jan 30, 2018 at 04:56:12PM +0100, Kevin Wolf wrote:
> Am 30.01.2018 um 13:38 hat Stefan Hajnoczi geschrieben:
> > On Mon, Jan 29, 2018 at 04:41:07PM +0100, Kevin Wolf wrote:
> > > Am 24.01.2018 um 12:31 hat Stefan Hajnoczi geschrieben:
> > > > On Mon, Jan 22, 2018 at 09:01:49AM -0600, Mark Kanda wrote:
> > > > > Add a BlockDriverState NULL check to virtio_blk_handle_request()
> > > > > to prevent a segfault if the drive is forcibly removed using HMP
> > > > > 'drive_del' (without performing a hotplug 'device_del' first).
> > > > > 
> > > > > Signed-off-by: Mark Kanda 
> > > > > Reviewed-by: Karl Heubaum 
> > > > > Reviewed-by: Ameya More 
> > > > > ---
> > > > >  hw/block/virtio-blk.c | 7 +++
> > > > >  1 file changed, 7 insertions(+)
> > > > > 
> > > > > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > > > > index b1532e4..76ddbbf 100644
> > > > > --- a/hw/block/virtio-blk.c
> > > > > +++ b/hw/block/virtio-blk.c
> > > > > @@ -507,6 +507,13 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> > > > >  return -1;
> > > > >  }
> > > > >  
> > > > > +/* If the drive was forcibly removed (e.g. HMP 'drive_del'), the block
> > > > > + * driver state may be NULL and there is nothing left to do. */
> > > > > +if (!blk_bs(req->dev->blk)) {
> > > > 
> > > > Adding Markus Armbruster to check my understanding of drive_del:
> > > > 
> > > > 1. If id is a node name (e.g. created via blockdev-add) then attempting
> > > >to remove the root node produces the "Node %s is in use" error.  In
> > > >that case this patch isn't needed.
> > > > 
> > > > 2. If id is a BlockBackend (e.g. created via -drive) then removing the
> > > >root node is allowed.  The BlockBackend stays in place but blk->root
> > > >becomes NULL, hence this patch is needed.
> > > > 
> > > > Markus: What are the valid use cases for #2?  If blk->bs becomes NULL I
> > > > would think a lot more code beyond virtio-blk can segfault.
> > > 
> > > blk->root = NULL is completely normal, it is what happens with removable
> > > media when the drive is empty.
> > > 
> > > The problem, which was first reported during the 2.10 RC phase and was
> > > worked around in IDE code then, is that Paolo's commit 99723548561 added
> > > unconditional bdrv_inc/dec_in_flight() calls. I am pretty sure that any
> > > segfaults that Mark is seeing have the same cause.
> > > 
> > > We do need an in-flight counter even for those requests so that
> > > blk_drain() works correctly, so just making the calls conditional wouldn't
> > > be right. However, this needs to become a separate counter in
> > > BlockBackend, and the drain functions must be changed to make use of it.
> > > 
> > > I did post rough patches back then, but they weren't quite ready, and
> > > since then they have fallen through the cracks.
> > 
> > Will you send a new version of that patch series?
> 
> I would like to continue my work on the drain functions (which this
> would be a part of) sooner or later, but the work to enable libvirt to
> use blockdev-add is at a higher priority at the moment.
> 
> So if you can wait, I'll get to it eventually. If not, feel free to pick
> up the patches.

Okay, I'll let you know early next week whether I am able to work on it.

Stefan




Re: [Qemu-block] [Qemu-devel] [PATCH] iotests: Fix CID for VMDK afl image

2018-02-01 Thread Eric Blake
On 02/01/2018 11:59 AM, Max Reitz wrote:

>>> Hmm, now this fails again on my 32 bit build. :-(
>>>
>>> The issue there is that you get a "Cannot allocate memory" when trying
>>> to open the file.  My current fix was 2291712c39111a732 which simply
>>> converted that to "Invalid argument", but now it's getting a bit more
>>> complicated...  Should I just continue to play the game and check the
>>> output for "Cannot allocate memory" and print exactly what the reference
>>> output is expecting...?
>>
>> Ahhh. OK, then, with a big comment.
>>
>> I'd say let's just _notrun on 32 bit.
> 
> Sounds OK, but how would we test that? uname? Or just _notrun when we
> see the ENOMEM message?

_notrun when you detect ENOMEM seems reasonable (the equivalent of exit
status 77 in skipping a test in an automake context)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH v3 36/39] qcow2: Allow configuring the L2 slice size

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> Now that the code is ready to handle L2 slices we can finally add an
> option to allow configuring their size.
> 
> An L2 slice is the portion of an L2 table that is read by the qcow2
> cache. Until now the cache was always reading full L2 tables, and
> since the L2 table size is equal to the cluster size this was not very
> efficient with large clusters. Here's a more detailed explanation of
> why it makes sense to have smaller cache entries in order to load L2
> data:
> 
>https://lists.gnu.org/archive/html/qemu-block/2017-09/msg00635.html
> 
> This patch introduces a new command-line option to the qcow2 driver
> named l2-cache-entry-size (cf. l2-cache-size). The cache entry size
> has the same restrictions as the cluster size: it must be a power of
> two and it has the same range of allowed values, with the additional
> requirement that it must not be larger than the cluster size.
> 
> The L2 cache entry size (L2 slice size) remains equal to the cluster
> size for now by default, so this feature must be explicitly enabled.
> Although my tests show that 4KB slices consistently improve
> performance and give the best results, let's wait and make more tests
> with different cluster sizes before deciding on an optimal default.
> 
> Now that the cache entry size is not necessarily equal to the cluster
> size we need to reflect that in the MIN_L2_CACHE_SIZE documentation.
> That minimum value is a requirement of the COW algorithm: we need to
> read two L2 slices (and not two L2 tables) in order to do COW, see
> l2_allocate() for the actual code.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cache.c  | 10 --
>  block/qcow2.c| 34 +++---
>  block/qcow2.h|  6 --
>  qapi/block-core.json |  6 ++
>  4 files changed, 45 insertions(+), 11 deletions(-)
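As a usage illustration (the image name and values below are made up;
the option itself is the one added by this patch), the new setting can
be passed as a qcow2 runtime option, e.g. via qemu-io:

```shell
# Open a qcow2 image but load L2 metadata in 4 KiB slices instead of
# whole cluster-sized tables; test.qcow2 is a placeholder image name.
qemu-io --image-opts \
    "driver=qcow2,file.filename=test.qcow2,l2-cache-entry-size=4096" \
    -c 'read 0 64k'
```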

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH] virtio-blk: check for NULL BlockDriverState

2018-02-01 Thread Stefan Hajnoczi
On Mon, Jan 29, 2018 at 10:13:02AM -0600, Mark Kanda wrote:
> 
> 
> On 1/29/2018 9:41 AM, Kevin Wolf wrote:
> > Am 24.01.2018 um 12:31 hat Stefan Hajnoczi geschrieben:
> > > On Mon, Jan 22, 2018 at 09:01:49AM -0600, Mark Kanda wrote:
> > > > Add a BlockDriverState NULL check to virtio_blk_handle_request()
> > > > to prevent a segfault if the drive is forcibly removed using HMP
> > > > 'drive_del' (without performing a hotplug 'device_del' first).
> > > > 
> > > > Signed-off-by: Mark Kanda 
> > > > Reviewed-by: Karl Heubaum 
> > > > Reviewed-by: Ameya More 
> > > > ---
> > > >   hw/block/virtio-blk.c | 7 +++
> > > >   1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > > > index b1532e4..76ddbbf 100644
> > > > --- a/hw/block/virtio-blk.c
> > > > +++ b/hw/block/virtio-blk.c
> > > > @@ -507,6 +507,13 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> > > >   return -1;
> > > >   }
> > > > +/* If the drive was forcibly removed (e.g. HMP 'drive_del'), the block
> > > > + * driver state may be NULL and there is nothing left to do. */
> > > > +if (!blk_bs(req->dev->blk)) {
> > > 
> > > Adding Markus Armbruster to check my understanding of drive_del:
> > > 
> > > 1. If id is a node name (e.g. created via blockdev-add) then attempting
> > > to remove the root node produces the "Node %s is in use" error.  In
> > > that case this patch isn't needed.
> > > 
> > > 2. If id is a BlockBackend (e.g. created via -drive) then removing the
> > > root node is allowed.  The BlockBackend stays in place but blk->root
> > > becomes NULL, hence this patch is needed.
> > > 
> > > Markus: What are the valid use cases for #2?  If blk->bs becomes NULL I
> > > would think a lot more code beyond virtio-blk can segfault.
> > 
> > blk->root = NULL is completely normal, it is what happens with removable
> > media when the drive is empty.
> > 
> > The problem, which was first reported during the 2.10 RC phase and was
> > worked around in IDE code then, is that Paolo's commit 99723548561 added
> > unconditional bdrv_inc/dec_in_flight() calls. I am pretty sure that any
> > segfaults that Mark is seeing have the same cause.
> 
> That's correct. The segfault I encountered was the bdrv_inc_in_flight() call
> in blk_aio_prwv().

I think we should aim to fix the root cause for QEMU 2.12.  blk_*() APIs
are supposed to return an error when blk->root is NULL but the
blk_aio_*() ones currently crash.

If that's not possible then this patch can be merged as a workaround for
2.12 close to the release date.

Stefan




Re: [Qemu-block] [PATCH v3 17/39] qcow2: Update l2_allocate() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-02-01 14:13, Alberto Garcia wrote:
> On Wed 31 Jan 2018 09:07:48 PM CET, Max Reitz wrote:
>> On 2018-01-26 15:59, Alberto Garcia wrote:
>>> This patch updates l2_allocate() to support the qcow2 cache returning
>>> L2 slices instead of full L2 tables.
>>>
>>> The old code simply gets an L2 table from the cache and initializes it
>>> with zeroes or with the contents of an existing table. With a cache
>>> that returns slices instead of tables the idea remains the same, but
>>> the code must now iterate over all the slices that are contained in an
>>> L2 table.
>>>
>>> Since now we're operating with slices the function can no longer
>>> return the newly-allocated table, so it's up to the caller to retrieve
>>> the appropriate L2 slice after calling l2_allocate() (note that with
>>> this patch the caller is still loading full L2 tables, but we'll deal
>>> with that in a separate patch).
>>>
>>> Signed-off-by: Alberto Garcia 
>>> ---
>>>  block/qcow2-cluster.c | 56 +++
>>>  1 file changed, 34 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 57349928a9..2a53c1dc5f 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -264,11 +264,12 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
>>>   *
>>>   */
>>>  
>>> -static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>> +static int l2_allocate(BlockDriverState *bs, int l1_index)
>>>  {
>>>  BDRVQcow2State *s = bs->opaque;
>>>  uint64_t old_l2_offset;
>>> -uint64_t *l2_table = NULL;
>>> +uint64_t *l2_slice = NULL;
>>> +unsigned slice, slice_size2, n_slices;
>>
>> I'd personally prefer size_t, but oh well.
> 
> I don't see any reason not to use int / unsigned. The size of a slice is
> always <= cluster_size (an int), and both slice and n_slices are smaller
> than that.

Well, what's the reason to use unsigned? :-)

The type of the expression used to set slice_size2 simply is size_t.

>> However, I'm wondering whether this is the best approach.  The old L2
>> table is probably not going to be used after this function, so we're
>> basically polluting the cache here.  That was bad enough so far, but
>> now that actually means wasting multiple cache entries on it.
>>
>> Sure, the code is simpler this way.  But maybe it would still be
>> better to manually copy the data over from the old offset...  (As long
>> as it's not much more complicated.)
> 
> You mean bypassing the cache altogether?

Yes.

> qcow2_cache_flush(bs, s->l2_table_cache);
> new_table = g_malloc(s->cluster_size);
> if (old_l2_offset & L1E_OFFSET_MASK) {
> bdrv_pread(bs->file, old_l2_offset, new_table, s->cluster_size);
> } else {
> memset(new_table, 0, s->cluster_size);
> }
> bdrv_pwrite(bs->file, new_l2_offset, new_table, s->cluster_size);
> g_free(new_table);
> 
> ??

Well, we wouldn't need to flush the whole table but only the parts of
the L2 table that we are about to copy.  And in theory we can omit even
that, because that old L2 table is not COPIED, so it can't be dirty anyway.

Max





Re: [Qemu-block] [PATCH v3 23/39] qcow2: Update handle_alloc() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> handle_alloc() loads an L2 table and limits the number of checked
> clusters to the amount that fits inside that table. Since we'll be
> loading L2 slices instead of full tables we need to update that limit.
> 
> Apart from that, this function doesn't need any additional changes, so
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 25/39] qcow2: Update zero_single_l2() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> zero_single_l2() limits the number of clusters to be zeroed to the
> amount that fits inside an L2 table. Since we'll be loading L2 slices
> instead of full tables we need to update that limit.

Same as last patch, maybe we should rename the function now.

> Apart from that, this function doesn't need any additional changes, so
> this patch simply updates the variable name from l2_table to l2_slice.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)

Also again:

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 26/39] qcow2: Prepare qcow2_update_snapshot_refcount() for adding L2 slice support

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Adding support for L2 slices to qcow2_update_snapshot_refcount() needs
> (among other things) an extra loop that iterates over all slices of
> each L2 table.
> 
> Putting all changes in one patch would make it hard to read because
> all semantic changes would be mixed with pure indentation changes.
> 
> To make things easier this patch simply creates a new block and
> changes the indentation of all lines of code inside it. Thus, all
> modifications in this patch are cosmetic. There are no semantic
> changes and no variables are renamed yet. The next patch will take
> care of that.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-refcount.c | 140 ++---
>  1 file changed, 73 insertions(+), 67 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 33/39] qcow2: Rename l2_table in count_contiguous_clusters()

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> This function doesn't need any changes to support L2 slices, but since
> it's now dealing with slices instead of full tables, the l2_table
> variable is renamed for clarity.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 34/39] qcow2: Rename l2_table in count_contiguous_clusters_unallocated()

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> This function doesn't need any changes to support L2 slices, but since
> it's now dealing with slices instead of full tables, the l2_table
> variable is renamed for clarity.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 39/39] iotests: Add l2-cache-entry-size to iotest 137

2018-02-01 Thread Max Reitz
On 2018-01-26 16:00, Alberto Garcia wrote:
> This test tries reopening a qcow2 image with valid and invalid
> options. This patch adds l2-cache-entry-size to the set.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/137 | 5 +
>  tests/qemu-iotests/137.out | 2 ++
>  2 files changed, 7 insertions(+)

Reviewed-by: Max Reitz 





Re: [Qemu-block] [Qemu-devel] [PATCH v3 17/39] qcow2: Update l2_allocate() to support L2 slices

2018-02-01 Thread Alberto Garcia
On Thu 01 Feb 2018 04:23:09 PM CET, Anton Nefedov wrote:
>>> However, I'm wondering whether this is the best approach.  The old
>>> L2 table is probably not going to be used after this function, so
>>> we're basically polluting the cache here.  That was bad enough so
>>> far, but now that actually means wasting multiple cache entries on
>>> it.
>>>
>>> Sure, the code is simpler this way.  But maybe it would still be
>>> better to manually copy the data over from the old offset...  (As
>>> long as it's not much more complicated.)
>> 
>> You mean bypassing the cache altogether?
>> 
>>  qcow2_cache_flush(bs, s->l2_table_cache);
>>  new_table = g_malloc(s->cluster_size);
>>  if (old_l2_offset & L1E_OFFSET_MASK) {
>>  bdrv_pread(bs->file, old_l2_offset, new_table, s->cluster_size);
>>  } else {
>>  memset(new_table, 0, s->cluster_size);
>>  }
>>  bdrv_pwrite(bs->file, new_l2_offset, new_table, s->cluster_size);
>>  g_free(new_table);
>> 
>> ??
>
> (I know it's a draft so you probably just skipped that but just in
> case) It seems ok to bypass the cache read - perhaps even a flush is
> not necessary: old_l2_offset must be read-only and flushed at this
> point; I believe new_l2_offset might be cached too, so it needs to be
> updated.

One problem I see with this is that while we wouldn't pollute the cache
we'd always be reading the table twice from disk in all cases:

 1) Read old table
 2) Write new table
 3) Read new table (after l2_allocate(), using the cache this time)

We can of course improve it by reading the old table from disk but
directly in the cache -so we'd spare step (3)-, but we'd still have to
read at least once from disk.

With the old code (especially if slice_size == cluster_size) we don't
need to read anything if the L2 table is already cached:

 1) Get empty table from the cache
 2) memcpy() the old data
 3) Get new table from the cache (after l2_allocate()).

Berto



Re: [Qemu-block] [Qemu-devel] [PATCH v3 17/39] qcow2: Update l2_allocate() to support L2 slices

2018-02-01 Thread Anton Nefedov



On 1/2/2018 4:13 PM, Alberto Garcia wrote:

On Wed 31 Jan 2018 09:07:48 PM CET, Max Reitz wrote:

On 2018-01-26 15:59, Alberto Garcia wrote:

This patch updates l2_allocate() to support the qcow2 cache returning
L2 slices instead of full L2 tables.

The old code simply gets an L2 table from the cache and initializes it
with zeroes or with the contents of an existing table. With a cache
that returns slices instead of tables the idea remains the same, but
the code must now iterate over all the slices that are contained in an
L2 table.

Since now we're operating with slices the function can no longer
return the newly-allocated table, so it's up to the caller to retrieve
the appropriate L2 slice after calling l2_allocate() (note that with
this patch the caller is still loading full L2 tables, but we'll deal
with that in a separate patch).

Signed-off-by: Alberto Garcia 
---
  block/qcow2-cluster.c | 56 +++
  1 file changed, 34 insertions(+), 22 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 57349928a9..2a53c1dc5f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -264,11 +264,12 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
   *
   */
  
-static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)

+static int l2_allocate(BlockDriverState *bs, int l1_index)
  {
  BDRVQcow2State *s = bs->opaque;
  uint64_t old_l2_offset;
-uint64_t *l2_table = NULL;
+uint64_t *l2_slice = NULL;
+unsigned slice, slice_size2, n_slices;


I'd personally prefer size_t, but oh well.


I don't see any reason not to use int / unsigned. The size of a slice is
always <= cluster_size (an int), and both slice and n_slices are smaller
than that.


However, I'm wondering whether this is the best approach.  The old L2
table is probably not going to be used after this function, so we're
basically polluting the cache here.  That was bad enough so far, but
now that actually means wasting multiple cache entries on it.

Sure, the code is simpler this way.  But maybe it would still be
better to manually copy the data over from the old offset...  (As long
as it's not much more complicated.)


You mean bypassing the cache altogether?

 qcow2_cache_flush(bs, s->l2_table_cache);
 new_table = g_malloc(s->cluster_size);
 if (old_l2_offset & L1E_OFFSET_MASK) {
 bdrv_pread(bs->file, old_l2_offset, new_table, s->cluster_size);
 } else {
 memset(new_table, 0, s->cluster_size);
 }
 bdrv_pwrite(bs->file, new_l2_offset, new_table, s->cluster_size);
 g_free(new_table);

??

Berto



(I know it's a draft so you probably just skipped that but just in case)
It seems ok to bypass the cache read - perhaps even a flush is
not necessary: old_l2_offset must be read-only and flushed at this
point; I believe new_l2_offset might be cached too, so it
needs to be updated.




Re: [Qemu-block] [Qemu-devel] [PATCH] block: fix write with zero flag set and iovector provided

2018-02-01 Thread Alberto Garcia
On Thu 01 Feb 2018 03:40:51 PM CET, Eric Blake wrote:
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -1701,7 +1701,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>>>   */
>>>  tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_WRITE);
>>>  
>>> -if (!qiov) {
>>> +if (flags & BDRV_REQ_ZERO_WRITE) {
>>>  ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, &req);
>> 
>> So now, the flag rules, but we assert that !qiov (so it would only break
>> a caller that passed the flag but used qiov, which you argued shouldn't
>> exist).
>
> Sorry, I hit send too soon.  I'm asking if we should have
> assert(!qiov) right before calling bdrv_co_do_zero_pwritev (it would
> break a caller that passed the flag and qiov, but you were arguing
> that such callers previously misbehaved, so we don't want such
> callers).

Those callers do exist as a matter of fact: bdrv_rw_co_entry() always
passes a qiov to bdrv_co_pwritev() regardless of the flags (the request
size is actually taken from the very qiov).

bdrv_pwrite_zeroes() is one example:

$ qemu-img create -f qcow2 base.img 100M
$ qemu-img create -f qcow2 -b base.img active.img
$ qemu-io -c 'write -z 0 128k' -f qcow2 active.img 
$ qemu-img amend -o compat=0.10 active.img 

It even uses an iovec with iov_base = NULL but iov_len != 0, which looks
like an abuse of the data structure.

Berto



[Qemu-block] [PATCH 5/5] curl: convert to CoQueue

2018-02-01 Thread Paolo Bonzini
Now that CoQueues can use a QemuMutex for thread-safety, there is no
need for curl to roll its own coroutine queue.  Coroutines can be
placed directly on the queue instead of using a list of CURLAIOCBs.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
---
 block/curl.c | 20 
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 35cf417f59..cd578d3d14 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -101,8 +101,6 @@ typedef struct CURLAIOCB {
 
 size_t start;
 size_t end;
-
-QSIMPLEQ_ENTRY(CURLAIOCB) next;
 } CURLAIOCB;
 
 typedef struct CURLSocket {
@@ -138,7 +136,7 @@ typedef struct BDRVCURLState {
 bool accept_range;
 AioContext *aio_context;
 QemuMutex mutex;
-QSIMPLEQ_HEAD(, CURLAIOCB) free_state_waitq;
+CoQueue free_state_waitq;
 char *username;
 char *password;
 char *proxyusername;
@@ -538,7 +536,6 @@ static int curl_init_state(BDRVCURLState *s, CURLState *state)
 /* Called with s->mutex held.  */
 static void curl_clean_state(CURLState *s)
 {
-CURLAIOCB *next;
 int j;
 for (j = 0; j < CURL_NUM_ACB; j++) {
 assert(!s->acb[j]);
@@ -556,13 +553,7 @@ static void curl_clean_state(CURLState *s)
 
 s->in_use = 0;
 
-    next = QSIMPLEQ_FIRST(&s->free_state_waitq);
-    if (next) {
-        QSIMPLEQ_REMOVE_HEAD(&s->free_state_waitq, next);
-        qemu_mutex_unlock(&s->mutex);
-        aio_co_wake(next->co);
-        qemu_mutex_lock(&s->mutex);
-    }
+    qemu_co_enter_next(&s->free_state_waitq, &s->mutex);
 }
 
 static void curl_parse_filename(const char *filename, QDict *options,
@@ -784,7 +775,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
 }
 
 DPRINTF("CURL: Opening %s\n", file);
-    QSIMPLEQ_INIT(&s->free_state_waitq);
+    qemu_co_queue_init(&s->free_state_waitq);
 s->aio_context = bdrv_get_aio_context(bs);
 s->url = g_strdup(file);
     qemu_mutex_lock(&s->mutex);
@@ -888,10 +879,7 @@ static void curl_setup_preadv(BlockDriverState *bs, CURLAIOCB *acb)
 if (state) {
 break;
 }
-        QSIMPLEQ_INSERT_TAIL(&s->free_state_waitq, acb, next);
-        qemu_mutex_unlock(&s->mutex);
-        qemu_coroutine_yield();
-        qemu_mutex_lock(&s->mutex);
+        qemu_co_queue_wait(&s->free_state_waitq, &s->mutex);
 }
 
 if (curl_init_state(s, state) < 0) {
-- 
2.14.3




[Qemu-block] [PATCH 2/5] lockable: add QemuLockable

2018-02-01 Thread Paolo Bonzini
QemuLockable is a polymorphic lock type that takes an object and
knows which function to use for locking and unlocking.  The
implementation could use C11 _Generic, but since the support is
not very widespread I am instead using __builtin_choose_expr and
__builtin_types_compatible_p, which are already used by
include/qemu/atomic.h.

QemuLockable can be used to implement lock guards, or to pass around
a lock in such a way that a function can release it and re-acquire it.
The next patch will do this for CoQueue.

Signed-off-by: Paolo Bonzini 
---
 include/qemu/compiler.h  | 40 
 include/qemu/coroutine.h |  4 +-
 include/qemu/lockable.h  | 96 
 include/qemu/thread.h|  5 +--
 include/qemu/typedefs.h  |  4 ++
 tests/test-coroutine.c   | 25 +
 6 files changed, 169 insertions(+), 5 deletions(-)
 create mode 100644 include/qemu/lockable.h

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 340e5fdc09..5179bedb1e 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -111,4 +111,44 @@
 #define GCC_FMT_ATTR(n, m)
 #endif
 
+/* Implement C11 _Generic via GCC builtins.  Example:
+ *
+ *QEMU_GENERIC(x, (float, sinf), (long double, sinl), sin) (x)
+ *
+ * The first argument is the discriminator.  The last is the default value.
+ * The middle ones are tuples in "(type, expansion)" format.
+ */
+
+/* First, find out the number of generic cases.  */
+#define QEMU_GENERIC(x, ...) \
+QEMU_GENERIC_(typeof(x), __VA_ARGS__, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
+
+/* There will be extra arguments, but they are not used.  */
+#define QEMU_GENERIC_(x, a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, count, ...) \
+QEMU_GENERIC##count(x, a0, a1, a2, a3, a4, a5, a6, a7, a8, a9)
+
+/* Two more helper macros, this time to extract items from a parenthesized
+ * list.
+ */
+#define QEMU_FIRST_(a, b) a
+#define QEMU_SECOND_(a, b) b
+
+/* ... and a final one for the common part of the "recursion".  */
+#define QEMU_GENERIC_IF(x, type_then, else_)                                   \
+    __builtin_choose_expr(__builtin_types_compatible_p(x,                      \
+                                                       QEMU_FIRST_ type_then), \
+                          QEMU_SECOND_ type_then, else_)
+
+/* CPP poor man's "recursion".  */
+#define QEMU_GENERIC1(x, a0, ...) (a0)
+#define QEMU_GENERIC2(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC1(x, __VA_ARGS__))
+#define QEMU_GENERIC3(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC2(x, __VA_ARGS__))
+#define QEMU_GENERIC4(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC3(x, __VA_ARGS__))
+#define QEMU_GENERIC5(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC4(x, __VA_ARGS__))
+#define QEMU_GENERIC6(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC5(x, __VA_ARGS__))
+#define QEMU_GENERIC7(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC6(x, __VA_ARGS__))
+#define QEMU_GENERIC8(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC7(x, __VA_ARGS__))
+#define QEMU_GENERIC9(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC8(x, __VA_ARGS__))
+#define QEMU_GENERIC10(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC9(x, __VA_ARGS__))
+
 #endif /* COMPILER_H */
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index ce2eb73670..8a5129741c 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -121,7 +121,7 @@ bool qemu_coroutine_entered(Coroutine *co);
  * Provides a mutex that can be used to synchronise coroutines
  */
 struct CoWaitRecord;
-typedef struct CoMutex {
+struct CoMutex {
 /* Count of pending lockers; 0 for a free mutex, 1 for an
  * uncontended mutex.
  */
@@ -142,7 +142,7 @@ typedef struct CoMutex {
 unsigned handoff, sequence;
 
 Coroutine *holder;
-} CoMutex;
+};
 
 /**
  * Initialises a CoMutex. This must be called before any other operation is used
diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h
new file mode 100644
index 00..826ac3f675
--- /dev/null
+++ b/include/qemu/lockable.h
@@ -0,0 +1,96 @@
+/*
+ * Polymorphic locking functions (aka poor man templates)
+ *
+ * Copyright Red Hat, Inc. 2017, 2018
+ *
+ * Author: Paolo Bonzini 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_LOCKABLE_H
+#define QEMU_LOCKABLE_H
+
+#include "qemu/coroutine.h"
+#include "qemu/thread.h"
+
+typedef void QemuLockUnlockFunc(void *);
+
+struct QemuLockable {
+void *object;
+QemuLockUnlockFunc *lock;
+QemuLockUnlockFunc *unlock;
+};
+
+/* This function gives an error if an invalid, non-NULL pointer type is passed
+ * to QEMU_MAKE_LOCKABLE.  For optimized builds, we can rely on dead-code
+ * elimination from the compiler, and give the errors already at link time.
+ */
+#ifdef __OPTIMIZE__
+void unknown_lock_type(void *);
+#else
+static inline 

Re: [Qemu-block] [Qemu-devel] [PATCH v4 0/5] coroutine-lock: polymorphic CoQueue

2018-02-01 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180201212917.18131-1-pbonz...@redhat.com
Subject: [Qemu-devel] [PATCH v4 0/5] coroutine-lock: polymorphic CoQueue

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
87da24d9ea curl: convert to CoQueue
a94c2af350 coroutine-lock: make qemu_co_enter_next thread-safe
57eba09708 coroutine-lock: convert CoQueue to use QemuLockable
6efa617c67 lockable: add QemuLockable
1423cb0ab4 test-coroutine: add simple CoMutex test

=== OUTPUT BEGIN ===
Checking PATCH 1/5: test-coroutine: add simple CoMutex test...
ERROR: do not initialise statics to 0 or NULL
#30: FILE: tests/test-coroutine.c:198:
+static bool locked = false;

total: 1 errors, 0 warnings, 74 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/5: lockable: add QemuLockable...
WARNING: line over 80 characters
#58: FILE: include/qemu/compiler.h:144:
+#define QEMU_GENERIC2(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC1(x, __VA_ARGS__))

WARNING: line over 80 characters
#59: FILE: include/qemu/compiler.h:145:
+#define QEMU_GENERIC3(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC2(x, __VA_ARGS__))

WARNING: line over 80 characters
#60: FILE: include/qemu/compiler.h:146:
+#define QEMU_GENERIC4(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC3(x, __VA_ARGS__))

WARNING: line over 80 characters
#61: FILE: include/qemu/compiler.h:147:
+#define QEMU_GENERIC5(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC4(x, __VA_ARGS__))

WARNING: line over 80 characters
#62: FILE: include/qemu/compiler.h:148:
+#define QEMU_GENERIC6(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC5(x, __VA_ARGS__))

WARNING: line over 80 characters
#63: FILE: include/qemu/compiler.h:149:
+#define QEMU_GENERIC7(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC6(x, __VA_ARGS__))

WARNING: line over 80 characters
#64: FILE: include/qemu/compiler.h:150:
+#define QEMU_GENERIC8(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC7(x, __VA_ARGS__))

WARNING: line over 80 characters
#65: FILE: include/qemu/compiler.h:151:
+#define QEMU_GENERIC9(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC8(x, __VA_ARGS__))

WARNING: line over 80 characters
#66: FILE: include/qemu/compiler.h:152:
+#define QEMU_GENERIC10(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC9(x, __VA_ARGS__))

WARNING: line over 80 characters
#124: FILE: include/qemu/lockable.h:28:
+ * to QEMU_MAKE_LOCKABLE.  For optimized builds, we can rely on dead-code elimination

WARNING: architecture specific defines should be avoided
#127: FILE: include/qemu/lockable.h:31:
+#ifdef __OPTIMIZE__

total: 0 errors, 11 warnings, 242 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 3/5: coroutine-lock: convert CoQueue to use QemuLockable...
Checking PATCH 4/5: coroutine-lock: make qemu_co_enter_next thread-safe...
Checking PATCH 5/5: curl: convert to CoQueue...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

[Qemu-block] [PATCH 1/5] test-coroutine: add simple CoMutex test

2018-02-01 Thread Paolo Bonzini
In preparation for adding a similar test using QemuLockable, add a very
simple testcase that has two interleaved calls to lock and unlock.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
---
 tests/test-coroutine.c | 50 --
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index 76c646107e..010cb95ad6 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -175,7 +175,7 @@ static void coroutine_fn c1_fn(void *opaque)
 qemu_coroutine_enter(c2);
 }
 
-static void test_co_queue(void)
+static void test_no_dangling_access(void)
 {
 Coroutine *c1;
 Coroutine *c2;
@@ -195,6 +195,51 @@ static void test_co_queue(void)
 *c1 = tmp;
 }
 
+static bool locked = false;
+static int done;
+
+static void coroutine_fn mutex_fn(void *opaque)
+{
+CoMutex *m = opaque;
+qemu_co_mutex_lock(m);
+assert(!locked);
+locked = true;
+qemu_coroutine_yield();
+locked = false;
+qemu_co_mutex_unlock(m);
+done++;
+}
+
+static void do_test_co_mutex(CoroutineEntry *entry, void *opaque)
+{
+Coroutine *c1 = qemu_coroutine_create(entry, opaque);
+Coroutine *c2 = qemu_coroutine_create(entry, opaque);
+
+done = 0;
+qemu_coroutine_enter(c1);
+g_assert(locked);
+qemu_coroutine_enter(c2);
+
+/* Unlock queues c2.  It is then started automatically when c1 yields or
+ * terminates.
+ */
+qemu_coroutine_enter(c1);
+g_assert_cmpint(done, ==, 1);
+g_assert(locked);
+
+qemu_coroutine_enter(c2);
+g_assert_cmpint(done, ==, 2);
+g_assert(!locked);
+}
+
+static void test_co_mutex(void)
+{
+CoMutex m;
+
+    qemu_co_mutex_init(&m);
+    do_test_co_mutex(mutex_fn, &m);
+}
+
 /*
  * Check that creation, enter, and return work
  */
@@ -422,7 +467,7 @@ int main(int argc, char **argv)
  * crash, so skip it.
  */
 if (CONFIG_COROUTINE_POOL) {
-g_test_add_func("/basic/co_queue", test_co_queue);
+g_test_add_func("/basic/no-dangling-access", test_no_dangling_access);
 }
 
 g_test_add_func("/basic/lifecycle", test_lifecycle);
@@ -432,6 +477,7 @@ int main(int argc, char **argv)
 g_test_add_func("/basic/entered", test_entered);
 g_test_add_func("/basic/in_coroutine", test_in_coroutine);
 g_test_add_func("/basic/order", test_order);
+g_test_add_func("/locking/co-mutex", test_co_mutex);
 if (g_test_perf()) {
 g_test_add_func("/perf/lifecycle", perf_lifecycle);
 g_test_add_func("/perf/nesting", perf_nesting);
-- 
2.14.3





[Qemu-block] [PATCH v4 0/5] coroutine-lock: polymorphic CoQueue

2018-02-01 Thread Paolo Bonzini
There are cases in which a queued coroutine must be restarted from
non-coroutine context (with qemu_co_enter_next).  In these cases,
qemu_co_enter_next also needs to be thread-safe, but it cannot use a
CoMutex and so cannot use qemu_co_queue_wait.  This happens in curl (which
right now is rolling its own list of Coroutines) and will happen in
Fam's NVMe driver as well.

This series extracts the idea of a polymorphic lockable object
from my "scoped lock guard" proposal, and applies it to CoQueue.
The implementation of QemuLockable is similar to C11 _Generic, but
redone using the preprocessor and GCC builtins for compatibility.

In general, while a bit on the esoteric side, the functionality used
to emulate _Generic is fairly old in GCC, and the builtins are already
used by include/qemu/atomic.h; the series was tested with Fedora 27 (boot
Damn Small Linux via http) and CentOS 6 (compiled only).
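For comparison, the C11 _Generic construct being emulated looks like this;
a standalone sketch with invented demo type and function names (in QEMU the
real pair would be QemuMutex and CoMutex):

```c
/* Two invented lock types for this illustration. */
typedef struct CoMutexDemo { int locked; } CoMutexDemo;
typedef struct QemuMutexDemo { int locked; } QemuMutexDemo;

void co_mutex_demo_lock(CoMutexDemo *m)     { m->locked = 1; }
void qemu_mutex_demo_lock(QemuMutexDemo *m) { m->locked = 1; }

/* C11 _Generic selects an expression based on the static type of the
 * controlling expression; QEMU_GENERIC reimplements exactly this with
 * __builtin_choose_expr for compilers that lack _Generic. */
#define demo_lock(x) _Generic((x),              \
    CoMutexDemo *:   co_mutex_demo_lock,        \
    QemuMutexDemo *: qemu_mutex_demo_lock)(x)
```

Passing a pointer of any other type is a compile-time error, which is the
same diagnosability the preprocessor version aims for.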

Paolo

v3->v4: fix -O0 compilation [Fam]
typos and copyright dates [Eric, Fam]
improve CoQueue comments [Stefan]

Paolo Bonzini (5):
  test-coroutine: add simple CoMutex test
  lockable: add QemuLockable
  coroutine-lock: convert CoQueue to use QemuLockable
  coroutine-lock: make qemu_co_enter_next thread-safe
  curl: convert to CoQueue

 block/curl.c| 20 ++
 fsdev/qemu-fsdev-throttle.c |  4 +-
 include/qemu/compiler.h | 40 +++
 include/qemu/coroutine.h| 29 +-
 include/qemu/lockable.h | 96 +
 include/qemu/thread.h   |  5 +--
 include/qemu/typedefs.h |  4 ++
 tests/test-coroutine.c  | 75 ++-
 util/qemu-coroutine-lock.c  | 22 +++
 9 files changed, 256 insertions(+), 39 deletions(-)
 create mode 100644 include/qemu/lockable.h

-- 
2.14.3




[Qemu-block] [PATCH 3/5] coroutine-lock: convert CoQueue to use QemuLockable

2018-02-01 Thread Paolo Bonzini
There are cases in which a queued coroutine must be restarted from
non-coroutine context (with qemu_co_enter_next).  In these cases,
qemu_co_enter_next also needs to be thread-safe, but it cannot use
a CoMutex and so cannot use qemu_co_queue_wait.  Use QemuLockable so
that the CoQueue can interchangeably use CoMutex or QemuMutex.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
---
 include/qemu/coroutine.h   |  6 +-
 util/qemu-coroutine-lock.c | 12 +++-
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 8a5129741c..1e5f0957e6 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -183,7 +183,9 @@ void qemu_co_queue_init(CoQueue *queue);
  * caller of the coroutine.  The mutex is unlocked during the wait and
  * locked again afterwards.
  */
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex);
+#define qemu_co_queue_wait(queue, lock) \
+qemu_co_queue_wait_impl(queue, QEMU_MAKE_LOCKABLE(lock))
+void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock);
 
 /**
  * Restarts the next coroutine in the CoQueue and removes it from the queue.
@@ -271,4 +273,6 @@ void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns);
  */
 void coroutine_fn yield_until_fd_readable(int fd);
 
+#include "qemu/lockable.h"
+
 #endif /* QEMU_COROUTINE_H */
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 846ff9167f..2a66fc1467 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -40,13 +40,13 @@ void qemu_co_queue_init(CoQueue *queue)
     QSIMPLEQ_INIT(&queue->entries);
 }
 
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
+void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock)
 {
 Coroutine *self = qemu_coroutine_self();
     QSIMPLEQ_INSERT_TAIL(&queue->entries, self, co_queue_next);
 
-if (mutex) {
-qemu_co_mutex_unlock(mutex);
+if (lock) {
+qemu_lockable_unlock(lock);
 }
 
 /* There is no race condition here.  Other threads will call
@@ -60,9 +60,11 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
 /* TODO: OSv implements wait morphing here, where the wakeup
  * primitive automatically places the woken coroutine on the
  * mutex's queue.  This avoids the thundering herd effect.
+ * This could be implemented for CoMutexes, but not really for
+ * other cases of QemuLockable.
  */
-if (mutex) {
-qemu_co_mutex_lock(mutex);
+if (lock) {
+qemu_lockable_lock(lock);
 }
 }
 
-- 
2.14.3





[Qemu-block] [PATCH 4/5] coroutine-lock: make qemu_co_enter_next thread-safe

2018-02-01 Thread Paolo Bonzini
qemu_co_queue_next does not need to release and re-acquire the mutex,
because the queued coroutine does not run immediately.  However, this
does not hold for qemu_co_enter_next.  Now that qemu_co_queue_wait
can synchronize (via QemuLockable) with code that is not running in
coroutine context, it's important that code using qemu_co_enter_next
can easily use a standardized locking idiom.

First of all, qemu_co_enter_next must use aio_co_wake to restart the
coroutine.  Second, the function gains a second argument, a QemuLockable*,
and the comments of qemu_co_queue_next and qemu_co_queue_restart_all
are adjusted to clarify the difference.
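The release-wake-reacquire shape described above is the classic
condition-variable idiom; a minimal pthread sketch of the same pattern
(all names here are invented for illustration, not QEMU API):

```c
#include <pthread.h>

/* A trivial waiter queue protected by a caller-provided mutex.  The
 * "enter next" operation drops the lock around the wakeup, so the woken
 * side can take the lock itself immediately, then re-acquires it. */
typedef struct {
    pthread_mutex_t *lock;
    pthread_cond_t cond;
    int waiters;
} DemoQueue;

/* Called with *lock held; returns 1 if a waiter was woken, 0 if empty. */
int demo_enter_next(DemoQueue *q)
{
    if (q->waiters == 0) {
        return 0;                    /* queue empty */
    }
    q->waiters--;
    pthread_mutex_unlock(q->lock);   /* release around the wakeup ... */
    pthread_cond_signal(&q->cond);
    pthread_mutex_lock(q->lock);     /* ... then re-acquire */
    return 1;
}
```

As in the patch, the caller's locking idiom stays simple: lock, call the
wake operation, unlock, with the temporary release hidden inside.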

Signed-off-by: Paolo Bonzini 
---
 fsdev/qemu-fsdev-throttle.c |  4 ++--
 include/qemu/coroutine.h| 19 +--
 util/qemu-coroutine-lock.c  | 10 --
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c
index 49eebb5412..1dc07fbc12 100644
--- a/fsdev/qemu-fsdev-throttle.c
+++ b/fsdev/qemu-fsdev-throttle.c
@@ -20,13 +20,13 @@
 static void fsdev_throttle_read_timer_cb(void *opaque)
 {
 FsThrottle *fst = opaque;
-    qemu_co_enter_next(&fst->throttled_reqs[false]);
+    qemu_co_enter_next(&fst->throttled_reqs[false], NULL);
 }
 
 static void fsdev_throttle_write_timer_cb(void *opaque)
 {
 FsThrottle *fst = opaque;
-    qemu_co_enter_next(&fst->throttled_reqs[true]);
+    qemu_co_enter_next(&fst->throttled_reqs[true], NULL);
 }
 
 void fsdev_throttle_parse_opts(QemuOpts *opts, FsThrottle *fst, Error **errp)
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 1e5f0957e6..3afee7169b 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -188,21 +188,28 @@ void qemu_co_queue_init(CoQueue *queue);
 void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock);
 
 /**
- * Restarts the next coroutine in the CoQueue and removes it from the queue.
- *
- * Returns true if a coroutine was restarted, false if the queue is empty.
+ * Removes the next coroutine from the CoQueue, and wakes it up.
+ * Returns true if a coroutine was removed, false if the queue is empty.
  */
 bool coroutine_fn qemu_co_queue_next(CoQueue *queue);
 
 /**
- * Restarts all coroutines in the CoQueue and leaves the queue empty.
+ * Empties the CoQueue; all coroutines are woken up.
  */
 void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue);
 
 /**
- * Enter the next coroutine in the queue
+ * Removes the next coroutine from the CoQueue, and wakes it up.  Unlike
+ * qemu_co_queue_next, this function releases the lock during aio_co_wake
+ * because it is meant to be used outside coroutine context; in that case, the
+ * coroutine is entered immediately, before qemu_co_enter_next returns.
+ *
+ * If used in coroutine context, qemu_co_enter_next is equivalent to
+ * qemu_co_queue_next.
  */
-bool qemu_co_enter_next(CoQueue *queue);
+#define qemu_co_enter_next(queue, lock) \
+qemu_co_enter_next_impl(queue, QEMU_MAKE_LOCKABLE(lock))
+bool qemu_co_enter_next_impl(CoQueue *queue, QemuLockable *lock);
 
 /**
  * Checks if the CoQueue is empty.
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 2a66fc1467..78fb79acf8 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -132,7 +132,7 @@ void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue)
 qemu_co_queue_do_restart(queue, false);
 }
 
-bool qemu_co_enter_next(CoQueue *queue)
+bool qemu_co_enter_next_impl(CoQueue *queue, QemuLockable *lock)
 {
 Coroutine *next;
 
@@ -142,7 +142,13 @@ bool qemu_co_enter_next(CoQueue *queue)
 }
 
     QSIMPLEQ_REMOVE_HEAD(&queue->entries, co_queue_next);
-qemu_coroutine_enter(next);
+if (lock) {
+qemu_lockable_unlock(lock);
+}
+aio_co_wake(next);
+if (lock) {
+qemu_lockable_lock(lock);
+}
 return true;
 }
 
-- 
2.14.3





Re: [Qemu-block] [PATCH v3 31/39] qcow2: Update qcow2_truncate() to support L2 slices

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> The qcow2_truncate() code is mostly independent from whether
> we're using L2 slices or full L2 tables, but in full and
> falloc preallocation modes new L2 tables are allocated using
> qcow2_alloc_cluster_link_l2().  Therefore the code needs to be
> modified to ensure that all nb_clusters that are processed in each
> call can be allocated with just one L2 slice.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 78f067cae7..529becfa30 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -3261,8 +3261,9 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
>  guest_offset = old_length;
>  while (nb_new_data_clusters) {
>  int64_t guest_cluster = guest_offset >> s->cluster_bits;
> -        int64_t nb_clusters = MIN(nb_new_data_clusters,
> -                                  s->l2_size - guest_cluster % s->l2_size);
> +        int64_t nb_clusters = MIN(
> +            nb_new_data_clusters,
> +            s->l2_slice_size - guest_cluster % s->l2_slice_size);

An alternative would be the
"s->l2_slice_size - offset_to_l2_slice_index(s, guest_offset)" we
basically have elsewhere, but that's longer and doesn't really matter:

Reviewed-by: Max Reitz 

>  QCowL2Meta allocation = {
>  .offset   = guest_offset,
>  .alloc_offset = host_offset,
> 
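The MIN() in the quoted hunk caps each loop iteration at the next L2 slice
boundary; a small standalone sketch of that arithmetic (the slice size
constant below is invented for illustration):

```c
#include <stdint.h>

/* Split a run of new clusters so that each chunk stays within one
 * L2 slice, mirroring the loop in qcow2_truncate(). */
int64_t clusters_this_round(int64_t nb_new_data_clusters,
                            int64_t guest_cluster,
                            int64_t l2_slice_size)
{
    /* Entries remaining in the current slice, from guest_cluster on. */
    int64_t room = l2_slice_size - guest_cluster % l2_slice_size;
    return nb_new_data_clusters < room ? nb_new_data_clusters : room;
}
```

For example, with a slice of 512 entries and a starting cluster 10 entries
before a slice boundary, a 100-cluster allocation is processed as 10
clusters in the first iteration and 90 in the second.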






Re: [Qemu-block] [PATCH v3 29/39] qcow2: Prepare expand_zero_clusters_in_l1() for adding L2 slice support

2018-02-01 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Adding support for L2 slices to expand_zero_clusters_in_l1() needs
> (among other things) an extra loop that iterates over all slices of
> each L2 table.
> 
> Putting all changes in one patch would make it hard to read because
> all semantic changes would be mixed with pure indentation changes.
> 
> To make things easier this patch simply creates a new block and
> changes the indentation of all lines of code inside it. Thus, all
> modifications in this patch are cosmetic. There are no semantic
> changes and no variables are renamed yet. The next patch will take
> care of that.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 187 
> ++
>  1 file changed, 96 insertions(+), 91 deletions(-)

Reviewed-by: Max Reitz 





Re: [Qemu-block] [PATCH v3 13/39] qcow2: Add l2_slice_size field to BDRVQcow2State

2018-02-01 Thread Alberto Garcia
On Wed 31 Jan 2018 08:48:08 PM CET, Max Reitz wrote:
> On 2018-01-26 15:59, Alberto Garcia wrote:
>> The BDRVQcow2State structure contains an l2_size field, which stores
>> the number of 64-bit entries in an L2 table.
>> 
>> For efficiency reasons we want to be able to load slices instead of
>> full L2 tables, so we need to know how many entries an L2 slice can
>> hold.
>> 
>> An L2 slice is the portion of an L2 table that is loaded by the qcow2
>> cache. At the moment that cache can only load complete tables,
>> therefore an L2 slice has the same size as an L2 table (one cluster)
>> and l2_size == l2_slice_size.
>> 
>> Later we'll allow smaller slices, but until then we have to use this
>> new l2_slice_size field to make the rest of the code ready for that.
>> 
>> Signed-off-by: Alberto Garcia 
>> Reviewed-by: Eric Blake 
>> ---
>>  block/qcow2.c | 3 +++
>>  block/qcow2.h | 1 +
>>  2 files changed, 4 insertions(+)
>
> Am I missing something or does this patch miss setting l2_slice_size
> in qcow2_do_open()?

qcow2_do_open() calls qcow2_update_options() which is what reads
l2-cache-entry-size and sets s->l2_slice_size.

Berto