Re: [Qemu-block] [PATCH v5 00/10] virtio-blk: add DISCARD and WRITE_ZEROES features

2019-02-20 Thread Stefano Garzarella
On Wed, Feb 20, 2019 at 17:31, Stefan Hajnoczi wrote:

> On Mon, Feb 18, 2019 at 03:02:51PM +0100, Stefano Garzarella wrote:
> > This series adds the support of DISCARD and WRITE_ZEROES commands
> > and extends the virtio-blk-test to test these new commands.
> >
> > v5:
> > - rebased on master
> > - handled the config size for DISCARD and WRITE_ZEROES features as in
> >   virtio-net (patches 4 and 5) [Michael, Stefan]
> > - fixed an endianness issue in the WRITE_ZEROES test (patches 8 and 9)
> > - added a test for the DISCARD command to only test the status returned
> >   by the request (patch 10)
> > - others patches are unchanged (patches 1, 2, 3, 6, 7)
>
> Looks good.  Please fix the patchew failure and resend.


Thanks for the review!
I’ll send the new version with the fix.

Cheers,
Stefano
-- 
Stefano Garzarella
Software Engineer @ Red Hat


Re: [Qemu-block] [PATCH v2 08/18] tests/bios-tables: Improve portability by searching bash in the $PATH

2019-02-20 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 06:53:53PM +0100, Philippe Mathieu-Daudé wrote:
> Bash is not always installed as /bin/bash. In particular on OpenBSD,
> the package installs it in /usr/local/bin.
> Use the 'env' shebang to search bash in the $PATH.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Michael S. Tsirkin 

> ---
>  tests/data/acpi/rebuild-expected-aml.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/data/acpi/rebuild-expected-aml.sh b/tests/data/acpi/rebuild-expected-aml.sh
> index bf9ba242ad..ff77a751c8 100755
> --- a/tests/data/acpi/rebuild-expected-aml.sh
> +++ b/tests/data/acpi/rebuild-expected-aml.sh
> @@ -1,4 +1,4 @@
> -#! /bin/bash
> +#! /usr/bin/env bash
>  
>  #
>  # Rebuild expected AML files for acpi unit-test
> -- 
> 2.20.1



Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-02-20 Thread Fernando Casas Schössow
Hi Paolo,

This is Fernando, the one who reported the issue.
Regarding the dumps, I have three of them including guest memory: two for
virtio-scsi and one for virtio-blk, in case a comparison may help to confirm
which one shows the problem. I can upload them to a server you indicate, or I
can put them on a server of mine so you can download them as you see fit.
Each dump, compressed, is around 500MB.

If it's more convenient for you, I can try to get the requested information
from gdb, but I will need some guidance since I'm not very skilled with the
debugger.

Another option, if you provide me with the right patch, is for me to patch and
rebuild QEMU and reproduce the problem again. With virtio-scsi I'm able to
reproduce this in a matter of hours most of the time; with virtio-blk it will
take a couple of days.

Just let me know how you prefer to move forward.

Thanks a lot for helping with this!

Kind regards,

Fernando

On Wed, Feb 20, 2019 at 6:53 PM, Paolo Bonzini wrote:
> On 20/02/19 17:58, Stefan Hajnoczi wrote:
> > On Mon, Feb 18, 2019 at 07:21:25AM +, Fernando Casas Schössow wrote:
> > > It took a few days but last night the problem was reproduced.
> > > This is the information from the log:
> > >
> > > vdev 0x55f261d940f0 ("virtio-blk")
> > > vq 0x55f261d9ee40 (idx 0)
> > > inuse 128 vring.num 128
> > > old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
> > > avail 0x3d87a800 avail_idx (cache bypassed) 58625
> >
> > Hi Paolo,
> > Are you aware of any recent MemoryRegionCache issues? The avail_idx
> > value 58874 was read via the cache while a non-cached read produces
> > 58625! I suspect that 58625 is correct since the vring is already full
> > and the driver wouldn't bump avail_idx any further until requests
> > complete. Fernando also hits this issue with virtio-scsi so it's not a
> > virtio_blk.ko driver bug or a virtio-blk device emulation issue.
>
> No, I am not aware of any issues. How can I get the core dump (and the
> corresponding executable to get the symbols)? Alternatively, it should be
> enough to print the vq->vring.caches->avail.mrs from the debugger. Also,
> one possibility is to add in vring_avail_idx an assertion like
>
>     assert(vq->shadow_avail_idx == virtio_lduw_phys(vdev,
>            vq->vring.avail + offsetof(VRingAvail, idx)));
>
> and try to catch the error earlier.
>
> Paolo
>
> > A QEMU core dump is available for debugging. Here is the patch that
> > produced this debug output:
> >
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index a1ff647a66..28d89fcbcb 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
> >  return NULL;
> >  }
> >  rcu_read_lock();
> > +uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
> >  if (virtio_queue_empty_rcu(vq)) {
> >  goto done;
> >  }
> > @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
> >  max = vq->vring.num;
> >  if (vq->inuse >= vq->vring.num) {
> > +fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
> > +fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
> > +fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
> > +fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
> > +fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
> > +fprintf(stderr, "used_idx %u\n", vq->used_idx);
> > +abort(); /* <--- core dump! */
> >  virtio_error(vdev, "Virtqueue size exceeded");
> >  goto done;
> >  }
> >
> > Stefan
> >
> > > used_idx 58497
> > > 2019-02-18 03:20:08.605+: shutting down, reason=crashed
> > >
> > > The dump file, including guest memory, was generated successfully
> > > (after gzip the file is around 492MB). I switched the guest now to
> > > virtio-scsi to get the information and dump with this setup as well.
> > >
> > > How should we proceed?
> > >
> > > Thanks.



Re: [Qemu-block] [PATCH v2 05/13] io: Remove redundant read/write_coroutine assignments

2019-02-20 Thread Paolo Bonzini
On 20/02/19 19:07, Kevin Wolf wrote:
> Am 20.02.2019 um 19:01 hat Paolo Bonzini geschrieben:
>> aio_co_wake was also acquiring/releasing the AioContext, so
>> that needs to stay for now.
> 
> True. Maybe I should leave the aio_co_wake call alone() and just add the
> assertion to avoid complicating the code rather than simplifying it...

Yes, I agree.

Paolo

ps: where is alone() defined? :)

Paolo




Re: [Qemu-block] [PATCH v2 05/13] io: Remove redundant read/write_coroutine assignments

2019-02-20 Thread Kevin Wolf
Am 20.02.2019 um 19:01 hat Paolo Bonzini geschrieben:
> On 20/02/19 18:48, Kevin Wolf wrote:
> > -static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc);
> > -
> >  static void qio_channel_restart_read(void *opaque)
> >  {
> >  QIOChannel *ioc = opaque;
> >  Coroutine *co = ioc->read_coroutine;
> >  
> > -ioc->read_coroutine = NULL;
> > -qio_channel_set_aio_fd_handlers(ioc);
> > -aio_co_wake(co);
> > +assert(qemu_get_current_aio_context() ==
> > +   qemu_coroutine_get_aio_context(co));
> > +qemu_coroutine_enter(co);
> >  }
> >  
> >  static void qio_channel_restart_write(void *opaque)
> > @@ -417,9 +415,9 @@ static void qio_channel_restart_write(void *opaque)
> >  QIOChannel *ioc = opaque;
> >  Coroutine *co = ioc->write_coroutine;
> >  
> > -ioc->write_coroutine = NULL;
> > -qio_channel_set_aio_fd_handlers(ioc);
> > -aio_co_wake(co);
> > +assert(qemu_get_current_aio_context() ==
> > +   qemu_coroutine_get_aio_context(co));
> > +qemu_coroutine_enter(co);
> >  }
> >  
> >  static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc)
> > 
> 
> aio_co_wake was also acquiring/releasing the AioContext, so
> that needs to stay for now.

True. Maybe I should leave the aio_co_wake call alone() and just add the
assertion to avoid complicating the code rather than simplifying it...

Kevin



[Qemu-block] [PATCH v2 11/13] test-bdrv-drain: AioContext switch in drained section

2019-02-20 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 tests/test-bdrv-drain.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index ee1740ff06..1b1f6c17a5 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -1522,6 +1522,36 @@ static void test_append_to_drained(void)
 blk_unref(blk);
 }
 
+static void test_set_aio_context(void)
+{
+BlockDriverState *bs;
+IOThread *a = iothread_new();
+IOThread *b = iothread_new();
+AioContext *ctx_a = iothread_get_aio_context(a);
+AioContext *ctx_b = iothread_get_aio_context(b);
+
+bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
+  &error_abort);
+
+bdrv_drained_begin(bs);
+bdrv_set_aio_context(bs, ctx_a);
+
+aio_context_acquire(ctx_a);
+bdrv_drained_end(bs);
+
+bdrv_drained_begin(bs);
+bdrv_set_aio_context(bs, ctx_b);
+aio_context_release(ctx_a);
+aio_context_acquire(ctx_b);
+bdrv_set_aio_context(bs, qemu_get_aio_context());
+aio_context_release(ctx_b);
+bdrv_drained_end(bs);
+
+bdrv_unref(bs);
+iothread_join(a);
+iothread_join(b);
+}
+
 int main(int argc, char **argv)
 {
 int ret;
@@ -1603,6 +1633,8 @@ int main(int argc, char **argv)
 
 g_test_add_func("/bdrv-drain/attach/drain", test_append_to_drained);
 
+g_test_add_func("/bdrv-drain/set_aio_context", test_set_aio_context);
+
 ret = g_test_run();
 qemu_event_destroy(&done_event);
 return ret;
-- 
2.20.1




[Qemu-block] [PATCH v2 07/13] nbd: Use low-level QIOChannel API in nbd_read_eof()

2019-02-20 Thread Kevin Wolf
Instead of using the convenience wrapper qio_channel_read_all_eof(), use
the lower level QIOChannel API. This means duplicating some code, but
we'll need this because this coroutine yield is special: We want it to
be interruptible so that nbd_client_attach_aio_context() can correctly
reenter the coroutine.

This moves the bdrv_dec/inc_in_flight() pair into nbd_read_eof(), so
that connection_co will always sit in this exact qio_channel_yield()
call when bdrv_drain() returns.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 include/block/nbd.h |  4 ++--
 block/nbd-client.c  |  8 +---
 nbd/client.c| 46 -
 3 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index cad975e00c..c6ef1ef42e 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -300,8 +300,8 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
  Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
-int coroutine_fn nbd_receive_reply(QIOChannel *ioc, NBDReply *reply,
-   Error **errp);
+int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
+   NBDReply *reply, Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 5ce4aa9520..60f38f0320 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -84,15 +84,9 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
  *
  * Therefore we keep an additional in_flight reference all the time and
  * only drop it temporarily here.
- *
- * FIXME This is not safe because the QIOChannel could wake up the
- * coroutine for a second time; it is not prepared for coroutine
- * resumption from external code.
  */
-bdrv_dec_in_flight(s->bs);
 assert(s->reply.handle == 0);
-ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
-bdrv_inc_in_flight(s->bs);
+ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, &local_err);
 
 if (local_err) {
 trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
diff --git a/nbd/client.c b/nbd/client.c
index 28d174c0f3..de7da48246 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1394,30 +1394,58 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
  * negative errno on failure (errp is set)
  */
 static inline int coroutine_fn
-nbd_read_eof(QIOChannel *ioc, void *buffer, size_t size, Error **errp)
+nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,
+ Error **errp)
 {
-int ret;
+bool partial = false;
 
 assert(size);
-ret = qio_channel_read_all_eof(ioc, buffer, size, errp);
-if (ret < 0) {
-ret = -EIO;
+while (size > 0) {
+struct iovec iov = { .iov_base = buffer, .iov_len = size };
+ssize_t len;
+
+len = qio_channel_readv(ioc, &iov, 1, errp);
+if (len == QIO_CHANNEL_ERR_BLOCK) {
+bdrv_dec_in_flight(bs);
+qio_channel_yield(ioc, G_IO_IN);
+bdrv_inc_in_flight(bs);
+continue;
+} else if (len < 0) {
+return -EIO;
+} else if (len == 0) {
+if (partial) {
+error_setg(errp,
+   "Unexpected end-of-file before all bytes were read");
+return -EIO;
+} else {
+return 0;
+}
+}
+
+partial = true;
+size -= len;
+buffer = (uint8_t*) buffer + len;
 }
-return ret;
+return 1;
 }
 
 /* nbd_receive_reply
+ *
+ * Decreases bs->in_flight while waiting for a new reply. This yield is where
+ * we wait indefinitely and the coroutine must be able to be safely reentered
+ * for nbd_client_attach_aio_context().
+ *
  * Returns 1 on success
  * 0 on eof, when no data was read (errp is not set)
  * negative errno on failure (errp is set)
  */
-int coroutine_fn nbd_receive_reply(QIOChannel *ioc, NBDReply *reply,
-   Error **errp)
+int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
+   NBDReply *reply, Error **errp)
 {
 int ret;
 const char *type;
 
-ret = nbd_read_eof(ioc, &reply->magic, sizeof(reply->magic), errp);
+ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
 if (ret <= 0) {
 return ret;
 }
-- 
2.20.1




[Qemu-block] [PATCH v2 12/13] block: Use normal drain for bdrv_set_aio_context()

2019-02-20 Thread Kevin Wolf
Now that bdrv_set_aio_context() works inside drained sections, it can
also use the real drain function instead of open coding something
similar.

Signed-off-by: Kevin Wolf 
---
 block.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index aefb5701f5..375a216f76 100644
--- a/block.c
+++ b/block.c
@@ -5268,18 +5268,16 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
 bs->walking_aio_notifiers = false;
 }
 
+/* The caller must own the AioContext lock for the old AioContext of bs, but it
+ * must not own the AioContext lock for new_context (unless new_context is
+ * the same as the current context of bs). */
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
 {
-AioContext *ctx = bdrv_get_aio_context(bs);
-
-if (ctx == new_context) {
+if (bdrv_get_aio_context(bs) == new_context) {
 return;
 }
 
-aio_disable_external(ctx);
-bdrv_parent_drained_begin(bs, NULL, false);
-bdrv_drain(bs); /* ensure there are no in-flight requests */
-
+bdrv_drained_begin(bs);
 bdrv_detach_aio_context(bs);
 
 /* This function executes in the old AioContext so acquire the new one in
@@ -5287,8 +5285,7 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
  */
 aio_context_acquire(new_context);
 bdrv_attach_aio_context(bs, new_context);
-bdrv_parent_drained_end(bs, NULL, false);
-aio_enable_external(ctx);
+bdrv_drained_end(bs);
 aio_context_release(new_context);
 }
 
-- 
2.20.1




Re: [Qemu-block] [PATCH v2 05/13] io: Remove redundant read/write_coroutine assignments

2019-02-20 Thread Paolo Bonzini
On 20/02/19 18:48, Kevin Wolf wrote:
> -static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc);
> -
>  static void qio_channel_restart_read(void *opaque)
>  {
>  QIOChannel *ioc = opaque;
>  Coroutine *co = ioc->read_coroutine;
>  
> -ioc->read_coroutine = NULL;
> -qio_channel_set_aio_fd_handlers(ioc);
> -aio_co_wake(co);
> +assert(qemu_get_current_aio_context() ==
> +   qemu_coroutine_get_aio_context(co));
> +qemu_coroutine_enter(co);
>  }
>  
>  static void qio_channel_restart_write(void *opaque)
> @@ -417,9 +415,9 @@ static void qio_channel_restart_write(void *opaque)
>  QIOChannel *ioc = opaque;
>  Coroutine *co = ioc->write_coroutine;
>  
> -ioc->write_coroutine = NULL;
> -qio_channel_set_aio_fd_handlers(ioc);
> -aio_co_wake(co);
> +assert(qemu_get_current_aio_context() ==
> +   qemu_coroutine_get_aio_context(co));
> +qemu_coroutine_enter(co);
>  }
>  
>  static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc)
> 

aio_co_wake was also acquiring/releasing the AioContext, so
that needs to stay for now.

Paolo



[Qemu-block] [PATCH v2 08/13] nbd: Increase bs->in_flight during AioContext switch

2019-02-20 Thread Kevin Wolf
bdrv_drain() must not leave connection_co scheduled, so bs->in_flight
needs to be increased while the coroutine is waiting to be scheduled
in the new AioContext after nbd_client_attach_aio_context().

Signed-off-by: Kevin Wolf 
---
 block/nbd-client.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 60f38f0320..bfbaf7ebe9 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -977,14 +977,30 @@ void nbd_client_detach_aio_context(BlockDriverState *bs)
 qio_channel_detach_aio_context(QIO_CHANNEL(client->ioc));
 }
 
+static void nbd_client_attach_aio_context_bh(void *opaque)
+{
+BlockDriverState *bs = opaque;
+NBDClientSession *client = nbd_get_client_session(bs);
+
+/* The node is still drained, so we know the coroutine has yielded in
+ * nbd_read_eof(), the only place where bs->in_flight can reach 0, or it is
+ * entered for the first time. Both places are safe for entering the
+ * coroutine.*/
+qemu_aio_coroutine_enter(bs->aio_context, client->connection_co);
+bdrv_dec_in_flight(bs);
+}
+
 void nbd_client_attach_aio_context(BlockDriverState *bs,
AioContext *new_context)
 {
 NBDClientSession *client = nbd_get_client_session(bs);
 qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
 
-/* FIXME Really need a bdrv_inc_in_flight() here */
-aio_co_schedule(new_context, client->connection_co);
+bdrv_inc_in_flight(bs);
+
+/* Need to wait here for the BH to run because the BH must run while the
+ * node is still drained. */
+aio_wait_bh_oneshot(new_context, nbd_client_attach_aio_context_bh, bs);
 }
 
 void nbd_client_close(BlockDriverState *bs)
-- 
2.20.1




[Qemu-block] [PATCH v2 01/13] block-backend: Make blk_inc/dec_in_flight public

2019-02-20 Thread Kevin Wolf
For some users of BlockBackends, just increasing the in_flight counter
is easier than implementing separate handlers in BlockDevOps. Make the
helper functions for this public.
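
To illustrate the intended pattern (a schematic sketch only, not code from
this series; my_deferred_work_bh and its wiring are hypothetical, while
patch 2 shows the real use in virtio-blk):

    static void my_deferred_work_bh(void *opaque)
    {
        BlockBackend *blk = opaque;

        /* ... work that must not race with a drained section ... */

        blk_dec_in_flight(blk); /* lets blk_drain()/blk_drain_all() finish */
    }

    /* caller: keep a drain of this BlockBackend waiting until the BH ran */
    blk_inc_in_flight(blk);
    aio_bh_schedule_oneshot(blk_get_aio_context(blk),
                            my_deferred_work_bh, blk);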

Signed-off-by: Kevin Wolf 
---
 include/sysemu/block-backend.h | 2 ++
 block/block-backend.c  | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 832a4bf168..e2066eb06b 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -156,6 +156,8 @@ int blk_co_pdiscard(BlockBackend *blk, int64_t offset, int bytes);
 int blk_co_flush(BlockBackend *blk);
 int blk_flush(BlockBackend *blk);
 int blk_commit_all(void);
+void blk_inc_in_flight(BlockBackend *blk);
+void blk_dec_in_flight(BlockBackend *blk);
 void blk_drain(BlockBackend *blk);
 void blk_drain_all(void);
 void blk_set_on_error(BlockBackend *blk, BlockdevOnError on_read_error,
diff --git a/block/block-backend.c b/block/block-backend.c
index f6ea824308..0219555f89 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1262,12 +1262,12 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
 return bdrv_make_zero(blk->root, flags);
 }
 
-static void blk_inc_in_flight(BlockBackend *blk)
+void blk_inc_in_flight(BlockBackend *blk)
 {
 atomic_inc(&blk->in_flight);
 }
 
-static void blk_dec_in_flight(BlockBackend *blk)
+void blk_dec_in_flight(BlockBackend *blk)
 {
 atomic_dec(&blk->in_flight);
 aio_wait_kick();
-- 
2.20.1




[Qemu-block] [PATCH v2 02/13] virtio-blk: Increase in_flight for request restart BH

2019-02-20 Thread Kevin Wolf
virtio_blk_dma_restart_bh() submits new requests, so in order to make
sure that these requests are not started inside a drained section of the
attached BlockBackend, we need to make sure that draining the
BlockBackend waits for the BH to be executed.

This BH is still questionable because it's scheduled in the main thread
instead of the configured iothread. Leave a FIXME comment for this.

But with this fix, enabling the data plane at least waits for these
requests (in bdrv_set_aio_context()) instead of changing the AioContext
under their feet and making them run in the wrong thread, causing
crashes and failures (e.g. due to missing locking).

Signed-off-by: Kevin Wolf 
---
 hw/block/virtio-blk.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cf7f47eaba..e11e6e45d0 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -675,6 +675,7 @@ static void virtio_blk_dma_restart_bh(void *opaque)
 if (mrb.num_reqs) {
 virtio_blk_submit_multireq(s->blk, &mrb);
 }
+blk_dec_in_flight(s->conf.conf.blk);
 aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
@@ -688,8 +689,11 @@ static void virtio_blk_dma_restart_cb(void *opaque, int running,
 }
 
 if (!s->bh) {
+/* FIXME The data plane is not started yet, so these requests are
+ * processed in the main thread. */
 s->bh = aio_bh_new(blk_get_aio_context(s->conf.conf.blk),
virtio_blk_dma_restart_bh, s);
+blk_inc_in_flight(s->conf.conf.blk);
 qemu_bh_schedule(s->bh);
 }
 }
-- 
2.20.1




Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-02-20 Thread Paolo Bonzini
On 20/02/19 17:58, Stefan Hajnoczi wrote:
> On Mon, Feb 18, 2019 at 07:21:25AM +, Fernando Casas Schössow wrote:
>> It took a few days but last night the problem was reproduced.
>> This is the information from the log:
>>
>> vdev 0x55f261d940f0 ("virtio-blk")
>> vq 0x55f261d9ee40 (idx 0)
>> inuse 128 vring.num 128
>> old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
>> avail 0x3d87a800 avail_idx (cache bypassed) 58625
> 
> Hi Paolo,
> Are you aware of any recent MemoryRegionCache issues?  The avail_idx
> value 58874 was read via the cache while a non-cached read produces
> 58625!
> 
> I suspect that 58625 is correct since the vring is already full and the
> driver wouldn't bump avail_idx any further until requests complete.
> 
> Fernando also hits this issue with virtio-scsi so it's not a
> virtio_blk.ko driver bug or a virtio-blk device emulation issue.

No, I am not aware of any issues.

How can I get the core dump (and the corresponding executable to get the
symbols)?  Alternatively, it should be enough to print the
vq->vring.caches->avail.mrs from the debugger.

Also, one possibility is to add in vring_avail_idx an assertion like

    assert(vq->shadow_avail_idx == virtio_lduw_phys(vdev,
           vq->vring.avail + offsetof(VRingAvail, idx)));

and try to catch the error earlier.

Paolo
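
For anyone following along with the core dump, that inspection could look
roughly like this in gdb (a sketch; it assumes you have already located the
VirtQueue of interest):

    (gdb) print vq->shadow_avail_idx
    (gdb) print vq->last_avail_idx
    (gdb) print/x vq->vring.avail
    (gdb) print vq->vring.caches->avail.mrs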

> A QEMU core dump is available for debugging.
> 
> Here is the patch that produced this debug output:
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index a1ff647a66..28d89fcbcb 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>  return NULL;
>  }
>  rcu_read_lock();
> +uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
>  if (virtio_queue_empty_rcu(vq)) {
>  goto done;
>  }
> @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>  max = vq->vring.num;
> 
>  if (vq->inuse >= vq->vring.num) {
> +fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
> +fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - 
> vdev->vq));
> +fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
> +fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx 
> %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
> +fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) 
> %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + 
> offsetof(VRingAvail, idx)));
> +fprintf(stderr, "used_idx %u\n", vq->used_idx);
> +abort(); /* <--- core dump! */
>  virtio_error(vdev, "Virtqueue size exceeded");
>  goto done;
>  }
> 
> Stefan
> 
>> used_idx 58497
>> 2019-02-18 03:20:08.605+: shutting down, reason=crashed
>>
>> The dump file, including guest memory, was generated successfully (after 
>> gzip the file is around 492MB).
>> I switched the guest now to virtio-scsi to get the information and dump with 
>> this setup as well.
>>
>> How should we proceed?
>>
>> Thanks.
>>
>> On Mon, Feb 11, 2019 at 4:17 AM, Stefan Hajnoczi wrote:
>> Thanks for collecting the data!
>>
>> The fact that both virtio-blk and virtio-scsi failed suggests it's not a
>> virtqueue element leak in the virtio-blk or virtio-scsi device emulation
>> code. The hung task error messages from inside the guest are a consequence
>> of QEMU hitting the "Virtqueue size exceeded" error. QEMU refuses to
>> process further requests after the error, causing tasks inside the guest
>> to get stuck on I/O.
>>
>> I don't have a good theory regarding the root cause. Two ideas:
>> 1. The guest is corrupting the vring or submitting more requests than
>>    will fit into the ring. Somewhat unlikely because it happens with both
>>    Windows and Linux guests.
>> 2. QEMU's virtqueue code is buggy, maybe the memory region cache which is
>>    used for fast guest RAM accesses.
>>
>> Here is an expanded version of the debug patch which might help identify
>> which of these scenarios is likely. Sorry, it requires running the guest
>> again! This time let's make QEMU dump core so both QEMU state and guest
>> RAM are captured for further debugging. That way it will be possible to
>> extract more information using gdb without rerunning.
>>
>> Stefan
>>
>> ---
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index a1ff647a66..28d89fcbcb 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>>  return NULL;
>>  }
>>  rcu_read_lock();
>> +uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
>>  if (virtio_queue_empty_rcu(vq)) {
>>  goto done;
>>  }
>> @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>>  max = vq->vring.num;
>>  if (vq->inuse >= vq->vring.num) {
>> +fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
>> +fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
>> +fprintf(stderr, "inuse %u vring.num %u\n", 


[Qemu-block] [PATCH v2 13/13] aio-posix: Assert that aio_poll() is always called in home thread

2019-02-20 Thread Kevin Wolf
aio_poll() has an existing assertion that the function is only called
from the AioContext's home thread if blocking is allowed.

This is not enough, some handlers make assumptions about the thread they
run in. Extend the assertion to non-blocking calls, too.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 util/aio-posix.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 8640dfde9f..6fbfa7924f 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -613,6 +613,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
 int64_t timeout;
 int64_t start = 0;
 
+assert(in_aio_context_home_thread(ctx));
+
 /* aio_notify can avoid the expensive event_notifier_set if
  * everything (file descriptors, bottom halves, timers) will
  * be re-evaluated before the next blocking poll().  This is
@@ -621,7 +623,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
  * so disable the optimization now.
  */
 if (blocking) {
-assert(in_aio_context_home_thread(ctx));
 atomic_add(&ctx->notify_me, 2);
 }
 
-- 
2.20.1




[Qemu-block] [PATCH v2 09/13] block: Don't poll in bdrv_set_aio_context()

2019-02-20 Thread Kevin Wolf
The explicit aio_poll() call in bdrv_set_aio_context() was added in
commit c2b6428d388 as a workaround for bdrv_drain() failing to actually
quiesce everything (specifically the NBD client code to
switch AioContext).

Now that the NBD client has been fixed to complete this operation during
bdrv_drain(), we don't need the workaround any more.

It was wrong anyway: aio_poll() must always be run in the home thread of
the AioContext.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/block.c b/block.c
index 0c12632661..17bc1d3dca 100644
--- a/block.c
+++ b/block.c
@@ -5273,10 +5273,6 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
 bdrv_parent_drained_begin(bs, NULL, false);
 bdrv_drain(bs); /* ensure there are no in-flight requests */
 
-while (aio_poll(ctx, false)) {
-/* wait for all bottom halves to execute */
-}
-
 bdrv_detach_aio_context(bs);
 
 /* This function executes in the old AioContext so acquire the new one in
-- 
2.20.1




[Qemu-block] [PATCH v2 10/13] block: Fix AioContext switch for drained node

2019-02-20 Thread Kevin Wolf
When a drained node changes its AioContext, we need to move its
aio_disable_external() to the new context, too.

Without this fix, drain_end will try to reenable the new context, which
has never been disabled, so an assertion failure is triggered.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block.c b/block.c
index 17bc1d3dca..aefb5701f5 100644
--- a/block.c
+++ b/block.c
@@ -5227,6 +5227,9 @@ void bdrv_detach_aio_context(BlockDriverState *bs)
 bdrv_detach_aio_context(child->bs);
 }
 
+if (bs->quiesce_counter) {
+aio_enable_external(bs->aio_context);
+}
 bs->aio_context = NULL;
 }
 
@@ -5240,6 +5243,10 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
 return;
 }
 
+if (bs->quiesce_counter) {
+aio_disable_external(new_context);
+}
+
 bs->aio_context = new_context;
 
 QLIST_FOREACH(child, &bs->children, next) {
-- 
2.20.1




[Qemu-block] [PATCH v2 05/13] io: Remove redundant read/write_coroutine assignments

2019-02-20 Thread Kevin Wolf
qio_channel_yield() now updates ioc->read/write_coroutine and calls
qio_channel_set_aio_fd_handlers(), so the code in the handlers has
become redundant and can be removed.

This does not make a difference in intermediate states because
aio_co_wake() really enters the coroutine immediately here: These
handlers are never run in coroutine context, and we're in the right
AioContext because qio_channel_attach_aio_context() asserts that the
handlers are inactive.

To make these conditions more obvious, replace the aio_co_wake() with a
direct qemu_coroutine_enter() and assert the right AioContext.
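
(For context: aio_co_wake(co) behaves roughly like the following simplified
sketch, not the exact util/async.c code, which is also why removing it
removes an acquire/release pair, as noted in the review:)

    AioContext *ctx = qemu_coroutine_get_aio_context(co);

    if (ctx != qemu_get_current_aio_context()) {
        aio_co_schedule(ctx, co);  /* defer to the coroutine's home context */
    } else {
        aio_context_acquire(ctx);
        qemu_aio_coroutine_enter(ctx, co);
        aio_context_release(ctx);
    }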

Suggested-by: Paolo Bonzini 
Signed-off-by: Kevin Wolf 
---
 io/channel.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/io/channel.c b/io/channel.c
index 303376e08d..aa3edf6019 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -400,16 +400,14 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
 }
 
 
-static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc);
-
 static void qio_channel_restart_read(void *opaque)
 {
 QIOChannel *ioc = opaque;
 Coroutine *co = ioc->read_coroutine;
 
-ioc->read_coroutine = NULL;
-qio_channel_set_aio_fd_handlers(ioc);
-aio_co_wake(co);
+assert(qemu_get_current_aio_context() ==
+   qemu_coroutine_get_aio_context(co));
+qemu_coroutine_enter(co);
 }
 
 static void qio_channel_restart_write(void *opaque)
@@ -417,9 +415,9 @@ static void qio_channel_restart_write(void *opaque)
 QIOChannel *ioc = opaque;
 Coroutine *co = ioc->write_coroutine;
 
-ioc->write_coroutine = NULL;
-qio_channel_set_aio_fd_handlers(ioc);
-aio_co_wake(co);
+assert(qemu_get_current_aio_context() ==
+   qemu_coroutine_get_aio_context(co));
+qemu_coroutine_enter(co);
 }
 
 static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc)
-- 
2.20.1




[Qemu-block] [PATCH v2 06/13] nbd: Move nbd_read_eof() to nbd/client.c

2019-02-20 Thread Kevin Wolf
The only caller of nbd_read_eof() is nbd_receive_reply(), so it doesn't
have to live in the header file, but can move next to its caller.

Also add the missing coroutine_fn to the function and its caller.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 include/block/nbd.h |  3 ++-
 nbd/nbd-internal.h  | 19 ---
 nbd/client.c| 22 +-
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 96cfb1d7d5..cad975e00c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -300,7 +300,8 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
  Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
-int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp);
+int coroutine_fn nbd_receive_reply(QIOChannel *ioc, NBDReply *reply,
+   Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 82aa221227..049f83df77 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -64,25 +64,6 @@
 #define NBD_SET_TIMEOUT _IO(0xab, 9)
 #define NBD_SET_FLAGS   _IO(0xab, 10)
 
-/* nbd_read_eof
- * Tries to read @size bytes from @ioc.
- * Returns 1 on success
- * 0 on eof, when no data was read (errp is not set)
- * negative errno on failure (errp is set)
- */
-static inline int nbd_read_eof(QIOChannel *ioc, void *buffer, size_t size,
-   Error **errp)
-{
-int ret;
-
-assert(size);
-ret = qio_channel_read_all_eof(ioc, buffer, size, errp);
-if (ret < 0) {
-ret = -EIO;
-}
-return ret;
-}
-
 /* nbd_write
  * Writes @size bytes to @ioc. Returns 0 on success.
  */
diff --git a/nbd/client.c b/nbd/client.c
index 10a52ad7d0..28d174c0f3 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1387,12 +1387,32 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
 return 0;
 }
 
+/* nbd_read_eof
+ * Tries to read @size bytes from @ioc.
+ * Returns 1 on success
+ * 0 on eof, when no data was read (errp is not set)
+ * negative errno on failure (errp is set)
+ */
+static inline int coroutine_fn
+nbd_read_eof(QIOChannel *ioc, void *buffer, size_t size, Error **errp)
+{
+int ret;
+
+assert(size);
+ret = qio_channel_read_all_eof(ioc, buffer, size, errp);
+if (ret < 0) {
+ret = -EIO;
+}
+return ret;
+}
+
 /* nbd_receive_reply
  * Returns 1 on success
  * 0 on eof, when no data was read (errp is not set)
  * negative errno on failure (errp is set)
  */
-int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
+int coroutine_fn nbd_receive_reply(QIOChannel *ioc, NBDReply *reply,
+   Error **errp)
 {
 int ret;
 const char *type;
-- 
2.20.1




[Qemu-block] [PATCH v2 04/13] io: Make qio_channel_yield() interruptible

2019-02-20 Thread Kevin Wolf
Similar to how qemu_co_sleep_ns() allows preemption from an external
coroutine entry, allow reentering qio_channel_yield() early.

Signed-off-by: Kevin Wolf 
---
 include/io/channel.h |  9 ++---
 io/channel.c | 10 ++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index da2f138200..59460cb1ec 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -739,10 +739,13 @@ void qio_channel_detach_aio_context(QIOChannel *ioc);
  * addition, no two coroutine can be waiting on the same condition
  * and channel at the same time.
  *
- * This must only be called from coroutine context
+ * This must only be called from coroutine context. It is safe to
+ * reenter the coroutine externally while it is waiting; in this
+ * case the function will return even if @condition is not yet
+ * available.
  */
-void qio_channel_yield(QIOChannel *ioc,
-   GIOCondition condition);
+void coroutine_fn qio_channel_yield(QIOChannel *ioc,
+GIOCondition condition);
 
 /**
  * qio_channel_wait:
diff --git a/io/channel.c b/io/channel.c
index 8dd0684f5d..303376e08d 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -469,6 +469,16 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
 }
 qio_channel_set_aio_fd_handlers(ioc);
 qemu_coroutine_yield();
+
+/* Allow interrupting the operation by reentering the coroutine other than
+ * through the aio_fd_handlers. */
+if (condition == G_IO_IN && ioc->read_coroutine) {
+ioc->read_coroutine = NULL;
+qio_channel_set_aio_fd_handlers(ioc);
+} else if (condition == G_IO_OUT && ioc->write_coroutine) {
+ioc->write_coroutine = NULL;
+qio_channel_set_aio_fd_handlers(ioc);
+}
 }
 
 
-- 
2.20.1




[Qemu-block] [PATCH v2 00/13] block: bdrv_set_aio_context() related fixes

2019-02-20 Thread Kevin Wolf
Background for this series is the following bug report, which is about a
crash with virtio-blk + iothread and request resubmission for werror/rerror:

https://bugzilla.redhat.com/show_bug.cgi?id=1671173

The reason is that bdrv_set_aio_context() didn't correctly quiesce
everything. Instead, it had a local hack to call aio_poll() for the
source AioContext, which covered some, but not all cases, and is wrong
because you can only call aio_poll() from the home thread.

So this series tries to make bdrv_drain() actually drain the known cases
(fixes virtio-blk and the NBD client) and use the regular drain
functions in bdrv_set_aio_context() instead of open-coding something
similar.

v2:
- New patch 5: Remove now redundant assignments [Paolo]
- Patch 8 (was 7): Instead of using aio_co_schedule() to switch the
  AioContext and modifying the coroutine code, enter the coroutine in a
  BH in the new AioContext and bdrv_dec_in_flight() in that BH [Paolo]
- Patch 12 (was 11): Cover new == old case in comment [Eric]

Kevin Wolf (13):
  block-backend: Make blk_inc/dec_in_flight public
  virtio-blk: Increase in_flight for request restart BH
  nbd: Restrict connection_co reentrance
  io: Make qio_channel_yield() interruptible
  io: Remove redundant read/write_coroutine assignments
  nbd: Move nbd_read_eof() to nbd/client.c
  nbd: Use low-level QIOChannel API in nbd_read_eof()
  nbd: Increase bs->in_flight during AioContext switch
  block: Don't poll in bdrv_set_aio_context()
  block: Fix AioContext switch for drained node
  test-bdrv-drain: AioContext switch in drained section
  block: Use normal drain for bdrv_set_aio_context()
  aio-posix: Assert that aio_poll() is always called in home thread

 block/nbd-client.h |  1 +
 include/block/nbd.h|  3 +-
 include/io/channel.h   |  9 --
 include/sysemu/block-backend.h |  2 ++
 nbd/nbd-internal.h | 19 -
 block.c| 26 -
 block/block-backend.c  |  4 +--
 block/nbd-client.c | 36 +--
 hw/block/virtio-blk.c  |  4 +++
 io/channel.c   | 24 ++--
 nbd/client.c   | 52 --
 tests/test-bdrv-drain.c| 32 +
 util/aio-posix.c   |  3 +-
 13 files changed, 164 insertions(+), 51 deletions(-)

-- 
2.20.1




[Qemu-block] [PATCH v2 03/13] nbd: Restrict connection_co reentrance

2019-02-20 Thread Kevin Wolf
nbd_client_attach_aio_context() schedules connection_co in the new
AioContext and this way reenters it in any arbitrary place that has
yielded. We can restrict this a bit to the function call where the
coroutine actually sits waiting when it's idle.

This doesn't solve any bug yet, but it shows where in the code we need
to support this random reentrance and where we don't have to care.

Add FIXME comments for the existing bugs that the rest of this series
will fix.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block/nbd-client.h |  1 +
 block/nbd-client.c | 22 ++
 2 files changed, 23 insertions(+)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index d990207a5c..09e03013d2 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -35,6 +35,7 @@ typedef struct NBDClientSession {
 
 NBDClientRequest requests[MAX_NBD_REQUESTS];
 NBDReply reply;
+BlockDriverState *bs;
 bool quit;
 } NBDClientSession;
 
diff --git a/block/nbd-client.c b/block/nbd-client.c
index f0ad54ce21..5ce4aa9520 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -76,8 +76,24 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
 Error *local_err = NULL;
 
 while (!s->quit) {
+/*
+ * The NBD client can only really be considered idle when it has
+ * yielded from qio_channel_readv_all_eof(), waiting for data. This is
+ * the point where the additional scheduled coroutine entry happens
+ * after nbd_client_attach_aio_context().
+ *
+ * Therefore we keep an additional in_flight reference all the time and
+ * only drop it temporarily here.
+ *
+ * FIXME This is not safe because the QIOChannel could wake up the
+ * coroutine for a second time; it is not prepared for coroutine
+ * resumption from external code.
+ */
+bdrv_dec_in_flight(s->bs);
 assert(s->reply.handle == 0);
 ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
+bdrv_inc_in_flight(s->bs);
+
 if (local_err) {
 trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
 error_free(local_err);
@@ -116,6 +132,8 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
 
 s->quit = true;
 nbd_recv_coroutines_wake_all(s);
+bdrv_dec_in_flight(s->bs);
+
 s->connection_co = NULL;
 aio_wait_kick();
 }
@@ -970,6 +988,8 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
 {
 NBDClientSession *client = nbd_get_client_session(bs);
 qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
+
+/* FIXME Really need a bdrv_inc_in_flight() here */
 aio_co_schedule(new_context, client->connection_co);
 }
 
@@ -1076,6 +1096,7 @@ static int nbd_client_connect(BlockDriverState *bs,
  * kick the reply mechanism.  */
 qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
client->connection_co = qemu_coroutine_create(nbd_connection_entry, client);
+bdrv_inc_in_flight(bs);
 nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
 logout("Established connection with NBD server\n");
@@ -1108,6 +1129,7 @@ int nbd_client_init(BlockDriverState *bs,
 {
 NBDClientSession *client = nbd_get_client_session(bs);
 
+client->bs = bs;
 qemu_co_mutex_init(&client->send_mutex);
 qemu_co_queue_init(&client->free_sema);
 
-- 
2.20.1




Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

2019-02-20 Thread Stefan Hajnoczi
On Mon, Feb 18, 2019 at 07:21:25AM +, Fernando Casas Schössow wrote:
> It took a few days but last night the problem was reproduced.
> This is the information from the log:
> 
> vdev 0x55f261d940f0 ("virtio-blk")
> vq 0x55f261d9ee40 (idx 0)
> inuse 128 vring.num 128
> old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
> avail 0x3d87a800 avail_idx (cache bypassed) 58625

Hi Paolo,
Are you aware of any recent MemoryRegionCache issues?  The avail_idx
value 58874 was read via the cache while a non-cached read produces
58625!

I suspect that 58625 is correct since the vring is already full and the
driver wouldn't bump avail_idx any further until requests complete.

Fernando also hits this issue with virtio-scsi so it's not a
virtio_blk.ko driver bug or a virtio-blk device emulation issue.

A QEMU core dump is available for debugging.

Here is the patch that produced this debug output:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a1ff647a66..28d89fcbcb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 return NULL;
 }
 rcu_read_lock();
+uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
 if (virtio_queue_empty_rcu(vq)) {
 goto done;
 }
@@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 max = vq->vring.num;

 if (vq->inuse >= vq->vring.num) {
+fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
+fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx 
%u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
+fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) 
%u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + 
offsetof(VRingAvail, idx)));
+fprintf(stderr, "used_idx %u\n", vq->used_idx);
+abort(); /* <--- core dump! */
 virtio_error(vdev, "Virtqueue size exceeded");
 goto done;
 }

Stefan

> used_idx 58497
> 2019-02-18 03:20:08.605+: shutting down, reason=crashed
> 
> The dump file, including guest memory, was generated successfully (after gzip 
> the file is around 492MB).
> I switched the guest now to virtio-scsi to get the information and dump with 
> this setup as well.
> 
> How should we proceed?
> 
> Thanks.
> 
> On Mon, Feb 11, 2019 at 4:17 AM, Stefan Hajnoczi wrote:
> Thanks for collecting the data!
>
> The fact that both virtio-blk and virtio-scsi failed suggests it's not a
> virtqueue element leak in the virtio-blk or virtio-scsi device emulation
> code. The hung task error messages from inside the guest are a consequence
> of QEMU hitting the "Virtqueue size exceeded" error. QEMU refuses to
> process further requests after the error, causing tasks inside the guest
> to get stuck on I/O.
>
> I don't have a good theory regarding the root cause. Two ideas:
> 1. The guest is corrupting the vring or submitting more requests than will
>    fit into the ring. Somewhat unlikely because it happens with both
>    Windows and Linux guests.
> 2. QEMU's virtqueue code is buggy, maybe the memory region cache which is
>    used for fast guest RAM accesses.
>
> Here is an expanded version of the debug patch which might help identify
> which of these scenarios is likely. Sorry, it requires running the guest
> again! This time let's make QEMU dump core so both QEMU state and guest
> RAM are captured for further debugging. That way it will be possible to
> extract more information using gdb without rerunning.
>
> Stefan
>
> ---
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index a1ff647a66..28d89fcbcb 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>  return NULL;
>  }
>  rcu_read_lock();
> +uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
>  if (virtio_queue_empty_rcu(vq)) {
>  goto done;
>  }
> @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>  max = vq->vring.num;
>  if (vq->inuse >= vq->vring.num) {
> +fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
> +fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
> +fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
> +fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
> +fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
> +fprintf(stderr, "used_idx %u\n", vq->used_idx);
> +abort(); /* <--- core dump! */
>  virtio_error(vdev, "Virtqueue size exceeded");
>  goto done;
>  }
> 
> 




Re: [Qemu-block] [PATCH 07/12] nbd: Increase bs->in_flight during AioContext switch

2019-02-20 Thread Kevin Wolf
Am 18.02.2019 um 18:22 hat Paolo Bonzini geschrieben:
> On 18/02/19 17:18, Kevin Wolf wrote:
> > +/* aio_ctx_switch is only supposed to be set if we're sitting in
> > + * the qio_channel_yield() below. */
> > +assert(!*aio_ctx_switch);
> >  bdrv_dec_in_flight(bs);
> >  qio_channel_yield(ioc, G_IO_IN);
> > -bdrv_inc_in_flight(bs);
> > +if (*aio_ctx_switch) {
> > +/* nbd_client_attach_aio_context() already increased in_flight
> > + * when scheduling this coroutine for reentry */
> > +*aio_ctx_switch = false;
> > +} else {
> > +bdrv_inc_in_flight(bs);
> > +}
> 
> Hmm, my first thought would have been to do the bdrv_inc_in_flight(bs);
> unconditionally here, and in nbd_connection_entry do the opposite, like
> 
>   if (s->aio_ctx_switch) {
>   s->aio_ctx_switch = false;
>   bdrv_dec_in_flight(bs);
>   }
> 
> but I guess the problem is that then bdrv_drain could hang.
> 
> So my question is:
> 
> 1) is there a testcase that shows the problem with this "obvious"
> refactoring;
> 
> 2) maybe instead of aio_co_schedul-ing client->connection_co and having
> the s->aio_ctx_switch flag, you could go through a bottom half that does
> the bdrv_inc_in_flight and then enters client->connection_co?

Actually, this is going to become a bit ugly, too. I can't just schedule
the BH and return because then the node isn't drained any more when the
BH actually runs - and when it's not drained, we don't know where the
coroutine is, so we can't reenter it.

With an AIO_WAIT_WHILE() in the old thread, it should work, though...

Kevin
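
Schematically, the approach that v2 settled on (PATCH v2 08/13 in this
digest) looks like this:

    bdrv_inc_in_flight(bs);
    /* aio_wait_bh_oneshot() runs the BH in new_context but blocks the
     * calling thread until it has completed, so the node is still drained
     * while the BH reenters connection_co: */
    aio_wait_bh_oneshot(new_context, nbd_client_attach_aio_context_bh, bs);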



Re: [Qemu-block] [PATCH v5 00/10] virtio-blk: add DISCARD and WRITE_ZEROES features

2019-02-20 Thread Stefan Hajnoczi
On Mon, Feb 18, 2019 at 03:02:51PM +0100, Stefano Garzarella wrote:
> This series adds the support of DISCARD and WRITE_ZEROES commands
> and extends the virtio-blk-test to test these new commands.
> 
> v5:
> - rebased on master
> - handled the config size for DISCARD and WRITE_ZEROES features as in
>   virtio-net (patches 4 and 5) [Michael, Stefan]
> - fixed an endianness issue in the WRITE_ZEROES test (patches 8 and 9)
> - added a test for the DISCARD command to only test the status returned by
>   the request (patch 10)
> - others patches are unchanged (patches 1, 2, 3, 6, 7)

Looks good.  Please fix the patchew failure and resend.

Stefan




Re: [Qemu-block] [Qemu-devel] [PATCH v4 00/17] block: local qiov helper

2019-02-20 Thread Stefan Hajnoczi
On Mon, Feb 18, 2019 at 05:09:09PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here is a new simple helper for a very common pattern
> around qemu_iovec_init_external(), when we need a simple qiov with only
> one iov, initialized from an external buffer.
> 
> v4:
>  01: tiny improvements by Eric
>  + fix bug: s/niov/nalloc in assertion
>  + rename s/qemu_iovec_get_buf/qemu_iovec_buf,
>    so it doesn't look like a getter, which in turn should not return
>    a pointer to something internal to be freed separately.
>  So, no r-b's in 01 
>  07,10,16
> - rebase on new function name qemu_iovec_buf 
> - save r-b's by Eric and Stefan
>  others unchanged, add r-b's by Eric and Stefan
> 
> v3:
>   01-02: tiny improvements, described in patch-emails
>   03-17: new patches
> 
>   Note: only hw/scsi/scsi-disk.c is not updated, as it has too tricky
> logic around @iov fields of structures. So, it is simpler to
> keep it as is.
> 
> v2 was "[PATCH v2 0/2] block: local qiov helper: part I"
> https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg01610.html
> 
> Vladimir Sementsov-Ogievskiy (17):
>   block: enhance QEMUIOVector structure
>   block/io: use qemu_iovec_init_buf
>   block/block-backend: use QEMU_IOVEC_INIT_BUF
>   block/backup: use qemu_iovec_init_buf
>   block/commit: use QEMU_IOVEC_INIT_BUF
>   block/stream: use QEMU_IOVEC_INIT_BUF
>   block/parallels: use QEMU_IOVEC_INIT_BUF
>   block/qcow: use qemu_iovec_init_buf
>   block/qcow2: use qemu_iovec_init_buf
>   block/qed: use qemu_iovec_init_buf
>   block/vmdk: use qemu_iovec_init_buf
>   qemu-img: use qemu_iovec_init_buf
>   migration/block: use qemu_iovec_init_buf
>   tests/test-bdrv-drain: use QEMU_IOVEC_INIT_BUF
>   hw/ide: drop iov field from IDEState
>   hw/ide: drop iov field from IDEBufferedRequest
>   hw/ide: drop iov field from IDEDMA
> 
>  include/hw/ide/internal.h |  3 --
>  include/qemu/iov.h| 64 +++-
>  block/backup.c|  5 +--
>  block/block-backend.c | 13 +-
>  block/commit.c|  7 +--
>  block/io.c| 89 +--
>  block/parallels.c | 13 +++---
>  block/qcow.c  | 21 ++---
>  block/qcow2.c | 12 +-
>  block/qed-table.c | 16 ++-
>  block/qed.c   | 31 --
>  block/stream.c|  7 +--
>  block/vmdk.c  |  7 +--
>  hw/ide/atapi.c| 14 +++---
>  hw/ide/core.c | 19 -
>  migration/block.c | 10 ++---
>  qemu-img.c| 10 +
>  tests/test-bdrv-drain.c   | 29 ++---
>  18 files changed, 134 insertions(+), 236 deletions(-)
> 
> -- 
> 2.18.0
> 
> 

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan




Re: [Qemu-block] [RFC PATCH 2/2] block/dirty-bitmap: implement inconsistent bit

2019-02-20 Thread Eric Blake
On 2/19/19 4:00 PM, John Snow wrote:

>>
>> hmm, I also think we should report our deprecated status as locked for 
>> inconsistent bitmaps..
>>
>>
> 
> Though we're trying to deprecate the field altogether, I *could* add a
> special status flag that makes it unambiguous. This will catch the
> attention of anyone using the old API.

Adding to the enum, even though it is going away in the future, is still
helpful in the present, so I can live with that approach.


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v5 05/10] virtio-blk: set config size depending on the features enabled

2019-02-20 Thread Stefan Hajnoczi
On Mon, Feb 18, 2019 at 03:02:56PM +0100, Stefano Garzarella wrote:
> Starting from DISABLE and WRITE_ZEROES features, we use an array of
> VirtIOFeature (as virtio-net) to properly set the config size
> depending on the features enabled.
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  hw/block/virtio-blk.c  | 31 +--
>  include/hw/virtio/virtio-blk.h |  1 +
>  2 files changed, 26 insertions(+), 6 deletions(-)

Reviewed-by: Stefan Hajnoczi 
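
For readers without the patch body at hand, the mechanism follows the
virtio-net precedent roughly like this (a sketch reconstructed from the
commit message; the helper name virtio_blk_set_config_size and the constant
VIRTIO_BLK_CFG_SIZE are assumptions, while the virtio_blk_config field names
come from the virtio specification):

    static const VirtIOFeature feature_sizes[] = {
        {.flags = 1ULL << VIRTIO_BLK_F_DISCARD,
         .end = virtio_endof(struct virtio_blk_config,
                             discard_sector_alignment)},
        {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
         .end = virtio_endof(struct virtio_blk_config,
                             write_zeroes_may_unmap)},
        {}
    };

    static void virtio_blk_set_config_size(VirtIOBlock *s,
                                           uint64_t host_features)
    {
        s->config_size = MAX(VIRTIO_BLK_CFG_SIZE,
                             virtio_feature_get_config_size(feature_sizes,
                                                            host_features));
    }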




Re: [Qemu-block] [Qemu-devel] [PATCH v5 10/10] tests/virtio-blk: add test for DISCARD command

2019-02-20 Thread Stefan Hajnoczi
On Mon, Feb 18, 2019 at 03:03:01PM +0100, Stefano Garzarella wrote:
> If the DISCARD feature is enabled, we try this command in the
> test_basic(), checking only the status returned by the request.
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  tests/virtio-blk-test.c | 27 +++
>  1 file changed, 27 insertions(+)

Reviewed-by: Stefan Hajnoczi 




Re: [Qemu-block] [Qemu-devel] [PATCH v2 2/2] iotests: avoid broken pipe with certtool

2019-02-20 Thread Eric Blake
On 2/20/19 8:58 AM, Daniel P. Berrangé wrote:
> When we run "certtool | head -1" the latter command is likely to
> complete and exit before certtool has written everything it wants to
> stderr. In at least the RHEL-7 gnutls 3.3.29 this causes certtool to

A bit of a mismatch: had we actually used only 'certtool | head -1', it
would be early death before writing all it wants to stdout (not stderr).
 But since the patch itself is replacing 'certtool 2>&1 | head -1',
where stderr is also in the picture due to the additional fd
redirection, I'm not sure if it is better to just
s/"certtool |/"certtool 2>&1 |/ to match the patch, or to s/stderr/stdout/
for brevity, or to just leave things as written.  Your call.

> quit with broken pipe before it has finished writing the desired
> output file to disk. This causes non-deterministic failures of the
> iotest 233 because the certs are sometimes zero length files.
> If certtool fails the "head -1" means we also loose any useful error

s/loose/lose/

> message it would have printed.
> 
> Thus this patch gets rid of the pipe and post-processes the output in a
> more flexible & reliable manner.
> 
> Reported-by: Thomas Huth 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  tests/qemu-iotests/common.tls | 48 +++
>  1 file changed, 32 insertions(+), 16 deletions(-)
> 

Reviewed-by: Eric Blake 

As the fix is for an iotest using NBD, I can take this through my tree
if no one else picks it up through some other block or iotests tree first.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v5 08/10] tests/virtio-blk: add virtio_blk_fix_dwz_hdr() function

2019-02-20 Thread Stefan Hajnoczi
On Mon, Feb 18, 2019 at 03:02:59PM +0100, Stefano Garzarella wrote:
> This function is useful to fix the endianness of struct
> virtio_blk_discard_write_zeroes headers.
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  tests/virtio-blk-test.c | 23 +--
>  1 file changed, 17 insertions(+), 6 deletions(-)

Reviewed-by: Stefan Hajnoczi 
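
A plausible shape for such a helper, following the byte-swap convention the
test already uses for request headers (a sketch, not the quoted patch body;
qvirtio_is_big_endian() is the existing libqos helper, and the
sector/num_sectors/flags fields of struct virtio_blk_discard_write_zeroes
come from the virtio specification):

    static void virtio_blk_fix_dwz_hdr(QVirtioDevice *d,
        struct virtio_blk_discard_write_zeroes *dwz_hdr)
    {
        /* assuming a little-endian host: swap the native-endian header
         * fields when the device expects big-endian values */
        if (qvirtio_is_big_endian(d)) {
            dwz_hdr->sector = bswap64(dwz_hdr->sector);
            dwz_hdr->num_sectors = bswap32(dwz_hdr->num_sectors);
            dwz_hdr->flags = bswap32(dwz_hdr->flags);
        }
    }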




Re: [Qemu-block] [PATCH v5 04/10] virtio-net: make VirtIOFeature usable for other virtio devices

2019-02-20 Thread Stefan Hajnoczi
On Mon, Feb 18, 2019 at 03:02:55PM +0100, Stefano Garzarella wrote:
> In order to use VirtIOFeature also in other virtio devices, we move
> its declaration and the endof() macro (renamed in virtio_endof())
> in virtio.h.
> We add virtio_feature_get_config_size() function to iterate the array
> of VirtIOFeature and to return the config size depending on the
> features enabled. (as virtio_net_set_config_size() did)
> 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Stefano Garzarella 
> ---
>  hw/net/virtio-net.c| 31 +++
>  hw/virtio/virtio.c | 15 +++
>  include/hw/virtio/virtio.h | 15 +++
>  3 files changed, 37 insertions(+), 24 deletions(-)

Reviewed-by: Stefan Hajnoczi 
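
The iteration described above amounts to something like this (a sketch of
the documented behavior; the array is terminated by a zero-flags entry and
.end marks the config size needed once the corresponding feature is
negotiated):

    size_t virtio_feature_get_config_size(VirtIOFeature *features,
                                          uint64_t host_features)
    {
        size_t config_size = 0;
        int i;

        for (i = 0; features[i].flags != 0; i++) {
            if (host_features & features[i].flags) {
                config_size = MAX(features[i].end, config_size);
            }
        }
        return config_size;
    }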




Re: [Qemu-block] [ovirt-users] Re: Unable to upload images

2019-02-20 Thread Michael Blanchard
That fixed it!  I actually downloaded it 7 times on two different machines and 
used 3 different browsers.



From: Nir Soffer 
Sent: Tuesday, February 19, 2019 6:21:14 PM
To: Michael Blanchard
Cc: users; Daniel Erez; qemu-block
Subject: Re: [ovirt-users] Re: Unable to upload images

On Wed, Feb 20, 2019 at 1:01 AM mich...@wanderingmad.com wrote:
qemu-img info:
LM-7.2.45.0.17004.RELEASE-Linux-KVM-XEN.disk
image: LoadMaster-VLM-7.2.45.0.17004.RELEASE-Linux-KVM-XEN.disk
file format: raw
virtual size: 16G (17179869696 bytes)
disk size: 16G

This is a raw image, so it may work, but

ls -l:
-rwxrwxrwx 1 michael michael 17179869185 Jan  7 16:43 
LoadMaster-VLM-7.2.45.0.17004.RELEASE-Linux-KVM-XEN.disk

The image size is invalid. A raw image's size must be aligned to 512 bytes,
which is why we block such images in the UI.

Is it possible that the image was truncated?

According to qemu-img info, the size is 17179869696. qemu-img lies about
the size by rounding it up to the next multiple of 512.

I think this will fix your image:

truncate -s 17179869696 
LoadMaster-VLM-7.2.45.0.17004.RELEASE-Linux-KVM-XEN.disk

After that uploading the image should work.
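
To see where 17179869696 comes from: qemu-img reports the virtual size
rounded up to the next 512-byte sector. A standalone sketch of that
rounding (not qemu code):

    #include <stdint.h>
    #include <stdio.h>

    /* Round a raw image's byte size up to the next 512-byte sector,
     * the same rounding qemu-img info applies to the virtual size. */
    static uint64_t round_to_sector(uint64_t size)
    {
        return (size + 511) & ~UINT64_C(511);
    }

    int main(void)
    {
        uint64_t on_disk = UINT64_C(17179869185);   /* size from ls -l */
        printf("%llu\n", (unsigned long long)round_to_sector(on_disk));
        /* prints 17179869696, matching qemu-img info and the truncate
         * command above */
        return 0;
    }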

Downloading the image again and verifying the image checksum with the vendor is 
probably
a good idea if you are not sure about the contents of this image.

Nir



[Qemu-block] [PATCH v2 0/2] Fix NBD TLS iotests on RHEL-7

2019-02-20 Thread Daniel P . Berrangé
This fixes a failure of iotest 233 due to certtool problems in RHEL-7 with
respect to SIGPIPE.

Changed in v2:

 - Put certtool.log into the $tls_dir instead of cwd

Daniel P. Berrangé (2):
  iotests: ensure we print nbd server log on error
  iotests: avoid broken pipe with certtool

 tests/qemu-iotests/233|  3 +++
 tests/qemu-iotests/common.tls | 48 +++
 2 files changed, 35 insertions(+), 16 deletions(-)

-- 
2.20.1




[Qemu-block] [PATCH v2 2/2] iotests: avoid broken pipe with certtool

2019-02-20 Thread Daniel P . Berrangé
When we run "certtool | head -1" the latter command is likely to
complete and exit before certtool has written everything it wants to
stderr. In at least the RHEL-7 gnutls 3.3.29 this causes certtool to
quit with broken pipe before it has finished writing the desired
output file to disk. This causes non-deterministic failures of the
iotest 233 because the certs are sometimes zero length files.
If certtool fails the "head -1" means we also loose any useful error
message it would have printed.

Thus this patch gets rid of the pipe and post-processes the output in a
more flexible & reliable manner.

Reported-by: Thomas Huth 
Signed-off-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/common.tls | 48 +++
 1 file changed, 32 insertions(+), 16 deletions(-)
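
(As an aside, the failure mode is easy to reproduce outside of certtool:
writing to a pipe whose reader has already exited raises SIGPIPE, or fails
with EPIPE if the signal is ignored. A standalone illustration, unrelated
to the patch itself:)

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];

        if (pipe(fds) < 0) {
            return 1;
        }
        signal(SIGPIPE, SIG_IGN);  /* observe EPIPE instead of dying */
        close(fds[0]);             /* the reader ("head -1") has exited */
        if (write(fds[1], "x", 1) < 0) {
            printf("write failed: %s\n", strerror(errno));  /* EPIPE */
        }
        return 0;
    }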

diff --git a/tests/qemu-iotests/common.tls b/tests/qemu-iotests/common.tls
index eae81789bb..3caf989d28 100644
--- a/tests/qemu-iotests/common.tls
+++ b/tests/qemu-iotests/common.tls
@@ -29,6 +29,17 @@ tls_x509_cleanup()
 }
 
 
+tls_certtool()
+{
+certtool "$@" 1>"${tls_dir}"/certtool.log 2>&1
+if test "$?" = 0; then
+  head -1 "${tls_dir}"/certtool.log
+else
+  cat "${tls_dir}"/certtool.log
+fi
+rm -f "${tls_dir}"/certtool.log
+}
+
 tls_x509_init()
 {
 (certtool --help) >/dev/null 2>&1 || \
@@ -71,10 +82,11 @@ ca
 cert_signing_key
 EOF
 
-certtool --generate-self-signed \
- --load-privkey "${tls_dir}/key.pem" \
- --template "${tls_dir}/ca.info" \
- --outfile "${tls_dir}/$name-cert.pem" 2>&1 | head -1
+tls_certtool \
+--generate-self-signed \
+--load-privkey "${tls_dir}/key.pem" \
+--template "${tls_dir}/ca.info" \
+--outfile "${tls_dir}/$name-cert.pem"
 
 rm -f "${tls_dir}/ca.info"
 }
@@ -98,12 +110,14 @@ encryption_key
 signing_key
 EOF
 
-certtool --generate-certificate \
- --load-ca-privkey "${tls_dir}/key.pem" \
- --load-ca-certificate "${tls_dir}/$caname-cert.pem" \
- --load-privkey "${tls_dir}/key.pem" \
- --template "${tls_dir}/cert.info" \
- --outfile "${tls_dir}/$name/server-cert.pem" 2>&1 | head -1
+tls_certtool \
+--generate-certificate \
+--load-ca-privkey "${tls_dir}/key.pem" \
+--load-ca-certificate "${tls_dir}/$caname-cert.pem" \
+--load-privkey "${tls_dir}/key.pem" \
+--template "${tls_dir}/cert.info" \
+--outfile "${tls_dir}/$name/server-cert.pem"
+
 ln -s "${tls_dir}/$caname-cert.pem" "${tls_dir}/$name/ca-cert.pem"
 ln -s "${tls_dir}/key.pem" "${tls_dir}/$name/server-key.pem"
 
@@ -127,12 +141,14 @@ encryption_key
 signing_key
 EOF
 
-certtool --generate-certificate \
- --load-ca-privkey "${tls_dir}/key.pem" \
- --load-ca-certificate "${tls_dir}/$caname-cert.pem" \
- --load-privkey "${tls_dir}/key.pem" \
- --template "${tls_dir}/cert.info" \
- --outfile "${tls_dir}/$name/client-cert.pem" 2>&1 | head -1
+tls_certtool \
+--generate-certificate \
+--load-ca-privkey "${tls_dir}/key.pem" \
+--load-ca-certificate "${tls_dir}/$caname-cert.pem" \
+--load-privkey "${tls_dir}/key.pem" \
+--template "${tls_dir}/cert.info" \
+--outfile "${tls_dir}/$name/client-cert.pem"
+
 ln -s "${tls_dir}/$caname-cert.pem" "${tls_dir}/$name/ca-cert.pem"
 ln -s "${tls_dir}/key.pem" "${tls_dir}/$name/client-key.pem"
 
-- 
2.20.1




[Qemu-block] [PATCH v2 1/2] iotests: ensure we print nbd server log on error

2019-02-20 Thread Daniel P . Berrangé
If we abort the iotest early the server.log file might contain useful
information for diagnosing the problem. Ensure its contents are
displayed in this case.

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/233 | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/qemu-iotests/233 b/tests/qemu-iotests/233
index fc345a1a46..27932df075 100755
--- a/tests/qemu-iotests/233
+++ b/tests/qemu-iotests/233
@@ -30,6 +30,8 @@ _cleanup()
 {
 nbd_server_stop
 _cleanup_test_img
+# If we aborted early we want to see this log for diagnosis
+test -f "$TEST_DIR/server.log" && cat "$TEST_DIR/server.log"
 rm -f "$TEST_DIR/server.log"
 tls_x509_cleanup
 }
@@ -120,6 +122,7 @@ $QEMU_IO -f $IMGFMT -r -U -c 'r -P 0x22 1m 1m' "$TEST_IMG" 
| _filter_qemu_io
 echo
 echo "== final server log =="
 cat "$TEST_DIR/server.log"
+rm -f $TEST_DIR/server.log
 
 # success, all done
 echo "*** done"
-- 
2.20.1




Re: [Qemu-block] [Qemu-devel] [PATCH] block/pflash_cfi02: Fix memory leak and potential use-after-free

2019-02-20 Thread Wei Yang
On Tue, Feb 19, 2019 at 10:37:27AM -0500, Stephen Checkoway wrote:
>Don't dynamically allocate the pflash's timer. But do use timer_del in
>an unrealize function to make sure that the timer can't fire after the
>pflash_t has been freed.
>
>Signed-off-by: Stephen Checkoway 

Reviewed-by: Wei Yang 

>---
> hw/block/pflash_cfi02.c | 15 +++
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
>diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
>index 0f8b7b8c7b..1588aeff5a 100644
>--- a/hw/block/pflash_cfi02.c
>+++ b/hw/block/pflash_cfi02.c
>@@ -84,7 +84,7 @@ struct pflash_t {
> uint16_t unlock_addr0;
> uint16_t unlock_addr1;
> uint8_t cfi_table[0x52];
>-QEMUTimer *timer;
>+QEMUTimer timer;
> /* The device replicates the flash memory across its memory space.  
> Emulate
>  * that by having a container (.mem) filled with an array of aliases
>  * (.mem_mappings) pointing to the flash memory (.orig_mem).
>@@ -429,7 +429,7 @@ static void pflash_write (pflash_t *pfl, hwaddr offset,
> }
> pfl->status = 0x00;
> /* Let's wait 5 seconds before chip erase is done */
>-timer_mod(pfl->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>+timer_mod(&pfl->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>   (NANOSECONDS_PER_SECOND * 5));
> break;
> case 0x30:
>@@ -444,7 +444,7 @@ static void pflash_write (pflash_t *pfl, hwaddr offset,
> }
> pfl->status = 0x00;
> /* Let's wait 1/2 second before sector erase is done */
>-timer_mod(pfl->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>+timer_mod(&pfl->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>   (NANOSECONDS_PER_SECOND / 2));
> break;
> default:
>@@ -596,7 +596,7 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
>**errp)
> pfl->rom_mode = 1;
> sysbus_init_mmio(SYS_BUS_DEVICE(dev), &pfl->mem);
> 
>-pfl->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, pflash_timer, pfl);
>+timer_init_ns(&pfl->timer, QEMU_CLOCK_VIRTUAL, pflash_timer, pfl);
> pfl->wcycle = 0;
> pfl->cmd = 0;
> pfl->status = 0;
>@@ -695,11 +695,18 @@ static Property pflash_cfi02_properties[] = {
> DEFINE_PROP_END_OF_LIST(),
> };
> 
>+static void pflash_cfi02_unrealize(DeviceState *dev, Error **errp)
>+{
>+pflash_t *pfl = CFI_PFLASH02(dev);
>+timer_del(&pfl->timer);
>+}
>+
> static void pflash_cfi02_class_init(ObjectClass *klass, void *data)
> {
> DeviceClass *dc = DEVICE_CLASS(klass);
> 
> dc->realize = pflash_cfi02_realize;
>+dc->unrealize = pflash_cfi02_unrealize;
> dc->props = pflash_cfi02_properties;
> set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
> }
>-- 
>2.17.2 (Apple Git-113)
>

-- 
Wei Yang
Help you, Help me



Re: [Qemu-block] [Qemu-devel] [PATCH] hw/ide/ich: Compile ich.c only if CONFIG_PCI is also set

2019-02-20 Thread Wei Yang
On Tue, Feb 19, 2019 at 04:55:57PM +0100, Thomas Huth wrote:
>With the upcoming Kconfig-like build system, it will be easy to
>build also version of QEMU that only contain a single machine. Some

Sorry for my poor English.

What does "also version" mean here?

>of these machines (like the ARM cubieboard) use CONFIG_AHCI for an
>AHCI sysbus device, but do not use CONFIG_PCI since they do not feature
>a PCI bus. In this case linking fails:
>
>../hw/ide/ich.o: In function `pci_ich9_ahci_realize':
>hw/ide/ich.c:124: undefined reference to `pci_allocate_irq'
>hw/ide/ich.c:126: undefined reference to `pci_register_bar'
>hw/ide/ich.c:128: undefined reference to `pci_register_bar'
>hw/ide/ich.c:131: undefined reference to `pci_add_capability'
>hw/ide/ich.c:147: undefined reference to `msi_init'
>../hw/ide/ich.o: In function `pci_ich9_uninit':
>hw/ide/ich.c:158: undefined reference to `msi_uninit'
>../hw/ide/ich.o:(.data.rel+0x50): undefined reference to `vmstate_pci_device'
>
>Thus we must only compile ich.c if CONFIG_PCI is also set.
>
>Signed-off-by: Thomas Huth 
>---
> hw/ide/Makefile.objs | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/hw/ide/Makefile.objs b/hw/ide/Makefile.objs
>index a142add..dfe53af 100644
>--- a/hw/ide/Makefile.objs
>+++ b/hw/ide/Makefile.objs
>@@ -9,6 +9,6 @@ common-obj-$(CONFIG_IDE_MMIO) += mmio.o
> common-obj-$(CONFIG_IDE_VIA) += via.o
> common-obj-$(CONFIG_MICRODRIVE) += microdrive.o
> common-obj-$(CONFIG_AHCI) += ahci.o
>-common-obj-$(CONFIG_AHCI) += ich.o
>+common-obj-$(call land,$(CONFIG_AHCI),$(CONFIG_PCI)) += ich.o
> common-obj-$(CONFIG_ALLWINNER_A10) += ahci-allwinner.o
> common-obj-$(CONFIG_IDE_SII3112) += sii3112.o
>-- 
>1.8.3.1
>

-- 
Wei Yang
Help you, Help me



Re: [Qemu-block] [PATCH] qemu-img: fix error reporting for -object

2019-02-20 Thread Kevin Wolf
Am 19.02.2019 um 12:46 hat Daniel P. Berrangé geschrieben:
> Error reporting for user_creatable_add_opts_foreach was changed so that
> it no longer called 'error_report_err' in:
> 
>   commit 7e1e0c11127bde81cff260fc6859690435c509d6
>   Author: Markus Armbruster 
>   Date:   Wed Oct 17 10:26:43 2018 +0200
> 
> qom: Clean up error reporting in user_creatable_add_opts_foreach()
> 
> Some callers were updated to pass in "&error_fatal" but all the ones in
> qemu-img were left passing NULL. As a result all errors went to
> /dev/null instead of being reported to the user.
> 
> Signed-off-by: Daniel P. Berrangé 
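
(Presumably the change amounts to passing &error_fatal instead of NULL at
the qemu-img call sites. The general errp convention at play can be shown
with a standalone, simplified stand-in for QEMU's real error.h API:)

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Error { const char *msg; } Error;

    /* With errp == NULL the failure is silently dropped; with a real
     * pointer the caller can observe and report it. */
    static void object_create(Error **errp)
    {
        static Error err = { "invalid object property" };
        if (errp) {
            *errp = &err;
        }
    }

    int main(void)
    {
        Error *err = NULL;

        object_create(NULL);   /* old behaviour: error lost */
        object_create(&err);   /* fixed behaviour: error observable */
        if (err) {
            fprintf(stderr, "qemu-img: %s\n", err->msg);
            exit(1);
        }
        return 0;
    }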

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH] block/iscsi: Restrict Linux-specific code

2019-02-20 Thread Paolo Bonzini
On 20/02/19 01:05, Philippe Mathieu-Daudé wrote:
> Some Linux specific code is missing guards, leading to
> build failure on OSX:
> 
>   $ sudo brew install libiscsi
>   $ ./configure && make
>   [...]
> CC  block/iscsi.o
>   qemu/block/iscsi.c:338:24: error: 'iscsi_aiocb_info' defined but not used 
> [-Werror=unused-const-variable=]
>static const AIOCBInfo iscsi_aiocb_info = {
>   ^~~~
>   qemu/block/iscsi.c:168:1: error: 'iscsi_schedule_bh' defined but not used 
> [-Werror=unused-function]
>iscsi_schedule_bh(IscsiAIOCB *acb)
>^
>   cc1: all warnings being treated as errors
> 
> Add guards to restrict this code for Linux.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> There is various Linux-specific code; we could split it out into
> another file such as block/iscsi-lnx.o and guard it in the Makefile,
> but that's another patch.
> 
>  block/iscsi.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index ff473206e6..85daa9a481 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -145,6 +145,8 @@ static const unsigned iscsi_retry_times[] = {8, 32, 128, 
> 512, 2048, 8192, 32768}
>   * unallocated. */
>  #define ISCSI_CHECKALLOC_THRES 64
>  
> +#ifdef __linux__
> +
>  static void
>  iscsi_bh_cb(void *p)
>  {
> @@ -172,6 +174,8 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
>  qemu_bh_schedule(acb->bh);
>  }
>  
> +#endif
> +
>  static void iscsi_co_generic_bh_cb(void *opaque)
>  {
>  struct IscsiTask *iTask = opaque;
> @@ -290,6 +294,8 @@ static void iscsi_co_init_iscsitask(IscsiLun *iscsilun, 
> struct IscsiTask *iTask)
>  };
>  }
>  
> +#ifdef __linux__
> +
>  /* Called (via iscsi_service) with QemuMutex held. */
>  static void
>  iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void 
> *command_data,
> @@ -338,6 +344,7 @@ static const AIOCBInfo iscsi_aiocb_info = {
>  .cancel_async   = iscsi_aio_cancel,
>  };
>  
> +#endif
>  
>  static void iscsi_process_read(void *arg);
>  static void iscsi_process_write(void *arg);
> 

Queued, thanks.

Paolo



Re: [Qemu-block] [PATCH] hw/ide/ich: Compile ich.c only if CONFIG_PCI is also set

2019-02-20 Thread Paolo Bonzini
On 20/02/19 07:16, Thomas Huth wrote:
> By the way, what's the current status of the Kconfig series? Will there
> be another respin soon? ... the next softfreeze is not that far away ...

Waiting on the vhost series; I'll talk to Peter about merging it without
ARM and handling ARM later.

Paolo



[Qemu-block] [RFC PATCH v2] coroutines: generate wrapper code

2019-02-20 Thread Vladimir Sementsov-Ogievskiy
Hi all!

We have a very frequent pattern of creating a coroutine from a function
with several arguments:

  - create a structure to pack the parameters
  - create an _entry function that calls the original function, taking its
    parameters from the struct
  - do different magic to handle completion: set ret to NOT_DONE or
    EINPROGRESS, or use a separate bool for void functions
  - fill the struct and create a coroutine from the _entry function with
    this struct as its parameter (sketched below)
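
For illustration, the hand-written shape of that pattern, roughly as it
exists today for bdrv_check() (a sketch, not the exact code the generator
replaces):

    typedef struct CheckCo {
        BlockDriverState *bs;
        BdrvCheckResult *res;
        BdrvCheckMode fix;
        int ret;
        bool in_progress;
    } CheckCo;

    static void coroutine_fn bdrv_check_co_entry(void *opaque)
    {
        CheckCo *cco = opaque;

        /* call the real coroutine function with the unpacked arguments */
        cco->ret = bdrv_co_check(cco->bs, cco->res, cco->fix);
        cco->in_progress = false;   /* signal completion to the caller */
    }

    /* the caller fills a CheckCo, creates a coroutine from
     * bdrv_check_co_entry and polls until in_progress becomes false */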

Here is the template code plus an example of how it can be used to drop a
lot of similar code.

TODO: make the coroutine-wrapper template file a header itself, or
      generate a header from it

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

v2: - don't generate poll-loop wrappers
- drop jinja

 Makefile |  5 ++
 Makefile.objs|  2 +-
 include/block/block.h|  3 ++
 include/block/block_int.h|  8 +++
 block.c  | 76 +++-
 coroutine-wrapper|  2 +
 scripts/coroutine-wrapper.py | 98 
 7 files changed, 136 insertions(+), 58 deletions(-)
 create mode 100644 coroutine-wrapper
 create mode 100755 scripts/coroutine-wrapper.py

diff --git a/Makefile b/Makefile
index 1278a3eb52..8d19b06cf1 100644
--- a/Makefile
+++ b/Makefile
@@ -122,6 +122,8 @@ endif
 
 GENERATED_FILES += module_block.h
 
+GENERATED_FILES += block-gen.c
+
 TRACE_HEADERS = trace-root.h $(trace-events-subdirs:%=%/trace.h)
 TRACE_SOURCES = trace-root.c $(trace-events-subdirs:%=%/trace.c)
 TRACE_DTRACE =
@@ -138,6 +140,9 @@ GENERATED_FILES += $(TRACE_SOURCES)
 GENERATED_FILES += $(BUILD_DIR)/trace-events-all
 GENERATED_FILES += .git-submodule-status
 
+block-gen.c: coroutine-wrapper $(SRC_PATH)/scripts/coroutine-wrapper.py
+   $(call quiet-command, python $(SRC_PATH)/scripts/coroutine-wrapper.py < 
$< > $@,"GEN","$(TARGET_DIR)$@")
+
 trace-group-name = $(shell dirname $1 | sed -e 's/[^a-zA-Z0-9]/_/g')
 
 tracetool-y = $(SRC_PATH)/scripts/tracetool.py
diff --git a/Makefile.objs b/Makefile.objs
index 67a054b08a..16159d3e0f 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -23,7 +23,7 @@ slirp-obj-$(CONFIG_SLIRP) = slirp/
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
 block-obj-y += nbd/
-block-obj-y += block.o blockjob.o job.o
+block-obj-y += block.o blockjob.o job.o block-gen.o
 block-obj-y += block/ scsi/
 block-obj-y += qemu-io-cmds.o
 block-obj-$(CONFIG_REPLICATION) += replication.o
diff --git a/include/block/block.h b/include/block/block.h
index 57233cf2c0..61b2ca6fe5 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -371,6 +371,8 @@ typedef enum {
 } BdrvCheckMode;
 
 int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix);
+int coroutine_fn bdrv_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res, BdrvCheckMode fix);
 
 /* The units of offset and total_work_size may be chosen arbitrarily by the
  * block driver; total_work_size may change during the course of the amendment
@@ -399,6 +401,7 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
 
 /* Invalidate any cached metadata used by image formats */
 void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
+void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp);
 void bdrv_invalidate_cache_all(Error **errp);
 int bdrv_inactivate_all(void);
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f605622216..9af6507be7 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1175,4 +1175,12 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, 
uint64_t src_offset,
 
 int refresh_total_sectors(BlockDriverState *bs, int64_t hint);
 
+
+Coroutine *bdrv_co_invalidate_cache__create_co(bool *_in_progress,
+   BlockDriverState *bs,
+   Error **errp);
+Coroutine *bdrv_co_check__create_co(bool *_in_progress, int *_ret,
+BlockDriverState *bs, BdrvCheckResult *res,
+BdrvCheckMode fix);
+
 #endif /* BLOCK_INT_H */
diff --git a/block.c b/block.c
index b67d9b7b65..94999bc0fc 100644
--- a/block.c
+++ b/block.c
@@ -3706,8 +3706,8 @@ static void bdrv_delete(BlockDriverState *bs)
  * free of errors) or -errno when an internal error occurred. The results of 
the
  * check are stored in res.
  */
-static int coroutine_fn bdrv_co_check(BlockDriverState *bs,
-  BdrvCheckResult *res, BdrvCheckMode fix)
+int coroutine_fn bdrv_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res, BdrvCheckMode fix)
 {
 if (bs->drv == NULL) {
 return -ENOMEDIUM;
@@ -3720,44 +3720,25 @@ static int coroutine_fn bdrv_co_check(BlockDriverState 
*bs,
 return bs->drv->bdrv_co_check(bs, res, fix);
 }
 
-typedef struct CheckCo {
-BlockDriverState *bs;
- 

Re: [Qemu-block] [PATCH RFC 1/1] Stream block job involves copy-on-read filter

2019-02-20 Thread Andrey Shinkevich


On 18/02/2019 13:08, Vladimir Sementsov-Ogievskiy wrote:
> 14.02.2019 16:43, Andrey Shinkevich wrote:
>>
>>
>> On 12/02/2019 14:35, Alberto Garcia wrote:
>>> On Mon 11 Feb 2019 05:58:05 PM CET, Vladimir Sementsov-Ogievskiy wrote:
>> The problem is in the concept of the "base" node. The code is written in
>> a manner that assumes the base does not change during the block job.
>> However, the job doesn't own the base and there is no guarantee that it
>> will not change during the job.
>
> But if that's the case then we have a problem already, because 'base'
> is a member of StreamBlockJob and is used in the existing
> stream_run() code.

 I think it should be possible to reproduce, using block-commit (which
 already has filter) with block-stream in parallel, we'll try.
>>>
>>> It's not possible to run block-stream and block-commit in parallel on
>>> the same nodes. See iotest 030 for a few examples.
>>>
>>> So unless there's a bug it should be safe.
>>>
> So if there's a way to make 'base' disappear during the job (how?)
> then we could protect it with block_job_add_bdrv().

 I'm not sure this is correct. What is the reason for stream to own
 base? It's not really interested in it.
>>>
>>> stream does not need to write or modify base, but it does need to keep a
>>> reference to it in order to know where to stop copying data.
>>>
>>> As I said earlier base is a member of StreamBlockJob, so it should not
>>> disappear during the job.
>>>
>>> Berto
>>>
>>
>> No doubt that a reference to the base node is needed as a limit for the
>> COR work. The base is still in the game. Actually, we encounter an issue
>> when the BlockDriverState of the COR-filter comes into play instead of
>> the base; it is inherited from a parallel job. Based on the test case
>> test_stream_parallel from qemu-iotests/030, it works like this:
>> 1. Job#1 is started and inserts a COR-filter above the top node.
>> 2. Context switch.
>> 3. Job#2 is started and inherits the COR-filter as the base.
>> 4. Context switch.
>> 5. Job#1 comes to the end and removes its COR-filter referenced by
>>   job#2.
>> 6. Context switch.
>> 7. Job#2 comes to the end and uses the deleted COR-filter node.
>> 8. We are in trouble.
>>
>> Or another scenario:
>> 1-4. As above
>> 5. Job#2 comes to the end first and keeps the COR-filter as the
>>   backing node.
>> 6. The assert() fails as the referenced COR-filter was not deleted
>>   on exit.
>> 7. The game is over with a poor score.
>>
>> If we just keep the intermediate bottom node instead of the base, we
>> will have a similar issue. On the job exit, we change the backing file.
>> If we call backing_bs(bottom_node_bs), we will keep the COR-filter
>> node instead, whereas it has not yet been removed by the unfinished
>> job#1 (see the second scenario above).
> 
> Honestly, I don't see any problem with it.
> 
>>
>> If we include the base node in the job node list with some protecting
>> flags, that is, block_job_add_bdrv(&s->common, "base", base, ..), we can
>> end up with a failure of bdrv_check_update_perm(), called by
>> bdrv_root_attach_child(), because the previous job#1 holds a root
>> reference to the same node with other permissions. So the next job will
>> fail to start and the test cases won't pass.
> 
> You mean the test 030 testcase about parallel stream jobs, which starts
> streams sharing one node as the base for one job and the top for another,
> where the test fails if we lock the base in the stream job. I think we
> just need to update the test by inserting additional empty qcow2 nodes
> below the shared ones and making the newly inserted nodes the top nodes
> of the stream jobs, so there is no more node-sharing between jobs.
> 
>>
>> If we set the following flags, there will be no failure:
>> block_job_add_bdrv(&s->common, "base", base, 0, BLK_PERM_CONSISTENT_READ
>> | BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD, &error_abort);
>> But what will we gain? We will have the same issues as above.
> 
> We meed flags ..., BLK_PERM_GRAPH_MOD, BLK_PERM_ALL & ~BLK_PERM_GRAPH_MOD, ...
> for base node.
> 
>>
>> In many other cases, when the filter has not been identified, we can
>> get a broken chain when calling backing_bs(bs=filter), as the backing
>> node of the filter is a null pointer.
>>
>> That's why I implemented the skip_filter() function in this series as
>> a some sort of solution or workaround. May we proceed with that?
>> Is there any better idea?
>>
> 
> I think now that the best way is to lock the base node, as it is the
> simplest one. It also corresponds to the current code, which actually
> keeps an illegal pointer to the base node without any access rights.
> 
> 

Well, I am about to implement the next version of the series with the base
BlockDriverState included in the job node list, and I will insert extra
images between those shared by the parallel jobs in the test case
TestParallelOps of #030.

-- 
With the best regards,
Andrey Shinkevich


Re: [Qemu-block] [RFC PATCH 2/2] block/dirty-bitmap: implement inconsistent bit

2019-02-20 Thread Vladimir Sementsov-Ogievskiy
20.02.2019 1:00, John Snow wrote:
> 
> 
> On 2/18/19 1:13 PM, Vladimir Sementsov-Ogievskiy wrote:
>> 14.02.2019 2:36, John Snow wrote:
>>> Signed-off-by: John Snow 
>>> ---
>>>block/dirty-bitmap.c | 15 +
>>>block/qcow2-bitmap.c | 42 ++-
>>>blockdev.c   | 43 
>>>include/block/dirty-bitmap.h |  1 +
>>>4 files changed, 81 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>>> index b1879d7fbd..06d8ee0d79 100644
>>> --- a/block/dirty-bitmap.c
>>> +++ b/block/dirty-bitmap.c
>>> @@ -589,6 +589,7 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, 
>>> HBitmap **out)
>>>*out = backup;
>>>}
>>>bdrv_dirty_bitmap_unlock(bitmap);
>>> +bdrv_dirty_bitmap_set_inconsistent(bitmap, false);
>>>}
>>>
>>>void bdrv_restore_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *backup)
>>> @@ -776,6 +777,13 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap 
>>> *bitmap,
>>>return hbitmap_next_dirty_area(bitmap->bitmap, offset, bytes);
>>>}
>>>
>>> +void bdrv_dirty_bitmap_add_inconsistent_hint(Error **errp)
>>> +{
>>> +error_append_hint(errp, "Try block-dirty-bitmap-clear to mark this "
>>> +  "bitmap consistent again, or 
>>> block-dirty-bitmap-remove "
>>> +  "to delete it.");
>>
>> Bitmaps created by libvirt (or somebody) are related to some checkpoint,
>> and their name is probably (Eric?) related to this checkpoint too. So,
>> clearing will never make them consistent.. only clear :)
>>
>> So, I don't like the idea of clearing in-use bitmaps.
>>
>>> +}
>>> +
>>>void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const 
>>> BdrvDirtyBitmap *src,
>>> HBitmap **backup, Error **errp)
>>>{
>>> @@ -798,6 +806,13 @@ void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, 
>>> const BdrvDirtyBitmap *src,
>>>goto out;
>>>}
>>>
>>> +if (bdrv_dirty_bitmap_inconsistent(dest)) {
>>> +error_setg(errp, "Bitmap '%s' is inconsistent and cannot be used 
>>> as"
>>> +   " a merge target", dest->name);
>>> +bdrv_dirty_bitmap_add_inconsistent_hint(errp);
>>> +goto out;
>>> +}
>>
>> I think we need a common predicate that combines busy and inconsistent,
>> as in almost all cases we need both to be false to do any operation.
>>
>> bdrv_dirty_bitmap_usable() ? :)
>>
>> and pass errp to this helper.
>>
>> Actually, we already need it, to fill errp, which we almost always fill in 
>> the same manner.
>>
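
Something like this, perhaps (just a sketch; the helper name and the
busy/locked check are proposals, not existing API):

    /* Common predicate combining the inconsistent and busy checks. */
    static bool bdrv_dirty_bitmap_usable(const BdrvDirtyBitmap *bitmap,
                                         Error **errp)
    {
        if (bdrv_dirty_bitmap_inconsistent(bitmap)) {
            error_setg(errp, "Bitmap '%s' is inconsistent and cannot be"
                       " used", bdrv_dirty_bitmap_name(bitmap));
            bdrv_dirty_bitmap_add_inconsistent_hint(errp);
            return false;
        }
        if (bdrv_dirty_bitmap_user_locked(bitmap)) {   /* the "busy" check */
            error_setg(errp, "Bitmap '%s' is currently in use by another"
                       " operation", bdrv_dirty_bitmap_name(bitmap));
            return false;
        }
        return true;
    }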
>>> +
>>>if (!hbitmap_can_merge(dest->bitmap, src->bitmap)) {
>>>error_setg(errp, "Bitmaps are incompatible and can't be merged");
>>>goto out;
>>> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
>>> index 3ee524da4b..9bd8bc417f 100644
>>> --- a/block/qcow2-bitmap.c
>>> +++ b/block/qcow2-bitmap.c
>>
>> hmm, I also think we should report our deprecated status as locked for 
>> inconsistent bitmaps..
>>
>>
> 
> Though we're trying to deprecate the field altogether, I *could* add a
> special status flag that makes it unambiguous. This will catch the
> attention of anyone using the old API.

I agree.


-- 
Best regards,
Vladimir