Re: [PATCH] vhost-user: Fix backends without multiqueue support

2021-07-08 Thread Raphael Norwitz
On Mon, Jul 05, 2021 at 07:14:29PM +0200, Kevin Wolf wrote:
> dev->max_queues was never initialised for backends that don't support
> VHOST_USER_PROTOCOL_F_MQ, so it would use 0 as the maximum number of
> queues to check against and consequently fail for any such backend.
> 
> Set it to 1 if the backend doesn't have multiqueue support.
> 
> Fixes: c90bd505a3e8210c23d69fecab9ee6f56ec4a161
> Signed-off-by: Kevin Wolf 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/virtio/vhost-user.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 1ac4a2ebec..29ea2b4fce 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1913,7 +1913,10 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque,
>  if (err < 0) {
>  return -EPROTO;
>  }
> +} else {
> +dev->max_queues = 1;
>  }
> +
>  if (dev->num_queues && dev->max_queues < dev->num_queues) {
>  error_setg(errp, "The maximum number of queues supported by the "
> "backend is %" PRIu64, dev->max_queues);
> -- 
> 2.31.1
> 


Re: [PATCH v5 0/6] block/rbd: migrate to coroutines and add write zeroes support

2021-07-08 Thread Peter Lieven



> On 08.07.2021 at 14:18, Kevin Wolf wrote:
> 
> On 07.07.2021 at 20:13, Peter Lieven wrote:
>>> On 06.07.2021 at 17:25, Kevin Wolf wrote:
>>> On 06.07.2021 at 16:55, Peter Lieven wrote:
 I will have a decent look after my vacation.
>>> 
>>> Sounds good, thanks. Enjoy your vacation!
>> 
>> As I had to fire up my laptop to look into another issue anyway, I
>> have sent two patches for updating MAINTAINERS and to fix the int vs.
>> bool mix for task->complete.
> 
> I think you need to reevaluate your definition of vacation. ;-)

Let's talk about this when the kids are grown up. Sometimes sending patches
can be quite relaxing :-)

> 
> But thanks anyway.
> 
>> As Paolo's fix (5f50be9b5) is relatively new, there may be other
>> non-obvious problems when removing the BH indirection, and we are close
>> to soft freeze, so I would leave the BH removal change for 6.2.
> 
> Sure, code cleanups aren't urgent.

Isn’t the indirection also a slight performance drop?

Peter






Re: [PATCH 3/2] qemu-img: Improve error for rebase without backing format

2021-07-08 Thread Kevin Wolf
On 08.07.2021 at 17:52, Eric Blake wrote:
> When removing support for qemu-img being able to create backing
> chains without embedded backing formats, we caused a poor error
> message as caught by iotest 114.  Improve the situation to inform the
> user what went wrong.
> 
> Suggested-by: Kevin Wolf 
> Signed-off-by: Eric Blake 

Thanks, applied to the block branch.

Kevin




[PATCH 3/2] qemu-img: Improve error for rebase without backing format

2021-07-08 Thread Eric Blake
When removing support for qemu-img being able to create backing
chains without embedded backing formats, we caused a poor error
message as caught by iotest 114.  Improve the situation to inform the
user what went wrong.

Suggested-by: Kevin Wolf 
Signed-off-by: Eric Blake 
---
 qemu-img.c | 3 +++
 tests/qemu-iotests/114.out | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 7c0e73882dd4..b017734c255a 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3786,6 +3786,9 @@ static int img_rebase(int argc, char **argv)
 if (ret == -ENOSPC) {
 error_report("Could not change the backing file to '%s': No "
  "space left in the file header", out_baseimg);
+} else if (ret == -EINVAL && out_baseimg && !out_basefmt) {
+error_report("Could not change the backing file to '%s': backing "
+ "format must be specified", out_baseimg);
 } else if (ret < 0) {
 error_report("Could not change the backing file to '%s': %s",
 out_baseimg, strerror(-ret));
diff --git a/tests/qemu-iotests/114.out b/tests/qemu-iotests/114.out
index 172454401257..016e9ce3ecfb 100644
--- a/tests/qemu-iotests/114.out
+++ b/tests/qemu-iotests/114.out
@@ -14,7 +14,7 @@ qemu-io: can't open device TEST_DIR/t.qcow2: Could not open 
backing file: Unknow
 no file open, try 'help open'
 read 4096/4096 bytes at offset 0
 4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-qemu-img: Could not change the backing file to 
'/home/eblake/qemu/build/tests/qemu-iotests/scratch/t.qcow2.base': Invalid 
argument
+qemu-img: Could not change the backing file to 
'/home/eblake/qemu/build/tests/qemu-iotests/scratch/t.qcow2.base': backing 
format must be specified
 read 4096/4096 bytes at offset 0
 4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
-- 
2.31.1




Re: [PATCH v6 6/6] block: Make blockdev-reopen stable API

2021-07-08 Thread Eric Blake
On Thu, Jul 08, 2021 at 01:47:09PM +0200, Kevin Wolf wrote:
> From: Alberto Garcia 
> 
> This patch drops the 'x-' prefix from x-blockdev-reopen.
> 
> Signed-off-by: Alberto Garcia 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---

> +++ b/qapi/block-core.json
> @@ -4219,7 +4219,7 @@
>  { 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
>  
>  ##
> -# @x-blockdev-reopen:
> +# @blockdev-reopen:
>  #
>  # Reopens one or more block devices using the given set of options.
>  # Any option not specified will be reset to its default value regardless
> @@ -4257,9 +4257,9 @@
>  # image does not have a default backing file name as part of its
>  # metadata.
>  #
> -# Since: 4.0
> +# Since: 6.0

6.1 now

And yay! We are finally getting here!

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 2/2] qemu-img: Require -F with -b backing image

2021-07-08 Thread Eric Blake
On Thu, Jul 08, 2021 at 03:00:36PM +0200, Kevin Wolf wrote:
> On 03.05.2021 at 23:36, Eric Blake wrote:
> > @@ -17,7 +14,7 @@ qemu-io: can't open device TEST_DIR/t.qcow2: Could not 
> > open backing file: Unknow
> >  no file open, try 'help open'
> >  read 4096/4096 bytes at offset 0
> >  4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > -qemu-img: warning: Deprecated use of backing file without explicit backing 
> > format, use of this image requires potentially unsafe format probing
> > +qemu-img: Could not change the backing file to 
> > '/home/eblake/qemu/build/tests/qemu-iotests/scratch/t.qcow2.base': Invalid 
> > argument
> >  read 4096/4096 bytes at offset 0
> >  4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> 
> This is not exactly an improvement for the error message. Maybe worth a
> follow-up patch?

Sure, will do.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v6 1/6] qcow2: Fix dangling pointer after reopen for 'file'

2021-07-08 Thread Vladimir Sementsov-Ogievskiy

08.07.2021 14:47, Kevin Wolf wrote:

Without an external data file, s->data_file is a second pointer with the
same value as bs->file. When changing bs->file to a different BdrvChild
and freeing the old BdrvChild, s->data_file must also be updated,
otherwise it points to freed memory and causes crashes.

This problem was caught by iotests case 245.

Fixes: df2b7086f169239ebad5d150efa29c9bb6d4f820
Signed-off-by: Kevin Wolf


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v6 1/6] qcow2: Fix dangling pointer after reopen for 'file'

2021-07-08 Thread Eric Blake
On Thu, Jul 08, 2021 at 01:47:04PM +0200, Kevin Wolf wrote:
> Without an external data file, s->data_file is a second pointer with the
> same value as bs->file. When changing bs->file to a different BdrvChild
> and freeing the old BdrvChild, s->data_file must also be updated,
> otherwise it points to freed memory and causes crashes.
> 
> This problem was caught by iotests case 245.
> 
> Fixes: df2b7086f169239ebad5d150efa29c9bb6d4f820
> Signed-off-by: Kevin Wolf 
> ---
>  block/qcow2.c | 29 +
>  1 file changed, 29 insertions(+)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[PULL 5/5] block/io: Merge discard request alignments

2021-07-08 Thread Stefan Hajnoczi
From: Akihiko Odaki 

Signed-off-by: Akihiko Odaki 
Message-id: 20210705130458.97642-3-akihiko.od...@gmail.com
Signed-off-by: Stefan Hajnoczi 
---
 block/io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/io.c b/block/io.c
index cf177a9d2d..e0a689c584 100644
--- a/block/io.c
+++ b/block/io.c
@@ -125,6 +125,8 @@ void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
 
 static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 {
+dst->pdiscard_alignment = MAX(dst->pdiscard_alignment,
+  src->pdiscard_alignment);
 dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
 dst->max_transfer = MIN_NON_ZERO(dst->max_transfer, src->max_transfer);
 dst->max_hw_transfer = MIN_NON_ZERO(dst->max_hw_transfer,
-- 
2.31.1



[PULL 3/5] block/file-posix: Optimize for macOS

2021-07-08 Thread Stefan Hajnoczi
From: Akihiko Odaki 

This commit introduces a "punch hole" operation and optimizes the
transfer block size for macOS.

Thanks to Konstantin Nazarov for detailed analysis of a flaw in an
old version of this change:
https://gist.github.com/akihikodaki/87df4149e7ca87f18dc56807ec5a1bc5#gistcomment-3654667

Signed-off-by: Akihiko Odaki 
Message-id: 20210705130458.97642-1-akihiko.od...@gmail.com
Signed-off-by: Stefan Hajnoczi 
---
 block/file-posix.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index a26eab0ac3..cb9bffe047 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -46,6 +46,7 @@
 #if defined(HAVE_HOST_BLOCK_DEVICE)
 #include <paths.h>
 #include <sys/param.h>
+#include <sys/mount.h>
 #include <IOKit/IOKitLib.h>
 #include <IOKit/IOBSD.h>
 #include <IOKit/storage/IOMediaBSDClient.h>
@@ -1254,6 +1255,15 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 return;
 }
 
+#if defined(__APPLE__) && (__MACH__)
+struct statfs buf;
+
+if (!fstatfs(s->fd, &buf)) {
+bs->bl.opt_transfer = buf.f_iosize;
+bs->bl.pdiscard_alignment = buf.f_bsize;
+}
+#endif
+
 if (bs->sg || S_ISBLK(st.st_mode)) {
 int ret = hdev_get_max_hw_transfer(s->fd, &st);
 
@@ -1591,6 +1601,7 @@ out:
 }
 }
 
+#if defined(CONFIG_FALLOCATE) || defined(BLKZEROOUT) || defined(BLKDISCARD)
 static int translate_err(int err)
 {
 if (err == -ENODEV || err == -ENOSYS || err == -EOPNOTSUPP ||
@@ -1599,6 +1610,7 @@ static int translate_err(int err)
 }
 return err;
 }
+#endif
 
 #ifdef CONFIG_FALLOCATE
 static int do_fallocate(int fd, int mode, off_t offset, off_t len)
@@ -1811,16 +1823,27 @@ static int handle_aiocb_discard(void *opaque)
 }
 } while (errno == EINTR);
 
-ret = -errno;
+ret = translate_err(-errno);
 #endif
 } else {
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
 ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
aiocb->aio_offset, aiocb->aio_nbytes);
+ret = translate_err(-errno);
+#elif defined(__APPLE__) && (__MACH__)
+fpunchhole_t fpunchhole;
+fpunchhole.fp_flags = 0;
+fpunchhole.reserved = 0;
+fpunchhole.fp_offset = aiocb->aio_offset;
+fpunchhole.fp_length = aiocb->aio_nbytes;
+if (fcntl(s->fd, F_PUNCHHOLE, &fpunchhole) == -1) {
+ret = errno == ENODEV ? -ENOTSUP : -errno;
+} else {
+ret = 0;
+}
 #endif
 }
 
-ret = translate_err(ret);
 if (ret == -ENOTSUP) {
 s->has_discard = false;
 }
-- 
2.31.1



[PULL 4/5] block: Add backend_defaults property

2021-07-08 Thread Stefan Hajnoczi
From: Akihiko Odaki 

The backend_defaults property allows users to control whether default
block properties should be derived from backend information.

If it is off, any backend information will be discarded, which is
suitable if you plan to perform live migration to a different disk backend.

If it is on, a block device may utilize backend information more
aggressively.

By default, it is auto, which uses backend information for block
sizes but ignores the rest, consistent with the behaviour of older
versions.
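As a usage illustration (the drive and device names below are made up, not taken from the patch), the property is set like any other block device property:

```
# Illustrative only: discard all backend-derived defaults, e.g. before
# live migration to a host with a different disk backend.
qemu-system-x86_64 \
    -drive if=none,id=drive0,file=disk.qcow2 \
    -device virtio-blk-pci,drive=drive0,backend_defaults=off
```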

Signed-off-by: Akihiko Odaki 
Message-id: 20210705130458.97642-2-akihiko.od...@gmail.com
Signed-off-by: Stefan Hajnoczi 
---
 include/hw/block/block.h   |  3 +++
 hw/block/block.c   | 42 ++
 tests/qemu-iotests/172.out | 38 ++
 3 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index c172cbe65f..5902c0440a 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -19,6 +19,7 @@
 
 typedef struct BlockConf {
 BlockBackend *blk;
+OnOffAuto backend_defaults;
 uint32_t physical_block_size;
 uint32_t logical_block_size;
 uint32_t min_io_size;
@@ -48,6 +49,8 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf)
 }
 
 #define DEFINE_BLOCK_PROPERTIES_BASE(_state, _conf) \
+DEFINE_PROP_ON_OFF_AUTO("backend_defaults", _state, \
+_conf.backend_defaults, ON_OFF_AUTO_AUTO),  \
 DEFINE_PROP_BLOCKSIZE("logical_block_size", _state, \
   _conf.logical_block_size),\
 DEFINE_PROP_BLOCKSIZE("physical_block_size", _state,\
diff --git a/hw/block/block.c b/hw/block/block.c
index 1e34573da7..d47ebf005a 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -65,24 +65,58 @@ bool blkconf_blocksizes(BlockConf *conf, Error **errp)
 {
 BlockBackend *blk = conf->blk;
 BlockSizes blocksizes;
-int backend_ret;
+BlockDriverState *bs;
+bool use_blocksizes;
+bool use_bs;
+
+switch (conf->backend_defaults) {
+case ON_OFF_AUTO_AUTO:
+use_blocksizes = !blk_probe_blocksizes(blk, );
+use_bs = false;
+break;
+
+case ON_OFF_AUTO_ON:
+use_blocksizes = !blk_probe_blocksizes(blk, );
+bs = blk_bs(blk);
+use_bs = bs;
+break;
+
+case ON_OFF_AUTO_OFF:
+use_blocksizes = false;
+use_bs = false;
+break;
+
+default:
+abort();
+}
 
-backend_ret = blk_probe_blocksizes(blk, );
 /* fill in detected values if they are not defined via qemu command line */
 if (!conf->physical_block_size) {
-if (!backend_ret) {
+if (use_blocksizes) {
conf->physical_block_size = blocksizes.phys;
 } else {
 conf->physical_block_size = BDRV_SECTOR_SIZE;
 }
 }
 if (!conf->logical_block_size) {
-if (!backend_ret) {
+if (use_blocksizes) {
 conf->logical_block_size = blocksizes.log;
 } else {
 conf->logical_block_size = BDRV_SECTOR_SIZE;
 }
 }
+if (use_bs) {
+if (!conf->opt_io_size) {
+conf->opt_io_size = bs->bl.opt_transfer;
+}
+if (conf->discard_granularity == -1) {
+if (bs->bl.pdiscard_alignment) {
+conf->discard_granularity = bs->bl.pdiscard_alignment;
+} else if (bs->bl.request_alignment != 1) {
+conf->discard_granularity = bs->bl.request_alignment;
+}
+}
+}
 
 if (conf->logical_block_size > conf->physical_block_size) {
 error_setg(errp,
diff --git a/tests/qemu-iotests/172.out b/tests/qemu-iotests/172.out
index d53f61d0de..4cf4d536b4 100644
--- a/tests/qemu-iotests/172.out
+++ b/tests/qemu-iotests/172.out
@@ -21,6 +21,7 @@ Testing:
   dev: floppy, id ""
 unit = 0 (0x0)
 drive = "floppy0"
+backend_defaults = "auto"
 logical_block_size = 512 (512 B)
 physical_block_size = 512 (512 B)
 min_io_size = 0 (0 B)
@@ -48,6 +49,7 @@ Testing: -fda TEST_DIR/t.qcow2
   dev: floppy, id ""
 unit = 0 (0x0)
 drive = "floppy0"
+backend_defaults = "auto"
 logical_block_size = 512 (512 B)
 physical_block_size = 512 (512 B)
 min_io_size = 0 (0 B)
@@ -85,6 +87,7 @@ Testing: -fdb TEST_DIR/t.qcow2
   dev: floppy, id ""
 unit = 1 (0x1)
 drive = "floppy1"
+backend_defaults = "auto"
 logical_block_size = 512 (512 B)
 physical_block_size = 512 (512 B)
 min_io_size = 0 (0 B)
@@ -96,6 +99,7 @@ Testing: -fdb TEST_DIR/t.qcow2
   dev: floppy, id ""
   

[PULL 1/5] util/async: add a human-readable name to BHs for debugging

2021-07-08 Thread Stefan Hajnoczi
It can be difficult to debug issues with BHs in production environments.
Although BHs can usually be identified by looking up their ->cb()
function pointer, this requires debug information for the program. It is
also not possible to print human-readable diagnostics about BHs because
they have no identifier.

This patch adds a name to each BH. The name is not unique per instance
but differentiates between cb() functions, which is usually enough. It's
done by changing aio_bh_new() and friends to macros that stringify cb.

The next patch will use the name field when reporting leaked BHs.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20210414200247.917496-2-stefa...@redhat.com>
---
 include/block/aio.h| 31 ---
 include/qemu/main-loop.h   |  4 +++-
 tests/unit/ptimer-test-stubs.c |  2 +-
 util/async.c   |  9 +++--
 util/main-loop.c   |  4 ++--
 5 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 10fcae1515..807edce9b5 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -291,20 +291,45 @@ void aio_context_acquire(AioContext *ctx);
 /* Relinquish ownership of the AioContext. */
 void aio_context_release(AioContext *ctx);
 
+/**
+ * aio_bh_schedule_oneshot_full: Allocate a new bottom half structure that will
+ * run only once and as soon as possible.
+ *
+ * @name: A human-readable identifier for debugging purposes.
+ */
+void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
+  const char *name);
+
 /**
  * aio_bh_schedule_oneshot: Allocate a new bottom half structure that will run
  * only once and as soon as possible.
+ *
+ * A convenience wrapper for aio_bh_schedule_oneshot_full() that uses cb as the
+ * name string.
  */
-void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
+#define aio_bh_schedule_oneshot(ctx, cb, opaque) \
+aio_bh_schedule_oneshot_full((ctx), (cb), (opaque), (stringify(cb)))
 
 /**
- * aio_bh_new: Allocate a new bottom half structure.
+ * aio_bh_new_full: Allocate a new bottom half structure.
  *
  * Bottom halves are lightweight callbacks whose invocation is guaranteed
  * to be wait-free, thread-safe and signal-safe.  The #QEMUBH structure
  * is opaque and must be allocated prior to its use.
+ *
+ * @name: A human-readable identifier for debugging purposes.
  */
-QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
+QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
+const char *name);
+
+/**
+ * aio_bh_new: Allocate a new bottom half structure
+ *
+ * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
+ * string.
+ */
+#define aio_bh_new(ctx, cb, opaque) \
+aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
 
 /**
  * aio_notify: Force processing of pending events.
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 98aef5647c..8dbc6fcb89 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -294,7 +294,9 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
 
 void qemu_fd_register(int fd);
 
-QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
+#define qemu_bh_new(cb, opaque) \
+qemu_bh_new_full((cb), (opaque), (stringify(cb)))
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
 void qemu_bh_schedule_idle(QEMUBH *bh);
 
 enum {
diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
index 7f801a4d09..2a3ef58799 100644
--- a/tests/unit/ptimer-test-stubs.c
+++ b/tests/unit/ptimer-test-stubs.c
@@ -108,7 +108,7 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, int attr_mask)
 return deadline;
 }
 
-QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
 {
 QEMUBH *bh = g_new(QEMUBH, 1);
 
diff --git a/util/async.c b/util/async.c
index 5d9b7cc1eb..9a668996b8 100644
--- a/util/async.c
+++ b/util/async.c
@@ -57,6 +57,7 @@ enum {
 
 struct QEMUBH {
 AioContext *ctx;
+const char *name;
 QEMUBHFunc *cb;
 void *opaque;
 QSLIST_ENTRY(QEMUBH) next;
@@ -107,7 +108,8 @@ static QEMUBH *aio_bh_dequeue(BHList *head, unsigned *flags)
 return bh;
 }
 
-void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
+void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
+  void *opaque, const char *name)
 {
 QEMUBH *bh;
 bh = g_new(QEMUBH, 1);
@@ -115,11 +117,13 @@ void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
 .ctx = ctx,
 .cb = cb,
 .opaque = opaque,
+.name = name,
 };
 aio_bh_enqueue(bh, BH_SCHEDULED | BH_ONESHOT);
 }
 
-QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
+QEMUBH 

[PULL 2/5] util/async: print leaked BH name when AioContext finalizes

2021-07-08 Thread Stefan Hajnoczi
BHs must be deleted before the AioContext is finalized. If not, it's a
bug and probably indicates that some part of the program still expects
the BH to run in the future. That can lead to memory leaks, inconsistent
state, or just hangs.

Unfortunately the assert(flags & BH_DELETED) call in aio_ctx_finalize()
is difficult to debug because the assertion failure contains no
information about the BH!

Use the QEMUBH name field added in the previous patch to show a useful
error when a leaked BH is detected.

Suggested-by: Eric Ernst 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20210414200247.917496-3-stefa...@redhat.com>
---
 util/async.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/util/async.c b/util/async.c
index 9a668996b8..9a41591319 100644
--- a/util/async.c
+++ b/util/async.c
@@ -344,8 +344,20 @@ aio_ctx_finalize(GSource *source)
 assert(QSIMPLEQ_EMPTY(&ctx->bh_slice_list));
 
 while ((bh = aio_bh_dequeue(&ctx->bh_list, &flags))) {
-/* qemu_bh_delete() must have been called on BHs in this AioContext */
-assert(flags & BH_DELETED);
+/*
+ * qemu_bh_delete() must have been called on BHs in this AioContext. In
+ * many cases memory leaks, hangs, or inconsistent state occur when a
+ * BH is leaked because something still expects it to run.
+ *
+ * If you hit this, fix the lifecycle of the BH so that
+ * qemu_bh_delete() and any associated cleanup is called before the
+ * AioContext is finalized.
+ */
+if (unlikely(!(flags & BH_DELETED))) {
+fprintf(stderr, "%s: BH '%s' leaked, aborting...\n",
+__func__, bh->name);
+abort();
+}
 
 g_free(bh);
 }
-- 
2.31.1



[PULL 0/5] Block patches

2021-07-08 Thread Stefan Hajnoczi
The following changes since commit 711c0418c8c1ce3a24346f058b001c4c5a2f0f81:

  Merge remote-tracking branch 'remotes/philmd/tags/mips-20210702' into staging (2021-07-04 14:04:12 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 9f460c64e13897117f35ffb61f6f5e0102cabc70:

  block/io: Merge discard request alignments (2021-07-06 14:28:55 +0100)


Pull request



Akihiko Odaki (3):
  block/file-posix: Optimize for macOS
  block: Add backend_defaults property
  block/io: Merge discard request alignments

Stefan Hajnoczi (2):
  util/async: add a human-readable name to BHs for debugging
  util/async: print leaked BH name when AioContext finalizes

 include/block/aio.h| 31 ++---
 include/hw/block/block.h   |  3 +++
 include/qemu/main-loop.h   |  4 +++-
 block/file-posix.c | 27 --
 block/io.c |  2 ++
 hw/block/block.c   | 42 ++
 tests/unit/ptimer-test-stubs.c |  2 +-
 util/async.c   | 25 
 util/main-loop.c   |  4 ++--
 tests/qemu-iotests/172.out | 38 ++
 10 files changed, 161 insertions(+), 17 deletions(-)

-- 
2.31.1



Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex

2021-07-08 Thread Stefan Hajnoczi
On Wed, Jul 07, 2021 at 06:58:07PM +0200, Emanuele Giuseppe Esposito wrote:
> This is a continuation of the work to reduce (and possibly get rid of)
> the usage of the AioContext lock, by introducing smaller-granularity
> locks to preserve thread safety.
> 
> This series aims to:
> 1) remove the aiocontext lock and substitute it with the already existing
>global job_mutex
> 2) fix what looks like an oversight when moving the blockjob.c logic
>into the more generic job.c: job_mutex was introduced especially to
>protect job->busy flag, but it seems that it was not used in successive
>patches, because there are multiple code sections that directly
>access the field without any locking.
> 3) use job_mutex instead of the aiocontext_lock
> 4) extend the reach of the job_mutex to protect all shared fields
>that the job structure has.

Can you explain the big picture:

1. What are the rules for JobDrivers? Imagine you are implementing a new
   JobDriver. What do you need to know in order to write correct code?

2. What are the rules for monitor? The main pattern is looking up a job,
   invoking a job API on it, and then calling job_unlock().

Stefan




Re: [PATCH 0/2] Remove deprecated qemu-img backing file without format

2021-07-08 Thread Kevin Wolf
On 07.07.2021 at 23:17, Eric Blake wrote:
> On Mon, May 03, 2021 at 04:35:58PM -0500, Eric Blake wrote:
> > We've gone enough release cycles without noticeable pushback on our
> > intentions, so time to make it harder to create images that can form a
> > security hole due to a need for format probing rather than an explicit
> > format.
> > 
> > Eric Blake (2):
> >   qcow2: Prohibit backing file changes in 'qemu-img amend'
> >   qemu-img: Require -F with -b backing image
> 
> Ping.

Thanks, applied to the block branch.

For some reason, the CCs were missing on the cover letter. Please make
sure to CC me (and qemu-block) for the whole series if you want it to go
through my tree.

Kevin




Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex

2021-07-08 Thread Stefan Hajnoczi
On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote:
> On 08/07/21 12:36, Stefan Hajnoczi wrote:
> > > What is very clear from this patch is that it
> > > is strictly related to the bdrv_* and lower level calls, because
> > > they also internally check or even use the aiocontext lock.
> > > Therefore, in order to make it work, I temporarily added some
> > > aiocontext_acquire/release pairs around the functions that
> > > still assert for them or assume they are held, and temporarily
> > > unlock (unlock() - lock()).
> > 
> > Sounds like the issue is that this patch series assumes AioContext locks
> > are no longer required for calling the blk_*()/bdrv_*() APIs? That is
> > not the case yet, so you had to then add those aio_context_lock() calls
> > back in elsewhere. This approach introduces unnecessary risk. I think we
> > should wait until blk_*()/bdrv_*() no longer requires the caller to hold
> > the AioContext lock before applying this series.
> 
> In general I'm in favor of pushing the lock further down into smaller and
> smaller critical sections; it's a good approach to make further audits
> easier until it's "obvious" that the lock is unnecessary.  I haven't yet
> reviewed Emanuele's patches to see if this is what he's doing where he's
> adding the acquire/release calls, but that's my understanding of both his
> cover letter and your reply.

The problem is the unnecessary risk. We know what the goal is for
blk_*()/bdrv_*() but it's not quite there yet. Does making changes in
block jobs help solve the final issues with blk_*()/bdrv_*()?

If yes, then it's a risk worth taking. If no, then spending time
developing interim code, reviewing those patches, and risking breakage
doesn't seem worth it. I'd rather wait for blk_*()/bdrv_*() to be fully
complete and then see patches that delete aio_context_acquire() in most
places or add locks in the remaining places where the caller was relying
on the AioContext lock.

Stefan




Re: [PATCH 2/2] qemu-img: Require -F with -b backing image

2021-07-08 Thread Kevin Wolf
On 03.05.2021 at 23:36, Eric Blake wrote:
> @@ -17,7 +14,7 @@ qemu-io: can't open device TEST_DIR/t.qcow2: Could not open 
> backing file: Unknow
>  no file open, try 'help open'
>  read 4096/4096 bytes at offset 0
>  4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -qemu-img: warning: Deprecated use of backing file without explicit backing 
> format, use of this image requires potentially unsafe format probing
> +qemu-img: Could not change the backing file to 
> '/home/eblake/qemu/build/tests/qemu-iotests/scratch/t.qcow2.base': Invalid 
> argument
>  read 4096/4096 bytes at offset 0
>  4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

This is not exactly an improvement for the error message. Maybe worth a
follow-up patch?

Kevin




Re: [RFC PATCH 5/6] job: use global job_mutex to protect struct Job

2021-07-08 Thread Stefan Hajnoczi
On Wed, Jul 07, 2021 at 06:58:12PM +0200, Emanuele Giuseppe Esposito wrote:
> This lock is going to replace most of the AioContext locks
> in the job and blockjob, so that a Job can run in an arbitrary
> AioContext.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  include/block/blockjob_int.h |   1 +
>  include/qemu/job.h   |   2 +
>  block/backup.c   |   4 +
>  block/mirror.c   |  11 +-
>  blockdev.c   |  62 
>  blockjob.c   |  67 +++--
>  job-qmp.c|  55 +++
>  job.c| 284 +++
>  qemu-img.c   |  15 +-
>  9 files changed, 350 insertions(+), 151 deletions(-)
> 
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index 6633d83da2..8b91126506 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -53,6 +53,7 @@ struct BlockJobDriver {
>   */
>  void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
>  
> +/* Called with job mutex *not* held. */
>  void (*set_speed)(BlockJob *job, int64_t speed);
>  };
>  
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index 4421d08d93..359f4e6b3a 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -49,6 +49,8 @@ typedef struct Job {
>  /**
>   * The type of this job.
>   * Set it in job_create and just read.
> + * All calls to the driver function must be not locked by job_mutex,
> + * to avoid deadlocks.
>   */
>  const JobDriver *driver;
>  
> diff --git a/block/backup.c b/block/backup.c
> index bd3614ce70..80ce956299 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -315,6 +315,10 @@ static void coroutine_fn backup_pause(Job *job)
>  }
>  }
>  
> +/*
> + * Called with job mutex *not* held (we don't want to call block_copy_kick
> + * with the lock held!)
> + */
>  static void coroutine_fn backup_set_speed(BlockJob *job, int64_t speed)
>  {
>  BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> diff --git a/block/mirror.c b/block/mirror.c
> index 49fffa..deefaa6a39 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1150,9 +1150,11 @@ static void mirror_complete(Job *job, Error **errp)
>  s->should_complete = true;
>  
>  /* If the job is paused, it will be re-entered when it is resumed */
> +job_lock();
>  if (!job_is_paused(job)) {
> -job_enter(job);
> +job_enter_locked(job);
>  }
> +job_unlock();
>  }
>  
>  static void coroutine_fn mirror_pause(Job *job)
> @@ -1171,10 +1173,13 @@ static bool mirror_drained_poll(BlockJob *job)
>   * from one of our own drain sections, to avoid a deadlock waiting for
>   * ourselves.
>   */
> -if (!job_is_paused(&s->common.job) && !job_is_cancelled(&s->common.job) &&
> -!s->in_drain) {
> +job_lock();
> +if (!job_is_paused(&s->common.job) &&
> +!job_is_cancelled_locked(&s->common.job) && !s->in_drain) {
> +job_unlock();
>  return true;
>  }
> +job_unlock();
>  
>  return !!s->in_flight;
>  }
> diff --git a/blockdev.c b/blockdev.c
> index 8e2c15370e..9255aea6a2 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -150,9 +150,11 @@ void blockdev_mark_auto_del(BlockBackend *blk)
>  AioContext *aio_context = job_get_aiocontext(&job->job);
>  aio_context_acquire(aio_context);
>  
> +job_lock();
>  job_cancel(&job->job, false);
>  
>  aio_context_release(aio_context);
> +job_unlock();

This looks strange. The way it's written suggests there is a reason why
job_unlock() has to be called after aio_context_release(). Can
job_unlock() be called immediately after job_cancel()?
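A sketch of the ordering the question implies, assuming (as the review does) that nothing after job_cancel() depends on the job lock still being held:

```
job_lock();
job_cancel(&job->job, false);
job_unlock();                    /* release as soon as the job call returns */
aio_context_release(aio_context);
```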

>  }
>  }
>  
> @@ -3309,48 +3311,44 @@ out:
>  aio_context_release(aio_context);
>  }
>  
> -/* Get a block job using its ID and acquire its AioContext */
> -static BlockJob *find_block_job(const char *id, AioContext **aio_context,
> -Error **errp)
> +/* Get a block job using its ID and acquire its job_lock */

"its" suggests job_lock is per-Job. I suggest saying something like
"Returns with job_lock held on success" instead.
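i.e. something like this (suggested wording only):

```
/* Get a block job using its ID. Returns with job_lock held on success. */
```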


signature.asc
Description: PGP signature


Re: [PATCH] blockdev: fix drive-backup transaction endless drained section

2021-07-08 Thread Kevin Wolf
Am 07.07.2021 um 15:35 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Forgotten thing :(
> 
> Kevin, could you please queue it in your block branch? For me not to
> bother Peter with one-patch pull request.

No problem, I've queued it now.

Kevin




Re: [PATCH v5 0/6] block/rbd: migrate to coroutines and add write zeroes support

2021-07-08 Thread Kevin Wolf
Am 07.07.2021 um 20:13 hat Peter Lieven geschrieben:
> Am 06.07.2021 um 17:25 schrieb Kevin Wolf :
> > Am 06.07.2021 um 16:55 hat Peter Lieven geschrieben:
> >> I will have a decent look after my vacation.
> > 
> > Sounds good, thanks. Enjoy your vacation!
> 
> As I had to fire up my laptop to look into another issue anyway, I
> have sent two patches for updating MAINTAINERS and to fix the int vs.
> bool mix for task->complete.

I think you need to reevaluate your definition of vacation. ;-)

But thanks anyway.

> As Paolos fix (5f50be9b5) is relatively new and there are maybe other
> non obvious problems when removing the BH indirection and we are close
> to soft freeze I would leave the BH removal change for 6.2.

Sure, code cleanups aren't urgent.

Kevin




Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex

2021-07-08 Thread Kevin Wolf
Am 08.07.2021 um 13:32 hat Paolo Bonzini geschrieben:
> On 08/07/21 12:36, Stefan Hajnoczi wrote:
> > > What is very clear from this patch is that it
> > > is strictly related to the brdv_* and lower level calls, because
> > > they also internally check or even use the aiocontext lock.
> > > Therefore, in order to make it work, I temporarly added some
> > > aiocontext_acquire/release pair around the function that
> > > still assert for them or assume they are hold and temporarly
> > > unlock (unlock() - lock()).
> > 
> > Sounds like the issue is that this patch series assumes AioContext locks
> > are no longer required for calling the blk_*()/bdrv_*() APIs? That is
> > not the case yet, so you had to then add those aio_context_lock() calls
> > back in elsewhere. This approach introduces unnecessary risk. I think we
> > should wait until blk_*()/bdrv_*() no longer requires the caller to hold
> > the AioContext lock before applying this series.
> 
> In general I'm in favor of pushing the lock further down into smaller and
> smaller critical sections; it's a good approach to make further audits
> easier until it's "obvious" that the lock is unnecessary.  I haven't yet
> reviewed Emanuele's patches to see if this is what he's doing where he's
> adding the acquire/release calls, but that's my understanding of both his
> cover letter and your reply.
> 
> The I/O blk_*()/bdrv_*() *should* not require the caller to hold the
> AioContext lock; all drivers use their own CoMutex or QemuMutex when needed,
> and generic code should also be ready (caveat emptor).  Others, such as
> reopen, are a mess that requires a separate audit.  Restricting
> acquire/release to be only around those seems like a good starting point.

Reopen isn't just a mess, but in fact buggy. After the following patch
goes in, the rule is simple: Don't hold any AioContext locks when
calling bdrv_reopen_multiple().

'block: Acquire AioContexts during bdrv_reopen_multiple()'
https://lists.gnu.org/archive/html/qemu-block/2021-07/msg00238.html

It still takes AioContext locks when it calls into other functions that
currently expect it, but that should be the same as usual then.

And once callers don't even hold the lock in the first place, we'll also
get rid of the ugly temporary lock release across reopen.

Kevin




Re: [PATCH] MAINTAINERS: update block/rbd.c maintainer

2021-07-08 Thread Kevin Wolf
Am 08.07.2021 um 12:44 hat Ilya Dryomov geschrieben:
> On Wed, Jul 7, 2021 at 8:05 PM Peter Lieven  wrote:
> >
> > adding myself as a designated reviewer.
> >
> > Signed-off-by: Peter Lieven 
> > ---
> >  MAINTAINERS | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 516db737d1..cfda57e825 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3058,6 +3058,7 @@ F: block/vmdk.c
> >
> >  RBD
> >  M: Ilya Dryomov 
> > +R: Peter Lieven 
> >  L: qemu-block@nongnu.org
> >  S: Supported
> >  F: block/rbd.c
> 
> Nit: I would change the title to "MAINTAINERS: add block/rbd.c reviewer"
> or something like that.

Yes, this is better. I've changed it in my queue.

Kevin




[PATCH v6 5/6] iotests: Test reopening multiple devices at the same time

2021-07-08 Thread Kevin Wolf
From: Alberto Garcia 

This test swaps the images used by two active block devices.

This is now possible thanks to the new ability to run
x-blockdev-reopen on multiple devices at the same time.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/245 | 47 ++
 tests/qemu-iotests/245.out |  4 ++--
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index ca08955207..8bc8101e6d 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -649,6 +649,53 @@ class TestBlockdevReopen(iotests.QMPTestCase):
  '-c', 'read -P 0x40 0x40008 1',
 '-c', 'read -P 0x80 0x40010 1', hd_path[0])
 
+# Swap the disk images of two active block devices
+def test_swap_files(self):
+# Add hd0 and hd2 (none of them with backing files)
+opts0 = hd_opts(0)
+result = self.vm.qmp('blockdev-add', conv_keys = False, **opts0)
+self.assert_qmp(result, 'return', {})
+
+opts2 = hd_opts(2)
+result = self.vm.qmp('blockdev-add', conv_keys = False, **opts2)
+self.assert_qmp(result, 'return', {})
+
+# Write different data to both block devices
+self.run_qemu_io("hd0", "write -P 0xa0 0 1k")
+self.run_qemu_io("hd2", "write -P 0xa2 0 1k")
+
+# Check that the data reads correctly
+self.run_qemu_io("hd0", "read  -P 0xa0 0 1k")
+self.run_qemu_io("hd2", "read  -P 0xa2 0 1k")
+
+# It's not possible to make a block device use an image that
+# is already being used by the other device.
+self.reopen(opts0, {'file': 'hd2-file'},
+"Permission conflict on node 'hd2-file': permissions "
+"'write, resize' are both required by node 'hd2' (uses "
+"node 'hd2-file' as 'file' child) and unshared by node "
+"'hd0' (uses node 'hd2-file' as 'file' child).")
+self.reopen(opts2, {'file': 'hd0-file'},
+"Permission conflict on node 'hd0-file': permissions "
+"'write, resize' are both required by node 'hd0' (uses "
+"node 'hd0-file' as 'file' child) and unshared by node "
+"'hd2' (uses node 'hd0-file' as 'file' child).")
+
+# But we can swap the images if we reopen both devices at the
+# same time
+opts0['file'] = 'hd2-file'
+opts2['file'] = 'hd0-file'
+self.reopenMultiple([opts0, opts2])
+self.run_qemu_io("hd0", "read  -P 0xa2 0 1k")
+self.run_qemu_io("hd2", "read  -P 0xa0 0 1k")
+
+# And we can of course come back to the original state
+opts0['file'] = 'hd0-file'
+opts2['file'] = 'hd2-file'
+self.reopenMultiple([opts0, opts2])
+self.run_qemu_io("hd0", "read  -P 0xa0 0 1k")
+self.run_qemu_io("hd2", "read  -P 0xa2 0 1k")
+
 # Misc reopen tests with different block drivers
 @iotests.skip_if_unsupported(['quorum', 'throttle'])
 def test_misc_drivers(self):
diff --git a/tests/qemu-iotests/245.out b/tests/qemu-iotests/245.out
index daf1e51922..4eced19294 100644
--- a/tests/qemu-iotests/245.out
+++ b/tests/qemu-iotests/245.out
@@ -17,8 +17,8 @@ read 1/1 bytes at offset 262152
 read 1/1 bytes at offset 262160
 1 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
-........................
+.........................
 ----------------------------------------------------------------------
-Ran 24 tests
+Ran 25 tests
 
 OK
-- 
2.31.1




[PATCH v6 6/6] block: Make blockdev-reopen stable API

2021-07-08 Thread Kevin Wolf
From: Alberto Garcia 

This patch drops the 'x-' prefix from x-blockdev-reopen.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json|  6 +++---
 blockdev.c  |  2 +-
 tests/qemu-iotests/155  |  2 +-
 tests/qemu-iotests/165  |  2 +-
 tests/qemu-iotests/245  | 10 +-
 tests/qemu-iotests/248  |  2 +-
 tests/qemu-iotests/248.out  |  2 +-
 tests/qemu-iotests/296  |  2 +-
 tests/qemu-iotests/298  |  2 +-
 tests/qemu-iotests/tests/remove-bitmap-from-backing |  4 ++--
 10 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 052520331e..2eb399f0d4 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4219,7 +4219,7 @@
 { 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
 
 ##
-# @x-blockdev-reopen:
+# @blockdev-reopen:
 #
 # Reopens one or more block devices using the given set of options.
 # Any option not specified will be reset to its default value regardless
@@ -4257,9 +4257,9 @@
 # image does not have a default backing file name as part of its
 # metadata.
 #
-# Since: 4.0
+# Since: 6.0
 ##
-{ 'command': 'x-blockdev-reopen',
+{ 'command': 'blockdev-reopen',
   'data': { 'options': ['BlockdevOptions'] } }
 
 ##
diff --git a/blockdev.c b/blockdev.c
index fddcccbdbd..6466a106f4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3560,7 +3560,7 @@ fail:
 visit_free(v);
 }
 
-void qmp_x_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp)
+void qmp_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp)
 {
 BlockReopenQueue *queue = NULL;
 GSList *drained = NULL;
diff --git a/tests/qemu-iotests/155 b/tests/qemu-iotests/155
index 2947bfb81a..eadda52615 100755
--- a/tests/qemu-iotests/155
+++ b/tests/qemu-iotests/155
@@ -261,7 +261,7 @@ class TestBlockdevMirrorReopen(MirrorBaseClass):
 result = self.vm.qmp('blockdev-add', node_name="backing",
  driver="null-co")
 self.assert_qmp(result, 'return', {})
-result = self.vm.qmp('x-blockdev-reopen', options=[{
+result = self.vm.qmp('blockdev-reopen', options=[{
  'node-name': "target",
  'driver': iotests.imgfmt,
  'file': "target-file",
diff --git a/tests/qemu-iotests/165 b/tests/qemu-iotests/165
index ef4cf14516..ce499946b8 100755
--- a/tests/qemu-iotests/165
+++ b/tests/qemu-iotests/165
@@ -137,7 +137,7 @@ class TestPersistentDirtyBitmap(iotests.QMPTestCase):
 assert sha256_1 == self.getSha256()
 
 # Reopen to RW
-result = self.vm.qmp('x-blockdev-reopen', options=[{
+result = self.vm.qmp('blockdev-reopen', options=[{
 'node-name': 'node0',
 'driver': iotests.imgfmt,
 'file': {
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index 8bc8101e6d..bf8261eec0 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 # group: rw
 #
-# Test cases for the QMP 'x-blockdev-reopen' command
+# Test cases for the QMP 'blockdev-reopen' command
 #
 # Copyright (C) 2018-2019 Igalia, S.L.
 # Author: Alberto Garcia 
@@ -85,16 +85,16 @@ class TestBlockdevReopen(iotests.QMPTestCase):
  "Expected output of %d qemu-io commands, found %d" %
  (found, self.total_io_cmds))
 
-# Run x-blockdev-reopen on a list of block devices
+# Run blockdev-reopen on a list of block devices
 def reopenMultiple(self, opts, errmsg = None):
-result = self.vm.qmp('x-blockdev-reopen', conv_keys=False, options=opts)
+result = self.vm.qmp('blockdev-reopen', conv_keys=False, options=opts)
 if errmsg:
 self.assert_qmp(result, 'error/class', 'GenericError')
 self.assert_qmp(result, 'error/desc', errmsg)
 else:
 self.assert_qmp(result, 'return', {})
 
-# Run x-blockdev-reopen on a single block device (specified by
+# Run blockdev-reopen on a single block device (specified by
 # 'opts') but applying 'newopts' on top of it. The original 'opts'
 # dict is unmodified
 def reopen(self, opts, newopts = {}, errmsg = None):
@@ -161,7 +161,7 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 self.reopen(opts, {'file.locking': 'off'}, "Cannot change the option 'locking'")
 self.reopen(opts, {'file.filename': None}, "Invalid parameter type for 'options[0].file.filename', expected: string")
 
-# node-name is optional in BlockdevOptions, but x-blockdev-reopen needs it
+# node-name is optional in BlockdevOptions, but blockdev-reopen needs it

[PATCH v6 1/6] qcow2: Fix dangling pointer after reopen for 'file'

2021-07-08 Thread Kevin Wolf
Without an external data file, s->data_file is a second pointer with the
same value as bs->file. When changing bs->file to a different BdrvChild
and freeing the old BdrvChild, s->data_file must also be updated,
otherwise it points to freed memory and causes crashes.

This problem was caught by iotests case 245.

Fixes: df2b7086f169239ebad5d150efa29c9bb6d4f820
Signed-off-by: Kevin Wolf 
---
 block/qcow2.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index ee4530cdbd..9126127633 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1926,6 +1926,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
 static int qcow2_reopen_prepare(BDRVReopenState *state,
 BlockReopenQueue *queue, Error **errp)
 {
+BDRVQcow2State *s = state->bs->opaque;
 Qcow2ReopenState *r;
 int ret;
 
@@ -1956,6 +1957,16 @@ static int qcow2_reopen_prepare(BDRVReopenState *state,
 }
 }
 
+/*
+ * Without an external data file, s->data_file points to the same BdrvChild
+ * as bs->file. It needs to be resynced after reopen because bs->file may
+ * be changed. We can't use it in the meantime.
+ */
+if (!has_data_file(state->bs)) {
+assert(s->data_file == state->bs->file);
+s->data_file = NULL;
+}
+
 return 0;
 
 fail:
@@ -1966,7 +1977,16 @@ fail:
 
 static void qcow2_reopen_commit(BDRVReopenState *state)
 {
+BDRVQcow2State *s = state->bs->opaque;
+
 qcow2_update_options_commit(state->bs, state->opaque);
+if (!s->data_file) {
+/*
+ * If we don't have an external data file, s->data_file was cleared by
+ * qcow2_reopen_prepare() and needs to be updated.
+ */
+s->data_file = state->bs->file;
+}
 g_free(state->opaque);
 }
 
@@ -1990,6 +2010,15 @@ static void qcow2_reopen_commit_post(BDRVReopenState *state)
 
 static void qcow2_reopen_abort(BDRVReopenState *state)
 {
+BDRVQcow2State *s = state->bs->opaque;
+
+if (!s->data_file) {
+/*
+ * If we don't have an external data file, s->data_file was cleared by
+ * qcow2_reopen_prepare() and needs to be restored.
+ */
+s->data_file = state->bs->file;
+}
 qcow2_update_options_abort(state->bs, state->opaque);
 g_free(state->opaque);
 }
-- 
2.31.1




[PATCH v6 3/6] block: Acquire AioContexts during bdrv_reopen_multiple()

2021-07-08 Thread Kevin Wolf
As the BlockReopenQueue can contain nodes in multiple AioContexts, only
one of which may be locked when AIO_WAIT_WHILE() can be called, we can't
let the caller lock the right contexts. Instead, individually lock the
AioContext of a single node when iterating the queue.

Reintroduce bdrv_reopen() as a wrapper for reopening a single node that
drains the node and temporarily drops the AioContext lock for
bdrv_reopen_multiple().

Signed-off-by: Kevin Wolf 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h |  2 ++
 block.c   | 49 ---
 block/replication.c   |  7 +++
 blockdev.c|  5 +
 qemu-io-cmds.c|  7 +--
 5 files changed, 57 insertions(+), 13 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 6d42992985..3477290f9a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -388,6 +388,8 @@ BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
 bool keep_old_opts);
 void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue);
 int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp);
+int bdrv_reopen(BlockDriverState *bs, QDict *opts, bool keep_old_opts,
+Error **errp);
 int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
   Error **errp);
 int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
diff --git a/block.c b/block.c
index ee9b46e95e..06fb69df5c 100644
--- a/block.c
+++ b/block.c
@@ -4124,19 +4124,26 @@ void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue)
  *
  * All affected nodes must be drained between bdrv_reopen_queue() and
  * bdrv_reopen_multiple().
+ *
+ * To be called from the main thread, with all other AioContexts unlocked.
  */
 int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
 {
 int ret = -1;
 BlockReopenQueueEntry *bs_entry, *next;
+AioContext *ctx;
 Transaction *tran = tran_new();
 g_autoptr(GHashTable) found = NULL;
 g_autoptr(GSList) refresh_list = NULL;
 
+assert(qemu_get_current_aio_context() == qemu_get_aio_context());
 assert(bs_queue != NULL);
 
 QTAILQ_FOREACH(bs_entry, bs_queue, entry) {
+ctx = bdrv_get_aio_context(bs_entry->state.bs);
+aio_context_acquire(ctx);
 ret = bdrv_flush(bs_entry->state.bs);
+aio_context_release(ctx);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Error flushing drive");
 goto abort;
@@ -4145,7 +4152,10 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
 
 QTAILQ_FOREACH(bs_entry, bs_queue, entry) {
 assert(bs_entry->state.bs->quiesce_counter > 0);
+ctx = bdrv_get_aio_context(bs_entry->state.bs);
+aio_context_acquire(ctx);
 ret = bdrv_reopen_prepare(&bs_entry->state, bs_queue, tran, errp);
+aio_context_release(ctx);
 if (ret < 0) {
 goto abort;
 }
@@ -4188,7 +4198,10 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
  * to first element.
  */
 QTAILQ_FOREACH_REVERSE(bs_entry, bs_queue, entry) {
+ctx = bdrv_get_aio_context(bs_entry->state.bs);
+aio_context_acquire(ctx);
 bdrv_reopen_commit(&bs_entry->state);
+aio_context_release(ctx);
 }
 
 tran_commit(tran);
@@ -4197,7 +4210,10 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
 BlockDriverState *bs = bs_entry->state.bs;
 
 if (bs->drv->bdrv_reopen_commit_post) {
+ctx = bdrv_get_aio_context(bs);
+aio_context_acquire(ctx);
bs->drv->bdrv_reopen_commit_post(&bs_entry->state);
+aio_context_release(ctx);
 }
 }
 
@@ -4208,7 +4224,10 @@ abort:
 tran_abort(tran);
 QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
 if (bs_entry->prepared) {
+ctx = bdrv_get_aio_context(bs_entry->state.bs);
+aio_context_acquire(ctx);
bdrv_reopen_abort(&bs_entry->state);
+aio_context_release(ctx);
 }
 }
 
@@ -4218,23 +4237,39 @@ cleanup:
 return ret;
 }
 
-int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
-  Error **errp)
+int bdrv_reopen(BlockDriverState *bs, QDict *opts, bool keep_old_opts,
+Error **errp)
 {
-int ret;
+AioContext *ctx = bdrv_get_aio_context(bs);
 BlockReopenQueue *queue;
-QDict *opts = qdict_new();
-
-qdict_put_bool(opts, BDRV_OPT_READ_ONLY, read_only);
+int ret;
 
 bdrv_subtree_drained_begin(bs);
-queue = bdrv_reopen_queue(NULL, bs, opts, true);
+if (ctx != qemu_get_aio_context()) {
+aio_context_release(ctx);
+}
+
+queue = bdrv_reopen_queue(NULL, bs, opts, keep_old_opts);
 ret = bdrv_reopen_multiple(queue, errp);
+
+if (ctx != qemu_get_aio_context()) {
+aio_context_acquire(ctx);
+}

[PATCH v6 4/6] block: Support multiple reopening with x-blockdev-reopen

2021-07-08 Thread Kevin Wolf
From: Alberto Garcia 

[ kwolf: Fixed AioContext locking ]

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json  | 18 +++--
 blockdev.c| 81 ++-
 tests/qemu-iotests/155|  9 ++-
 tests/qemu-iotests/165|  4 +-
 tests/qemu-iotests/245| 27 ---
 tests/qemu-iotests/248|  2 +-
 tests/qemu-iotests/248.out|  2 +-
 tests/qemu-iotests/296|  9 ++-
 tests/qemu-iotests/298|  4 +-
 .../tests/remove-bitmap-from-backing  | 18 +++--
 10 files changed, 99 insertions(+), 75 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 4a46552816..052520331e 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4221,13 +4221,15 @@
 ##
 # @x-blockdev-reopen:
 #
-# Reopens a block device using the given set of options. Any option
-# not specified will be reset to its default value regardless of its
-# previous status. If an option cannot be changed or a particular
+# Reopens one or more block devices using the given set of options.
+# Any option not specified will be reset to its default value regardless
+# of its previous status. If an option cannot be changed or a particular
 # driver does not support reopening then the command will return an
-# error.
+# error. All devices in the list are reopened in one transaction, so
+# if one of them fails then the whole transaction is cancelled.
 #
-# The top-level @node-name option (from BlockdevOptions) must be
+# The command receives a list of block devices to reopen. For each one
+# of them, the top-level @node-name option (from BlockdevOptions) must be
 # specified and is used to select the block device to be reopened.
 # Other @node-name options must be either omitted or set to the
 # current name of the appropriate node. This command won't change any
@@ -4247,8 +4249,8 @@
 #
 #  4) NULL: the current child (if any) is detached.
 #
-# Options (1) and (2) are supported in all cases, but at the moment
-# only @backing allows replacing or detaching an existing child.
+# Options (1) and (2) are supported in all cases. Option (3) is
+# supported for @file and @backing, and option (4) for @backing only.
 #
 # Unlike with blockdev-add, the @backing option must always be present
 # unless the node being reopened does not have a backing file and its
@@ -4258,7 +4260,7 @@
 # Since: 4.0
 ##
 { 'command': 'x-blockdev-reopen',
-  'data': 'BlockdevOptions', 'boxed': true }
+  'data': { 'options': ['BlockdevOptions'] } }
 
 ##
 # @blockdev-del:
diff --git a/blockdev.c b/blockdev.c
index f657d090d3..fddcccbdbd 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3560,51 +3560,60 @@ fail:
 visit_free(v);
 }
 
-void qmp_x_blockdev_reopen(BlockdevOptions *options, Error **errp)
-{
-BlockDriverState *bs;
-AioContext *ctx;
-QObject *obj;
-Visitor *v = qobject_output_visitor_new();
-BlockReopenQueue *queue;
-QDict *qdict;
+void qmp_x_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp)
+{
+BlockReopenQueue *queue = NULL;
+GSList *drained = NULL;
+
+/* Add each one of the BDS that we want to reopen to the queue */
+for (; reopen_list != NULL; reopen_list = reopen_list->next) {
+BlockdevOptions *options = reopen_list->value;
+BlockDriverState *bs;
+AioContext *ctx;
+QObject *obj;
+Visitor *v;
+QDict *qdict;
+
+/* Check for the selected node name */
+if (!options->has_node_name) {
+error_setg(errp, "node-name not specified");
+goto fail;
+}
 
-/* Check for the selected node name */
-if (!options->has_node_name) {
-error_setg(errp, "node-name not specified");
-goto fail;
-}
+bs = bdrv_find_node(options->node_name);
+if (!bs) {
+error_setg(errp, "Failed to find node with node-name='%s'",
+   options->node_name);
+goto fail;
+}
 
-bs = bdrv_find_node(options->node_name);
-if (!bs) {
-error_setg(errp, "Failed to find node with node-name='%s'",
-   options->node_name);
-goto fail;
-}
+/* Put all options in a QDict and flatten it */
+v = qobject_output_visitor_new();
+visit_type_BlockdevOptions(v, NULL, &options, &error_abort);
+visit_complete(v, &obj);
+visit_free(v);
 
-/* Put all options in a QDict and flatten it */
-visit_type_BlockdevOptions(v, NULL, &options, &error_abort);
-visit_complete(v, &obj);
-qdict = qobject_to(QDict, obj);
+qdict = qobject_to(QDict, obj);
 
-qdict_flatten(qdict);
+qdict_flatten(qdict);
 
-/* Perform the reopen operation */
-ctx = bdrv_get_aio_context(bs);
-aio_context_acquire(ctx);
-

[PATCH v6 2/6] block: Add bdrv_reopen_queue_free()

2021-07-08 Thread Kevin Wolf
From: Alberto Garcia 

Move the code to free a BlockReopenQueue to a separate function.
It will be used in a subsequent patch.

[ kwolf: Also free explicit_options and options, and explicitly
  qobject_ref() the value when it continues to be used. This makes
  future memory leaks less likely. ]

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h |  1 +
 block.c   | 22 --
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 7ec77ecb1a..6d42992985 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -386,6 +386,7 @@ BlockDriverState *bdrv_new_open_driver(BlockDriver *drv, 
const char *node_name,
 BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
 BlockDriverState *bs, QDict *options,
 bool keep_old_opts);
+void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue);
 int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp);
 int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
   Error **errp);
diff --git a/block.c b/block.c
index acd35cb0cb..ee9b46e95e 100644
--- a/block.c
+++ b/block.c
@@ -4095,6 +4095,19 @@ BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
NULL, 0, keep_old_opts);
 }
 
+void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue)
+{
+if (bs_queue) {
+BlockReopenQueueEntry *bs_entry, *next;
+QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
+qobject_unref(bs_entry->state.explicit_options);
+qobject_unref(bs_entry->state.options);
+g_free(bs_entry);
+}
+g_free(bs_queue);
+}
+}
+
 /*
  * Reopen multiple BlockDriverStates atomically & transactionally.
  *
@@ -4197,15 +4210,10 @@ abort:
 if (bs_entry->prepared) {
bdrv_reopen_abort(&bs_entry->state);
 }
-qobject_unref(bs_entry->state.explicit_options);
-qobject_unref(bs_entry->state.options);
 }
 
 cleanup:
-QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
-g_free(bs_entry);
-}
-g_free(bs_queue);
+bdrv_reopen_queue_free(bs_queue);
 
 return ret;
 }
@@ -4573,6 +4581,8 @@ static void bdrv_reopen_commit(BDRVReopenState *state)
 /* set BDS specific flags now */
 qobject_unref(bs->explicit_options);
 qobject_unref(bs->options);
+qobject_ref(reopen_state->explicit_options);
+qobject_ref(reopen_state->options);
 
 bs->explicit_options   = reopen_state->explicit_options;
 bs->options= reopen_state->options;
-- 
2.31.1




[PATCH v6 0/6] Make blockdev-reopen stable

2021-07-08 Thread Kevin Wolf
This series picks up the remaining patches from Berto's series "[PATCH
v4 0/6] Allow changing bs->file on reopen", which are not merged into
master yet.

Apart from renaming 'x-blockdev-reopen' into 'blockdev-reopen', the
remaining functional change in this series is taking a list of nodes to
reopen as an argument so that multiple changes to the graph can be made
atomically that would be invalid separately (e.g. due to permission
checks on the intermediate state).
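A sketch of what such an atomic multi-node reopen looks like on the QMP wire, modelled on the image-swap case exercised by test 245 in this series (node names are illustrative):

```json
{ "execute": "blockdev-reopen",
  "arguments": {
    "options": [
      { "node-name": "hd0", "driver": "qcow2", "file": "hd2-file" },
      { "node-name": "hd2", "driver": "qcow2", "file": "hd0-file" }
    ]
  }
}
```

Swapping the two 'file' children is only valid because both changes are committed in one transaction; either reopen on its own fails the permission check.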

It also contains a qcow2 fix for a bug introduced by the part of the
series that was already picked up in Vladimir's "[PATCH v6 0/9] Allow
changing bs->file on reopen".

v6:
- Changed qcow2 fix to set s->data_file = NULL between .prepare and
  .commit instead of using a separate bool [Vladimir]
- Coding style fixes [Vladimir]

Alberto Garcia (4):
  block: Add bdrv_reopen_queue_free()
  block: Support multiple reopening with x-blockdev-reopen
  iotests: Test reopening multiple devices at the same time
  block: Make blockdev-reopen stable API

Kevin Wolf (2):
  qcow2: Fix dangling pointer after reopen for 'file'
  block: Acquire AioContexts during bdrv_reopen_multiple()

 qapi/block-core.json  | 24 +++---
 include/block/block.h |  3 +
 block.c   | 71 +
 block/qcow2.c | 29 +++
 block/replication.c   |  7 ++
 blockdev.c| 76 ++
 qemu-io-cmds.c|  7 +-
 tests/qemu-iotests/155|  9 ++-
 tests/qemu-iotests/165|  4 +-
 tests/qemu-iotests/245| 78 +++
 tests/qemu-iotests/245.out|  4 +-
 tests/qemu-iotests/248|  4 +-
 tests/qemu-iotests/248.out|  2 +-
 tests/qemu-iotests/296| 11 ++-
 tests/qemu-iotests/298|  4 +-
 .../tests/remove-bitmap-from-backing  | 22 +++---
 16 files changed, 255 insertions(+), 100 deletions(-)

-- 
2.31.1




Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex

2021-07-08 Thread Paolo Bonzini

On 08/07/21 12:36, Stefan Hajnoczi wrote:

What is very clear from this patch is that it
is strictly related to the brdv_* and lower level calls, because
they also internally check or even use the aiocontext lock.
Therefore, in order to make it work, I temporarly added some
aiocontext_acquire/release pair around the function that
still assert for them or assume they are hold and temporarly
unlock (unlock() - lock()).


Sounds like the issue is that this patch series assumes AioContext locks
are no longer required for calling the blk_*()/bdrv_*() APIs? That is
not the case yet, so you had to then add those aio_context_lock() calls
back in elsewhere. This approach introduces unnecessary risk. I think we
should wait until blk_*()/bdrv_*() no longer requires the caller to hold
the AioContext lock before applying this series.


In general I'm in favor of pushing the lock further down into smaller 
and smaller critical sections; it's a good approach to make further 
audits easier until it's "obvious" that the lock is unnecessary.  I 
haven't yet reviewed Emanuele's patches to see if this is what he's 
doing where he's adding the acquire/release calls, but that's my 
understanding of both his cover letter and your reply.


The I/O blk_*()/bdrv_*() *should* not require the caller to hold the 
AioContext lock; all drivers use their own CoMutex or QemuMutex when 
needed, and generic code should also be ready (caveat emptor).  Others, 
such as reopen, are a mess that requires a separate audit.  Restricting 
acquire/release to be only around those seems like a good starting point.


Paolo




Re: [RFC PATCH 4/6] job.h: categorize job fields

2021-07-08 Thread Stefan Hajnoczi
On Wed, Jul 07, 2021 at 06:58:11PM +0200, Emanuele Giuseppe Esposito wrote:
> -/** AioContext to run the job coroutine in */
> +/**
> + * AioContext to run the job coroutine in.
> + * Atomic.
> + */
>  AioContext *aio_context;

This isn't accessed using atomic operations, so I'm not sure why it's
documented as atomic?


signature.asc
Description: PGP signature


Re: [RFC PATCH 3/6] job: minor changes to simplify locking

2021-07-08 Thread Stefan Hajnoczi
On Wed, Jul 07, 2021 at 06:58:10PM +0200, Emanuele Giuseppe Esposito wrote:
> @@ -406,15 +410,18 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
>  error_setg(errp, "Invalid job ID '%s'", job_id);
>  return NULL;
>  }
> -if (job_get(job_id)) {
> -error_setg(errp, "Job ID '%s' already in use", job_id);
> -return NULL;
> -}
>  } else if (!(flags & JOB_INTERNAL)) {
>  error_setg(errp, "An explicit job ID is required");
>  return NULL;
>  }
>  
> +job_lock();
> +if (job_get(job_id)) {
> +error_setg(errp, "Job ID '%s' already in use", job_id);
> +job_unlock();
> +return NULL;
> +}
> +

Where is the matching job_unlock() in the success case? Please consider
lock guard macros like QEMU_LOCK_GUARD()/WITH_QEMU_LOCK_GUARD() to
prevent common errors.


signature.asc
Description: PGP signature


Re: [RFC PATCH 2/6] job: _locked functions and public job_lock/unlock for next patch

2021-07-08 Thread Stefan Hajnoczi
On Wed, Jul 07, 2021 at 06:58:09PM +0200, Emanuele Giuseppe Esposito wrote:
> diff --git a/job.c b/job.c
> index 872bbebb01..96fb8e9730 100644
> --- a/job.c
> +++ b/job.c
> @@ -32,6 +32,10 @@
>  #include "trace/trace-root.h"
>  #include "qapi/qapi-events-job.h"
>  
> +/* job_mutex protects the jobs list, but also the job operations. */
> +static QemuMutex job_mutex;

It's unclear what protecting "job operations" means. I would prefer a
fine-grained per-job lock that protects the job's fields instead of a
global lock with an unclear scope.

> +
> +/* Protected by job_mutex */
>  static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
>  
>  /* Job State Transition Table */
> @@ -64,27 +68,22 @@ bool JobVerbTable[JOB_VERB__MAX][JOB_STATUS__MAX] = {
>  /* Transactional group of jobs */
>  struct JobTxn {
>  
> -/* Is this txn being cancelled? */
> +/* Is this txn being cancelled? Atomic.*/
>  bool aborting;

The comment says atomic but this field is not accessed using atomic
operations (at least at this point in the patch series)?

>  
> -/* List of jobs */
> +/* List of jobs. Protected by job_mutex. */
>  QLIST_HEAD(, Job) jobs;
>  
> -/* Reference count */
> +/* Reference count. Atomic. */
>  int refcnt;

Same.




Re: [PATCH] MAINTAINERS: update block/rbd.c maintainer

2021-07-08 Thread Ilya Dryomov
On Wed, Jul 7, 2021 at 8:05 PM Peter Lieven  wrote:
>
> adding myself as a designated reviewer.
>
> Signed-off-by: Peter Lieven 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 516db737d1..cfda57e825 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3058,6 +3058,7 @@ F: block/vmdk.c
>
>  RBD
>  M: Ilya Dryomov 
> +R: Peter Lieven 
>  L: qemu-block@nongnu.org
>  S: Supported
>  F: block/rbd.c
> --
> 2.17.1
>
>

Nit: I would change the title to "MAINTAINERS: add block/rbd.c reviewer"
or something like that.

Acked-by: Ilya Dryomov 

Thanks again for volunteering!

Ilya



Re: [PATCH] block/rbd: fix type of task->complete

2021-07-08 Thread Ilya Dryomov
On Wed, Jul 7, 2021 at 8:05 PM Peter Lieven  wrote:
>
> task->complete is a bool not an integer.
>
> Signed-off-by: Peter Lieven 
> ---
>  block/rbd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 01a7b94d62..dcf82b15b8 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -1066,7 +1066,7 @@ static int qemu_rbd_resize(BlockDriverState *bs, 
> uint64_t size)
>  static void qemu_rbd_finish_bh(void *opaque)
>  {
>  RBDTask *task = opaque;
> -task->complete = 1;
> +task->complete = true;
>  aio_co_wake(task->co);
>  }
>
> --
> 2.17.1
>
>

Reviewed-by: Ilya Dryomov 

Thanks,

Ilya



Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex

2021-07-08 Thread Stefan Hajnoczi
On Wed, Jul 07, 2021 at 06:58:07PM +0200, Emanuele Giuseppe Esposito wrote:
> This is a continuation of the work to reduce (and possibly get rid of) the
> usage of the AioContext lock, by introducing smaller granularity locks to
> preserve thread safety.
> 
> This series aims to:
> 1) remove the aiocontext lock and substitute it with the already existing
>global job_mutex
> 2) fix what looks like an oversight when moving the blockjob.c logic
>into the more generic job.c: job_mutex was introduced especially to
>protect job->busy flag, but it seems that it was not used in successive
>patches, because there are multiple code sections that directly
>access the field without any locking.
> 3) use job_mutex instead of the aiocontext_lock
> 4) extend the reach of the job_mutex to protect all shared fields
>that the job structure has.
> 
> The reason why we propose to use the existing job_mutex and not make one for
> each job is to keep things as simple as possible for now, and because
> the jobs are not in the execution critical path, so we can afford
> some delays.
> Having a lock per job would increase overall complexity and
> increase the chances of deadlocks (one good example could be the job
> transactions, where multiple jobs are grouped together).
> Anyway, the per-job mutex can always be added in the future.
> 
> Patch 1-4 are in preparation for patch 5. They try to simplify and clarify
> the job_mutex usage. Patch 5 tries to add proper synchronization to the job
> structure, replacing the AioContext lock when necessary.
> Patch 6 just removes unnecessary AioContext locks that are now unneeded.
> 
> 
> RFC: I am not sure the way I laid out the locks is ideal.
> But their usage should not introduce deadlocks. I also made sure
> the series passes all qemu_iotests.
> 
> What is very clear from this patch is that it
> is strictly related to the bdrv_* and lower level calls, because
> they also internally check or even use the aiocontext lock.
> Therefore, in order to make it work, I temporarily added some
> aiocontext_acquire/release pairs around the functions that
> still assert for them or assume they are held and temporarily
> unlock (unlock() - lock()).

Sounds like the issue is that this patch series assumes AioContext locks
are no longer required for calling the blk_*()/bdrv_*() APIs? That is
not the case yet, so you had to then add those aio_context_lock() calls
back in elsewhere. This approach introduces unnecessary risk. I think we
should wait until blk_*()/bdrv_*() no longer requires the caller to hold
the AioContext lock before applying this series.

Stefan




Re: [PATCH] block/rbd: fix type of task->complete

2021-07-08 Thread Kevin Wolf
Am 07.07.2021 um 23:51 hat Connor Kuehl geschrieben:
> On 7/7/21 11:04 AM, Peter Lieven wrote:
> > task->complete is a bool not an integer.
> > 
> > Signed-off-by: Peter Lieven 
> > ---
> >  block/rbd.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/block/rbd.c b/block/rbd.c
> > index 01a7b94d62..dcf82b15b8 100644
> > --- a/block/rbd.c
> > +++ b/block/rbd.c
> > @@ -1066,7 +1066,7 @@ static int qemu_rbd_resize(BlockDriverState *bs, 
> > uint64_t size)
> >  static void qemu_rbd_finish_bh(void *opaque)
> >  {
> >  RBDTask *task = opaque;
> > -task->complete = 1;
> > +task->complete = true;
> >  aio_co_wake(task->co);
> >  }
> >  
> > 
> 
> Hi Peter,
> 
> What tree/QEMU git sha does this apply to? I am having trouble
> finding definitions for RBDTask and qemu_rbd_finish_bh; and the
> patch won't apply to my few-minutes-old clone of upstream.

It is on top of:

[PATCH v5 0/6] block/rbd: migrate to coroutines and add write zeroes support
Message-Id: <20210702172356.11574-1-idryo...@gmail.com>

Also, thanks, applied to the block branch.

Kevin




Re: [PATCH v3 3/2] qemu-img: Reword 'qemu-img map --output=json' docs

2021-07-08 Thread Vladimir Sementsov-Ogievskiy

07.07.2021 21:41, Eric Blake wrote:

Reword the paragraphs to list the JSON key first, rather than in the
middle of prose.

Suggested-by: Vladimir Sementsov-Ogievskiy
Signed-off-by: Eric Blake


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH 4/4] hw/nvme: fix controller hot unplugging

2021-07-08 Thread Hannes Reinecke

On 7/7/21 6:14 PM, Stefan Hajnoczi wrote:

On Wed, Jul 07, 2021 at 12:43:56PM +0200, Hannes Reinecke wrote:

On 7/7/21 11:53 AM, Klaus Jensen wrote:

On Jul  7 09:49, Hannes Reinecke wrote:

On 7/6/21 11:33 AM, Klaus Jensen wrote:

From: Klaus Jensen 

Prior to this patch the nvme-ns devices are always children of the
NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
unrealized when the parent device is removed. However, when subsystems
are involved, this is not what we want since the namespaces may be
attached to other controllers as well.

This patch adds an additional NvmeBus on the subsystem device. When
nvme-ns devices are realized, if the parent controller device is linked
to a subsystem, the parent bus is set to the subsystem one instead. This
makes sure that namespaces are kept alive and not unrealized.

Signed-off-by: Klaus Jensen 
---
  hw/nvme/nvme.h   | 18 ++
  hw/nvme/ctrl.c   |  8 +---
  hw/nvme/ns.c | 32 +---
  hw/nvme/subsys.c |  4 
  4 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index c4065467d877..9401e212f9f7 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -33,12 +33,21 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES >
NVME_NSID_BROADCAST - 1);
  typedef struct NvmeCtrl NvmeCtrl;
  typedef struct NvmeNamespace NvmeNamespace;
+#define TYPE_NVME_BUS "nvme-bus"
+OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
+
+typedef struct NvmeBus {
+    BusState parent_bus;
+    bool is_subsys;
+} NvmeBus;
+
  #define TYPE_NVME_SUBSYS "nvme-subsys"
  #define NVME_SUBSYS(obj) \
  OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
  typedef struct NvmeSubsystem {
  DeviceState parent_obj;
+    NvmeBus bus;
  uint8_t subnqn[256];
  NvmeCtrl  *ctrls[NVME_MAX_CONTROLLERS];
@@ -365,13 +374,6 @@ typedef struct NvmeCQueue {
  QTAILQ_HEAD(, NvmeRequest) req_list;
  } NvmeCQueue;
-#define TYPE_NVME_BUS "nvme-bus"
-#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
-
-typedef struct NvmeBus {
-    BusState parent_bus;
-} NvmeBus;
-
  #define TYPE_NVME "nvme"
  #define NVME(obj) \
  OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
@@ -463,7 +465,7 @@ typedef struct NvmeCtrl {
  static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
  {
-    if (!nsid || nsid > NVME_MAX_NAMESPACES) {
+    if (!n || !nsid || nsid > NVME_MAX_NAMESPACES) {
  return NULL;
  }
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 90e3ee2b70ee..7c8fca36d9a5 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6516,11 +6516,13 @@ static void nvme_exit(PCIDevice *pci_dev)
  for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
  ns = nvme_ns(n, i);
-    if (!ns) {
-    continue;
+    if (ns) {
+    ns->attached--;
  }
+    }
-    nvme_ns_cleanup(ns);
+    if (n->subsys) {
+    nvme_subsys_unregister_ctrl(n->subsys, n);
  }
  if (n->subsys) {
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 3c4f5b8c714a..612a2786d75d 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -444,13 +444,29 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
  static void nvme_ns_realize(DeviceState *dev, Error **errp)
  {
  NvmeNamespace *ns = NVME_NS(dev);
-    BusState *s = qdev_get_parent_bus(dev);
-    NvmeCtrl *n = NVME(s->parent);
-    NvmeSubsystem *subsys = n->subsys;
+    BusState *qbus = qdev_get_parent_bus(dev);
+    NvmeBus *bus = NVME_BUS(qbus);
+    NvmeCtrl *n = NULL;
+    NvmeSubsystem *subsys = NULL;
  uint32_t nsid = ns->params.nsid;
  int i;
-    if (!n->subsys) {
+    if (bus->is_subsys) {
+    subsys = NVME_SUBSYS(qbus->parent);
+    } else {
+    n = NVME(qbus->parent);
+    subsys = n->subsys;
+    }
+
+    if (subsys) {
+    /*
+ * If this namespace belongs to a subsystem (through a
link on the
+ * controller device), reparent the device.
+ */
+    if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
+    return;
+    }
+    } else {
  if (ns->params.detached) {
  error_setg(errp, "detached requires that the nvme
device is "
     "linked to an nvme-subsys device");
@@ -470,7 +486,7 @@ static void nvme_ns_realize(DeviceState
*dev, Error **errp)
  if (!nsid) {
  for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
-    if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
+    if (nvme_subsys_ns(subsys, i) || nvme_ns(n, i)) {
  continue;
  }
@@ -483,7 +499,7 @@ static void nvme_ns_realize(DeviceState
*dev, Error **errp)
  return;
  }
  } else {
-    if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
+    if (nvme_subsys_ns(subsys, nsid) || nvme_ns(n, nsid)) {
  error_setg(errp, "namespace id '%d' already
allocated", nsid);
  return;
  }
@@ -509,7 +525,9 @@ static void nvme_ns_realize(DeviceState