[PULL v2 10/12] block: Return depth level during bdrv_is_allocated_above

2020-10-30 Thread Eric Blake
When checking for allocation across a chain, it's already easy to
count the depth within the chain at which the allocation is found.
Instead of throwing that information away, return it to the caller.
Existing callers only cared about allocated/non-allocated, but having
a depth available will be used by NBD in the next patch.

Signed-off-by: Eric Blake 
Message-Id: <20201027050556.269064-9-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
[eblake: rebase to master]
Signed-off-by: Eric Blake 
---
 block/coroutines.h |  6 --
 block/io.c | 31 +++
 block/commit.c |  2 +-
 block/mirror.c |  2 +-
 block/stream.c |  2 +-
 5 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index 1cb3128b942c..4cfb4946e65e 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -47,7 +47,8 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
   int64_t bytes,
   int64_t *pnum,
   int64_t *map,
-  BlockDriverState **file);
+  BlockDriverState **file,
+  int *depth);
 int generated_co_wrapper
 bdrv_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
@@ -57,7 +58,8 @@ bdrv_common_block_status_above(BlockDriverState *bs,
int64_t bytes,
int64_t *pnum,
int64_t *map,
-   BlockDriverState **file);
+   BlockDriverState **file,
+   int *depth);

 int coroutine_fn bdrv_co_readv_vmstate(BlockDriverState *bs,
QEMUIOVector *qiov, int64_t pos);
diff --git a/block/io.c b/block/io.c
index 9918f2499c19..ec5e152bb70f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2362,20 +2362,28 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
   int64_t bytes,
   int64_t *pnum,
   int64_t *map,
-  BlockDriverState **file)
+  BlockDriverState **file,
+  int *depth)
 {
 int ret;
 BlockDriverState *p;
 int64_t eof = 0;
+int dummy;

 assert(!include_base || base); /* Can't include NULL base */

+if (!depth) {
+depth = 
+}
+*depth = 0;
+
 if (!include_base && bs == base) {
 *pnum = bytes;
 return 0;
 }

 ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
+++*depth;
 if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
 return ret;
 }
@@ -2392,6 +2400,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 {
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
file);
+++*depth;
 if (ret < 0) {
 return ret;
 }
@@ -2450,7 +2459,7 @@ int bdrv_block_status_above(BlockDriverState *bs, 
BlockDriverState *base,
 int64_t *map, BlockDriverState **file)
 {
 return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
-  pnum, map, file);
+  pnum, map, file, NULL);
 }

 int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes,
@@ -2478,7 +2487,7 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState 
*bs, int64_t offset,
 }

 ret = bdrv_common_block_status_above(bs, NULL, false, false, offset,
- bytes, , NULL, NULL);
+ bytes, , NULL, NULL, NULL);

 if (ret < 0) {
 return ret;
@@ -2495,7 +2504,7 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, 
int64_t offset,

 ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
  bytes, pnum ? pnum : , NULL,
- NULL);
+ NULL, NULL);
 if (ret < 0) {
 return ret;
 }
@@ -2505,8 +2514,9 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, 
int64_t offset,
 /*
  * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
  *
- * Return 1 if (a prefix of) the given range is allocated in any image
- * between BASE and TOP (BASE is only included if include_base is set).
+ * Return a positive depth if (a prefix of) the given range is allocated
+ * in any image between BASE and TOP (BASE is only included if include_base
+ * is set).  Depth 1 is TOP, 2 is the first backing layer, and so forth.
  * BASE can be NULL to check if 

Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-10-30 Thread Niklas Cassel
On Fri, Oct 30, 2020 at 11:32:38AM +0900, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
> 
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 
> Reviewed-by: Niklas Cassel 

(snip)

> +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
> +{
> +NvmeCmd *cmd = (NvmeCmd *)>cmd;
> +NvmeNamespace *ns = req->ns;
> +/* cdw12 is zero-based number of dwords to return. Convert to bytes */
> +uint32_t len = (le32_to_cpu(cmd->cdw12) + 1) << 2;
> +uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> +uint32_t zone_idx, zra, zrasf, partial;
> +uint64_t max_zones, nr_zones = 0;
> +uint16_t ret;
> +uint64_t slba;
> +NvmeZoneDescr *z;
> +NvmeZone *zs;
> +NvmeZoneReportHeader *header;
> +void *buf, *buf_p;
> +size_t zone_entry_sz;
> +
> +req->status = NVME_SUCCESS;
> +
> +ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, , _idx);
> +if (ret) {
> +return ret;
> +}
> +
> +if (len < sizeof(NvmeZoneReportHeader)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}

Just like nvme_read() and nvme_write(), nvme_zone_mgmt_recv()
has to do something like:

+ret = nvme_check_mdts(n, len);
+if (ret) {
+trace_pci_nvme_err_mdts(nvme_cid(req), len);
+return ret;
+}
+

To see that we are not exceeding MDTS.


Kind regards,
Niklas


> +
> +zra = dw13 & 0xff;
> +if (!(zra == NVME_ZONE_REPORT || zra == NVME_ZONE_REPORT_EXTENDED)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +if (zra == NVME_ZONE_REPORT_EXTENDED) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +zrasf = (dw13 >> 8) & 0xff;
> +if (zrasf > NVME_ZONE_REPORT_OFFLINE) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +partial = (dw13 >> 16) & 0x01;
> +
> +zone_entry_sz = sizeof(NvmeZoneDescr);
> +
> +max_zones = (len - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
> +buf = g_malloc0(len);
> +
> +header = (NvmeZoneReportHeader *)buf;
> +buf_p = buf + sizeof(NvmeZoneReportHeader);
> +
> +while (zone_idx < ns->num_zones && nr_zones < max_zones) {
> +zs = >zone_array[zone_idx];
> +
> +if (!nvme_zone_matches_filter(zrasf, zs)) {
> +zone_idx++;
> +continue;
> +}
> +
> +z = (NvmeZoneDescr *)buf_p;
> +buf_p += sizeof(NvmeZoneDescr);
> +nr_zones++;
> +
> +z->zt = zs->d.zt;
> +z->zs = zs->d.zs;
> +z->zcap = cpu_to_le64(zs->d.zcap);
> +z->zslba = cpu_to_le64(zs->d.zslba);
> +z->za = zs->d.za;
> +
> +if (nvme_wp_is_valid(zs)) {
> +z->wp = cpu_to_le64(zs->d.wp);
> +} else {
> +z->wp = cpu_to_le64(~0ULL);
> +}
> +
> +zone_idx++;
> +}
> +
> +if (!partial) {
> +for (; zone_idx < ns->num_zones; zone_idx++) {
> +zs = >zone_array[zone_idx];
> +if (nvme_zone_matches_filter(zrasf, zs)) {
> +nr_zones++;
> +}
> +}
> +}
> +header->nr_zones = cpu_to_le64(nr_zones);
> +
> +ret = nvme_dma(n, (uint8_t *)buf, len, DMA_DIRECTION_FROM_DEVICE, req);
> +
> +g_free(buf);
> +
> +return ret;
> +}


Re: --enable-xen on gitlab CI? (was Re: [PATCH 09/36] qdev: Make qdev_get_prop_ptr() get Object* arg)

2020-10-30 Thread Paolo Bonzini
On 30/10/20 12:35, Eduardo Habkost wrote:
> 
> What is necessary to make sure we have a CONFIG_XEN=y job in
> gitlab CI?  Maybe just including xen-devel in some of the
> container images is enough?

Fedora already has it, but build-system-fedora does not include
x86_64-softmmu.

Paolo




[PATCH v10 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe

2020-10-30 Thread Lukas Straub
Make qio_channel_tls_shutdown thread-safe by using atomics when
accessing tioc->shutdown.

Signed-off-by: Lukas Straub 
Acked-by: Stefan Hajnoczi 
Reviewed-by: Daniel P. Berrangé 
---
 io/channel-tls.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 7ec8ceff2f..10d0bf59aa 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -23,6 +23,7 @@
 #include "qemu/module.h"
 #include "io/channel-tls.h"
 #include "trace.h"
+#include "qemu/atomic.h"


 static ssize_t qio_channel_tls_write_handler(const char *buf,
@@ -277,7 +278,8 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
 return QIO_CHANNEL_ERR_BLOCK;
 }
 } else if (errno == ECONNABORTED &&
-   (tioc->shutdown & QIO_CHANNEL_SHUTDOWN_READ)) {
+   (qatomic_load_acquire(>shutdown) &
+QIO_CHANNEL_SHUTDOWN_READ)) {
 return 0;
 }

@@ -361,7 +363,7 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 {
 QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);

-tioc->shutdown |= how;
+qatomic_or(>shutdown, how);

 return qio_channel_shutdown(tioc->master, how, errp);
 }
--
2.20.1



pgprghgOqQAAI.pgp
Description: OpenPGP digital signature


[PATCH v10 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test

2020-10-30 Thread Lukas Straub
A connecting chardev object has an additional reference by the connecting
thread, so if the chardev is still connecting by the end of the test,
then the chardev object won't be freed. This in turn means that the yank
instance won't be unregistered and when running the next test-case
yank_register_instance will abort, because the yank instance is
already/still registered.

Signed-off-by: Lukas Straub 
Reviewed-by: Daniel P. Berrangé 
---
 tests/test-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test-char.c b/tests/test-char.c
index 9196e566e9..aedb5c9eda 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -937,6 +937,7 @@ static void char_socket_client_dupid_test(gconstpointer 
opaque)
 g_assert_nonnull(opts);
 chr1 = qemu_chr_new_from_opts(opts, NULL, _abort);
 g_assert_nonnull(chr1);
+qemu_chr_wait_connected(chr1, _abort);

 chr2 = qemu_chr_new_from_opts(opts, NULL, _err);
 g_assert_null(chr2);
--
2.20.1


pgpJtO1z6hH_l.pgp
Description: OpenPGP digital signature


[PATCH v10 4/8] migration: Add yank feature

2020-10-30 Thread Lukas Straub
Register yank functions on sockets to shut them down.

Signed-off-by: Lukas Straub 
Acked-by: Stefan Hajnoczi 
Acked-by: Dr. David Alan Gilbert 
---
 migration/channel.c   | 13 +
 migration/migration.c | 25 +
 migration/multifd.c   | 10 ++
 migration/qemu-file-channel.c |  7 +++
 migration/savevm.c|  6 ++
 5 files changed, 61 insertions(+)

diff --git a/migration/channel.c b/migration/channel.c
index 8a783baa0b..35fe234e9c 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -18,6 +18,8 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
+#include "io/channel-socket.h"
+#include "qemu/yank.h"

 /**
  * @migration_channel_process_incoming - Create new incoming migration channel
@@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc)
 trace_migration_set_incoming_channel(
 ioc, object_get_typename(OBJECT(ioc)));

+if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+yank_register_function(MIGRATION_YANK_INSTANCE, yank_generic_iochannel,
+   QIO_CHANNEL(ioc));
+}
+
 if (s->parameters.tls_creds &&
 *s->parameters.tls_creds &&
 !object_dynamic_cast(OBJECT(ioc),
@@ -67,6 +74,12 @@ void migration_channel_connect(MigrationState *s,
 ioc, object_get_typename(OBJECT(ioc)), hostname, error);

 if (!error) {
+if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+yank_register_function(MIGRATION_YANK_INSTANCE,
+   yank_generic_iochannel,
+   QIO_CHANNEL(ioc));
+}
+
 if (s->parameters.tls_creds &&
 *s->parameters.tls_creds &&
 !object_dynamic_cast(OBJECT(ioc),
diff --git a/migration/migration.c b/migration/migration.c
index 9bb4fee5ac..0b0442df37 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -56,6 +56,7 @@
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "qemu/yank.h"

 #define MAX_THROTTLE  (128 << 20)  /* Migration transfer speed throttling 
*/

@@ -248,6 +249,8 @@ void migration_incoming_state_destroy(void)
 qapi_free_SocketAddressList(mis->socket_address_list);
 mis->socket_address_list = NULL;
 }
+
+yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }

 static void migrate_generate_event(int new_state)
@@ -425,8 +428,14 @@ void qemu_start_incoming_migration(const char *uri, Error 
**errp)
 {
 const char *p = NULL;

+yank_register_instance(MIGRATION_YANK_INSTANCE, errp);
+if (*errp) {
+return;
+}
+
 qapi_event_send_migration(MIGRATION_STATUS_SETUP);
 if (!strcmp(uri, "defer")) {
+yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 deferred_incoming_migration(errp);
 } else if (strstart(uri, "tcp:", ) ||
strstart(uri, "unix:", NULL) ||
@@ -441,6 +450,7 @@ void qemu_start_incoming_migration(const char *uri, Error 
**errp)
 } else if (strstart(uri, "fd:", )) {
 fd_start_incoming_migration(p, errp);
 } else {
+yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 error_setg(errp, "unknown migration protocol: %s", uri);
 }
 }
@@ -1733,6 +1743,7 @@ static void migrate_fd_cleanup(MigrationState *s)
 }
 notifier_list_notify(_state_notifiers, s);
 block_cleanup_parameters(s);
+yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }

 static void migrate_fd_cleanup_schedule(MigrationState *s)
@@ -2007,6 +2018,7 @@ void qmp_migrate_recover(const char *uri, Error **errp)
  * only re-setup the migration stream and poke existing migration
  * to continue using that newly established channel.
  */
+yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 qemu_start_incoming_migration(uri, errp);
 }

@@ -2144,6 +2156,13 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 return;
 }

+if (!(has_resume && resume)) {
+yank_register_instance(MIGRATION_YANK_INSTANCE, errp);
+if (*errp) {
+return;
+}
+}
+
 if (strstart(uri, "tcp:", ) ||
 strstart(uri, "unix:", NULL) ||
 strstart(uri, "vsock:", NULL)) {
@@ -2157,6 +2176,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 } else if (strstart(uri, "fd:", )) {
 fd_start_outgoing_migration(s, p, _err);
 } else {
+if (!(has_resume && resume)) {
+yank_unregister_instance(MIGRATION_YANK_INSTANCE);
+}
 error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
"a valid migration protocol");
 migrate_set_state(>state, MIGRATION_STATUS_SETUP,
@@ -2166,6 +2188,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 }

 if (local_err) {
+if (!(has_resume && resume)) {
+

[PATCH v10 6/8] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown

2020-10-30 Thread Lukas Straub
Migration and yank code assume that qio_channel_shutdown is thread
-safe and can be called from qmp oob handler. Document this after
checking the code.

Signed-off-by: Lukas Straub 
Acked-by: Stefan Hajnoczi 
Reviewed-by: Daniel P. Berrangé 
---
 include/io/channel.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 3c04f0edda..e0b9fc615d 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -92,7 +92,8 @@ struct QIOChannel {
  * provide additional optional features.
  *
  * Consult the corresponding public API docs for a description
- * of the semantics of each callback
+ * of the semantics of each callback. io_shutdown in particular
+ * must be thread-safe, terminate quickly and must not block.
  */
 struct QIOChannelClass {
 ObjectClass parent;
@@ -510,6 +511,8 @@ int qio_channel_close(QIOChannel *ioc,
  * QIO_CHANNEL_FEATURE_SHUTDOWN prior to calling
  * this method.
  *
+ * This function is thread-safe, terminates quickly and does not block.
+ *
  * Returns: 0 on success, -1 on error
  */
 int qio_channel_shutdown(QIOChannel *ioc,
--
2.20.1



pgpSPmLHd3Cfu.pgp
Description: OpenPGP digital signature


[PATCH v10 2/8] block/nbd.c: Add yank feature

2020-10-30 Thread Lukas Straub
Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occured.

Signed-off-by: Lukas Straub 
Acked-by: Stefan Hajnoczi 
---
 block/nbd.c | 154 +++-
 1 file changed, 93 insertions(+), 61 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 4548046cd7..d66c84ee40 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -35,6 +35,7 @@
 #include "qemu/option.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
+#include "qemu/atomic.h"

 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
@@ -44,6 +45,8 @@
 #include "block/nbd.h"
 #include "block/block_int.h"

+#include "qemu/yank.h"
+
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS16

@@ -140,14 +143,13 @@ typedef struct BDRVNBDState {
 NBDConnectThread *connect_thread;
 } BDRVNBDState;

-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-  Error **errp);
-static QIOChannelSocket *nbd_co_establish_connection(BlockDriverState *bs,
- Error **errp);
+static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
+Error **errp);
+static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp);
 static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
bool detach);
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-Error **errp);
+static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
+static void nbd_yank(void *opaque);

 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
@@ -165,12 +167,12 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s)
 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
 if (ret == -EIO) {
-if (s->state == NBD_CLIENT_CONNECTED) {
+if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) {
 s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
 NBD_CLIENT_CONNECTING_NOWAIT;
 }
 } else {
-if (s->state == NBD_CLIENT_CONNECTED) {
+if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) {
 qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
 }
 s->state = NBD_CLIENT_QUIT;
@@ -203,7 +205,7 @@ static void reconnect_delay_timer_cb(void *opaque)
 {
 BDRVNBDState *s = opaque;

-if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT) {
 s->state = NBD_CLIENT_CONNECTING_NOWAIT;
 while (qemu_co_enter_next(>free_sema, NULL)) {
 /* Resume all queued requests */
@@ -215,7 +217,7 @@ static void reconnect_delay_timer_cb(void *opaque)

 static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t 
expire_time_ns)
 {
-if (s->state != NBD_CLIENT_CONNECTING_WAIT) {
+if (qatomic_load_acquire(>state) != NBD_CLIENT_CONNECTING_WAIT) {
 return;
 }

@@ -260,7 +262,7 @@ static void nbd_client_attach_aio_context(BlockDriverState 
*bs,
  * s->connection_co is either yielded from nbd_receive_reply or from
  * nbd_co_reconnect_loop()
  */
-if (s->state == NBD_CLIENT_CONNECTED) {
+if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) {
 qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
 }

@@ -286,7 +288,7 @@ static void coroutine_fn 
nbd_client_co_drain_begin(BlockDriverState *bs)

 reconnect_delay_timer_del(s);

-if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT) {
 s->state = NBD_CLIENT_CONNECTING_NOWAIT;
 qemu_co_queue_restart_all(>free_sema);
 }
@@ -337,13 +339,14 @@ static void nbd_teardown_connection(BlockDriverState *bs)

 static bool nbd_client_connecting(BDRVNBDState *s)
 {
-return s->state == NBD_CLIENT_CONNECTING_WAIT ||
-s->state == NBD_CLIENT_CONNECTING_NOWAIT;
+NBDClientState state = qatomic_load_acquire(>state);
+return state == NBD_CLIENT_CONNECTING_WAIT ||
+state == NBD_CLIENT_CONNECTING_NOWAIT;
 }

 static bool nbd_client_connecting_wait(BDRVNBDState *s)
 {
-return s->state == NBD_CLIENT_CONNECTING_WAIT;
+return qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT;
 }

 static void connect_bh(void *opaque)
@@ -423,12 +426,12 @@ static void *connect_thread_func(void *opaque)
 return NULL;
 }

-static QIOChannelSocket *coroutine_fn
+static int coroutine_fn
 nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
 {
+int ret;
 QemuThread thread;
 BDRVNBDState *s = bs->opaque;
-QIOChannelSocket *res;
 NBDConnectThread *thr = s->connect_thread;

 qemu_mutex_lock(>mutex);
@@ -445,10 +448,12 @@ 

[PATCH v10 7/8] MAINTAINERS: Add myself as maintainer for yank feature

2020-10-30 Thread Lukas Straub
I'll maintain this for now as the colo usecase is the first user
of this functionality.

Signed-off-by: Lukas Straub 
Acked-by: Stefan Hajnoczi 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8c744a9bdf..81288fd219 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2676,6 +2676,13 @@ F: util/uuid.c
 F: include/qemu/uuid.h
 F: tests/test-uuid.c

+Yank feature
+M: Lukas Straub 
+S: Odd fixes
+F: util/yank.c
+F: include/qemu/yank.h
+F: qapi/yank.json
+
 COLO Framework
 M: zhanghailiang 
 S: Maintained
--
2.20.1



pgpjxUauG10GX.pgp
Description: OpenPGP digital signature


[PATCH v10 3/8] chardev/char-socket.c: Add yank feature

2020-10-30 Thread Lukas Straub
Register a yank function to shutdown the socket on yank.

Signed-off-by: Lukas Straub 
Acked-by: Stefan Hajnoczi 
---
 chardev/char-socket.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 95e45812d5..5947cbe8bb 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -34,6 +34,7 @@
 #include "qapi/error.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/yank.h"

 #include "chardev/char-io.h"
 #include "qom/object.h"
@@ -70,6 +71,7 @@ struct SocketChardev {
 size_t read_msgfds_num;
 int *write_msgfds;
 size_t write_msgfds_num;
+bool registered_yank;

 SocketAddress *addr;
 bool is_listen;
@@ -415,6 +417,12 @@ static void tcp_chr_free_connection(Chardev *chr)

 tcp_set_msgfds(chr, NULL, 0);
 remove_fd_in_watch(chr);
+if (s->state == TCP_CHARDEV_STATE_CONNECTING
+|| s->state == TCP_CHARDEV_STATE_CONNECTED) {
+yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label),
+ yank_generic_iochannel,
+ QIO_CHANNEL(s->sioc));
+}
 object_unref(OBJECT(s->sioc));
 s->sioc = NULL;
 object_unref(OBJECT(s->ioc));
@@ -918,6 +926,9 @@ static int tcp_chr_add_client(Chardev *chr, int fd)
 }
 tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
 tcp_chr_set_client_ioc_name(chr, sioc);
+yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+   yank_generic_iochannel,
+   QIO_CHANNEL(sioc));
 ret = tcp_chr_new_client(chr, sioc);
 object_unref(OBJECT(sioc));
 return ret;
@@ -932,6 +943,9 @@ static void tcp_chr_accept(QIONetListener *listener,

 tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
 tcp_chr_set_client_ioc_name(chr, cioc);
+yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+   yank_generic_iochannel,
+   QIO_CHANNEL(cioc));
 tcp_chr_new_client(chr, cioc);
 }

@@ -947,6 +961,9 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error 
**errp)
 object_unref(OBJECT(sioc));
 return -1;
 }
+yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+   yank_generic_iochannel,
+   QIO_CHANNEL(sioc));
 tcp_chr_new_client(chr, sioc);
 object_unref(OBJECT(sioc));
 return 0;
@@ -962,6 +979,9 @@ static void tcp_chr_accept_server_sync(Chardev *chr)
 tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
 sioc = qio_net_listener_wait_client(s->listener);
 tcp_chr_set_client_ioc_name(chr, sioc);
+yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+   yank_generic_iochannel,
+   QIO_CHANNEL(sioc));
 tcp_chr_new_client(chr, sioc);
 object_unref(OBJECT(sioc));
 }
@@ -1072,6 +1092,9 @@ static void char_socket_finalize(Object *obj)
 object_unref(OBJECT(s->tls_creds));
 }
 g_free(s->tls_authz);
+if (s->registered_yank) {
+yank_unregister_instance(CHARDEV_YANK_INSTANCE(chr->label));
+}

 qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
@@ -1087,6 +1110,9 @@ static void qemu_chr_socket_connected(QIOTask *task, void 
*opaque)

 if (qio_task_propagate_error(task, )) {
 tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label),
+ yank_generic_iochannel,
+ QIO_CHANNEL(sioc));
 check_report_connect_error(chr, err);
 goto cleanup;
 }
@@ -1120,6 +1146,9 @@ static void tcp_chr_connect_client_async(Chardev *chr)
 tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
 sioc = qio_channel_socket_new();
 tcp_chr_set_client_ioc_name(chr, sioc);
+yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+   yank_generic_iochannel,
+   QIO_CHANNEL(sioc));
 /*
  * Normally code would use the qio_channel_socket_connect_async
  * method which uses a QIOTask + qio_task_set_error internally
@@ -1362,6 +1391,12 @@ static void qmp_chardev_open_socket(Chardev *chr,
 qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS);
 }

+yank_register_instance(CHARDEV_YANK_INSTANCE(chr->label), errp);
+if (*errp) {
+return;
+}
+s->registered_yank = true;
+
 /* be isn't opened until we get a connection */
 *be_opened = false;

--
2.20.1



pgpx12P9M6Wj4.pgp
Description: OpenPGP digital signature


[PATCH v10 1/8] Introduce yank feature

2020-10-30 Thread Lukas Straub
The yank feature allows to recover from hanging qemu by "yanking"
at various parts. Other qemu systems can register themselves and
multiple yank functions. Then all yank functions for selected
instances can be called by the 'yank' out-of-band qmp command.
Available instances can be queried by a 'query-yank' oob command.

Signed-off-by: Lukas Straub 
Acked-by: Stefan Hajnoczi 
---
 include/qemu/yank.h   |  95 +++
 qapi/meson.build  |   1 +
 qapi/qapi-schema.json |   1 +
 qapi/yank.json| 115 ++
 util/meson.build  |   1 +
 util/yank.c   | 216 ++
 6 files changed, 429 insertions(+)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

diff --git a/include/qemu/yank.h b/include/qemu/yank.h
new file mode 100644
index 00..96f5b2626f
--- /dev/null
+++ b/include/qemu/yank.h
@@ -0,0 +1,95 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef YANK_H
+#define YANK_H
+
+#include "qapi/qapi-types-yank.h"
+
+typedef void (YankFn)(void *opaque);
+
+/**
+ * yank_register_instance: Register a new instance.
+ *
+ * This registers a new instance for yanking. Must be called before any yank
+ * function is registered for this instance.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @errp: Error object.
+ */
+void yank_register_instance(const YankInstance *instance, Error **errp);
+
+/**
+ * yank_unregister_instance: Unregister a instance.
+ *
+ * This unregisters a instance. Must be called only after every yank function
+ * of the instance has been unregistered.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ */
+void yank_unregister_instance(const YankInstance *instance);
+
+/**
+ * yank_register_function: Register a yank function
+ *
+ * This registers a yank function. All limitations of qmp oob commands apply
+ * to the yank function as well. See docs/devel/qapi-code-gen.txt under
+ * "An OOB-capable command handler must satisfy the following conditions".
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: The yank function.
+ * @opaque: Will be passed to the yank function.
+ */
+void yank_register_function(const YankInstance *instance,
+YankFn *func,
+void *opaque);
+
+/**
+ * yank_unregister_function: Unregister a yank function
+ *
+ * This unregisters a yank function.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: func that was passed to yank_register_function.
+ * @opaque: opaque that was passed to yank_register_function.
+ */
+void yank_unregister_function(const YankInstance *instance,
+  YankFn *func,
+  void *opaque);
+
+/**
+ * yank_generic_iochannel: Generic yank function for iochannel
+ *
+ * This is a generic yank function which will call qio_channel_shutdown on the
+ * provided QIOChannel.
+ *
+ * @opaque: QIOChannel to shutdown
+ */
+void yank_generic_iochannel(void *opaque);
+
+#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \
+.type = YANK_INSTANCE_TYPE_BLOCK_NODE, \
+.u.block_node.node_name = (the_node_name) })
+
+#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \
+.type = YANK_INSTANCE_TYPE_CHARDEV, \
+.u.chardev.id = (the_id) })
+
+#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \
+.type = YANK_INSTANCE_TYPE_MIGRATION })
+
+#endif
diff --git a/qapi/meson.build b/qapi/meson.build
index 0e98146f1f..ab68e7900e 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -47,6 +47,7 @@ qapi_all_modules = [
   'trace',
   'transaction',
   'ui',
+  'yank',
 ]

 qapi_storage_daemon_modules = [
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 0b444b76d2..79c1705ed7 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -91,3 +91,4 @@
 { 'include': 'audio.json' }
 { 'include': 'acpi.json' }
 { 'include': 'pci.json' }
+{ 'include': 'yank.json' }
diff --git a/qapi/yank.json b/qapi/yank.json
new file mode 100644
index 00..1964a2202e
--- /dev/null
+++ b/qapi/yank.json
@@ -0,0 +1,115 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+#
+
+##
+# @YankInstanceType:
+#
+# An enumeration of yank instance types. See @YankInstance for more
+# information.
+#
+# Since: 5.2
+##
+{ 'enum': 'YankInstanceType',
+  'data': [ 'block-node', 'chardev', 'migration' ] }
+
+##
+# @YankInstanceBlockNode:
+#
+# Specifies which block graph node to yank. See @YankInstance for more
+# information.
+#
+# @node-name: the name of the block graph node
+#
+# Since: 5.2
+##
+{ 'struct': 'YankInstanceBlockNode',
+  'data': { 'node-name': 'str' } }
+
+##
+# @YankInstanceChardev:
+#
+# Specifies which 

[PATCH v10 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu

2020-10-30 Thread Lukas Straub
Hello Everyone,
So here is v10.
We still need ACKs from NBD and chardev maintainers.

Changes:

v10:
 -moved from qapi/misc.json to qapi/yank.json
 -rename 'blockdev' -> 'block-node'
 -document difference betwen migration yank instance and migrate_cancel
 -better document return values of yank command
 -beter document yank_lock
 -minor style and spelling fixes

v9:
 -rebase onto master
 -implemented new qmp api as proposed by Markus

v8:
 -add Reviewed-by and Acked-by tags
 -rebase onto master
  -minor change to migration
  -convert to meson
 -change "Since:" to 5.2
 -varios code style fixes (Markus Armbruster)
 -point to oob restrictions in comment to yank_register_function
  (Markus Armbruster)
 -improve qmp documentation (Markus Armbruster)
 -document oob suitability of qio_channel and io_shutdown (Markus Armbruster)

v7:
 -yank_register_instance now returns error via Error **errp instead of aborting
 -dropped "chardev/char.c: Check for duplicate id before  creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. 
Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_aqcuire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Overview:
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks which get
executed with the yank command. For example the callback can shutdown() a
socket. This is intended for the colo use-case, but it can be used for other
things too of course.

Regards,
Lukas Straub

Lukas Straub (8):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document qmp oob suitability of qio_channel_shutdown and
io_shutdown
  MAINTAINERS: Add myself as maintainer for yank feature
  tests/test-char.c: Wait for the chardev to connect in
char_socket_client_dupid_test

 MAINTAINERS   |   7 ++
 block/nbd.c   | 154 ++--
 chardev/char-socket.c |  35 ++
 include/io/channel.h  |   5 +-
 include/qemu/yank.h   |  95 +++
 io/channel-tls.c  |   6 +-
 migration/channel.c   |  13 ++
 migration/migration.c |  25 
 migration/multifd.c   |  10 ++
 migration/qemu-file-channel.c |   7 ++
 migration/savevm.c|   6 +
 qapi/meson.build  |   1 +
 qapi/qapi-schema.json |   1 +
 qapi/yank.json| 115 ++
 tests/test-char.c |   1 +
 util/meson.build  |   1 +
 util/yank.c   | 216 ++
 17 files changed, 634 insertions(+), 64 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

--
2.20.1


pgpxQ7haaweP2.pgp
Description: OpenPGP digital signature


Re: [PULL 0/5] Block layer patches

2020-10-30 Thread Peter Maydell
On Tue, 27 Oct 2020 at 15:15, Kevin Wolf  wrote:
>
> The following changes since commit d55450df995d6223486db11c66491cbf6c131523:
>
>   Merge remote-tracking branch 
> 'remotes/dgilbert/tags/pull-migration-20201026a' into staging (2020-10-27 
> 10:25:42 +)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 1a6d3bd229d429879a85a9105fb84cae049d083c:
>
>   block: End quiescent sections when a BDS is deleted (2020-10-27 15:26:20 
> +0100)
>
> 
> Block layer patches:
>
> - qcow2: Skip copy-on-write when allocating a zero cluster
> - qemu-img: add support for rate limit in qemu-img convert/commit
> - Fix deadlock when deleting a block node during drain_all


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2
for any user-visible changes.

-- PMM



Re: [PATCH v9 1/8] Introduce yank feature

2020-10-30 Thread Lukas Straub
On Fri, 30 Oct 2020 15:02:09 +0100
Markus Armbruster  wrote:

> Lukas Straub  writes:
> 
> > On Thu, 29 Oct 2020 17:36:14 +0100
> > Markus Armbruster  wrote:
> >  
> >> Nothing major, looks almost ready to me.
> >> 
> >> Lukas Straub  writes:
> >>   
> >> > The yank feature allows to recover from hanging qemu by "yanking"
> >> > at various parts. Other qemu systems can register themselves and
> >> > multiple yank functions. Then all yank functions for selected
> >> > instances can be called by the 'yank' out-of-band qmp command.
> >> > Available instances can be queried by a 'query-yank' oob command.
> >> >
> >> > Signed-off-by: Lukas Straub 
> >> > Acked-by: Stefan Hajnoczi 
> >> > ---
> >> >  include/qemu/yank.h |  95 
> >> >  qapi/misc.json  | 106 ++
> >> >  util/meson.build|   1 +
> >> >  util/yank.c | 213   
> >> >   
> >> 
> >> checkpatch.pl warns:
> >> 
> >> WARNING: added, moved or deleted file(s), does MAINTAINERS need 
> >> updating?
> >> 
> >> Can we find a maintainer for the two new files?  
> >
> > Yes, I'm maintaining this for now, see patch 7.  
> 
> Thanks!  Would it make sense to add the yank stuff to a new QAPI module
> yank.json instead of misc.jaon, so the new MAINTAINERS stanza can cover
> it?

Yes, makes sense. Changed for the next version.

> [...]
> >> > diff --git a/qapi/misc.json b/qapi/misc.json
> >> > index 40df513856..3b7de02a4d 100644
> >> > --- a/qapi/misc.json
> >> > +++ b/qapi/misc.json  
> [...]
> >> > +##
> >> > +# @YankInstance:
> >> > +#
> >> > +# A yank instance can be yanked with the "yank" qmp command to recover 
> >> > from a
> >> > +# hanging qemu.
> >> 
> >> QEMU
> >>   
> >> > +#
> >> > +# Currently implemented yank instances:
> >> > +#  -nbd block device:
> >> > +#   Yanking it will shutdown the connection to the nbd server without
> >> > +#   attempting to reconnect.
> >> > +#  -socket chardev:
> >> > +#   Yanking it will shutdown the connected socket.
> >> > +#  -migration:
> >> > +#   Yanking it will shutdown all migration connections.
> >> 
> >> To my surprise, this is recognized as bullet list markup.  But please
> >> put a space between the bullet and the text anyway.
> >> 
> >> Also: "shutdown" is a noun, the verb is spelled "shut down".  
> >
> > Both changed for the next version.
> >  
> >> In my review of v8, I asked how yanking migration is related to command
> >> migrate_cancel.  Daniel explained:
> >> 
> >> migrate_cancel will do a shutdown() on the primary migration socket 
> >> only.
> >> In addition it will toggle the migration state.
> >> 
> >> Yanking will do a shutdown on all migration sockets (important for
> >> multifd), but won't touch migration state or any other aspect of QEMU
> >> code.
> >> 
> >> Overall yanking has less potential for things to go wrong than the
> >> migrate_cancel method, as it doesn't try to do any kind of cleanup
> >> or migration.
> >> 
> >> Would it make sense to work this into the documentation?  
> >
> > How about this?
> >
> >   - migration:
> > Yanking it will shut down all migration connections. Unlike
> > @migrate_cancel, it will not notify the migration process,
> > so migration will go into @failed state, instead of @cancelled
> > state.  
> 
> Works for me.  Advice on when to use it rather than migrate_cancel would
> be nice, though.

Ok, Changed for the next version.

> >> > +#
> >> > +# Since: 5.2
> >> > +##
> >> > +{ 'union': 'YankInstance',
> >> > +  'base': { 'type': 'YankInstanceType' },
> >> > +  'discriminator': 'type',
> >> > +  'data': {
> >> > +  'blockdev': 'YankInstanceBlockdev',
> >> > +  'chardev': 'YankInstanceChardev' } }
> >> > +
> >> > +##
> >> > +# @yank:
> >> > +#
> >> > +# Recover from hanging qemu by yanking the specified instances. See
> >> 
> >> QEMU
> >> 
> >> "Try to recover" would be more precise, I think.  
> >
> > Changed for the next version.
> >  
> >> > +# "YankInstance" for more information.
> >> > +#
> >> > +# Takes a list of @YankInstance as argument.
> >> > +#
> >> > +# Returns: nothing.
> >> > +#
> >> > +# Example:
> >> > +#
> >> > +# -> { "execute": "yank",
> >> > +#  "arguments": {
> >> > +#  "instances": [
> >> > +#   { "type": "block-node",
> >> > +# "node-name": "nbd0" }
> >> > +#  ] } }
> >> > +# <- { "return": {} }
> >> > +#
> >> > +# Since: 5.2
> >> > +##
> >> > +{ 'command': 'yank',
> >> > +  'data': { 'instances': ['YankInstance'] },
> >> > +  'allow-oob': true }
> >> > +  
> [...]
> >> > diff --git a/util/yank.c b/util/yank.c
> >> > new file mode 100644
> >> > index 00..0b3a816706
> >> > --- /dev/null
> >> > +++ b/util/yank.c  
> [...]
> >> > +void qmp_yank(YankInstanceList *instances,
> >> > +  Error **errp)
> >> > +{
> >> > +YankInstanceList *tail;
> >> > +YankInstanceEntry *entry;
> >> > +YankFuncAndParam 

Re: [PATCH 13/25] block/nvme: Introduce Completion Queue definitions

2020-10-30 Thread Philippe Mathieu-Daudé
On 10/30/20 12:46 PM, Stefan Hajnoczi wrote:
> On Thu, Oct 29, 2020 at 10:02:37AM +0100, Philippe Mathieu-Daudé wrote:
>> On 10/28/20 7:24 PM, Philippe Mathieu-Daudé wrote:
>>> On 10/28/20 4:16 PM, Stefan Hajnoczi wrote:
 On Tue, Oct 27, 2020 at 02:55:35PM +0100, Philippe Mathieu-Daudé wrote:
> Rename Submission Queue flags with 'Sq' and introduce
> Completion Queue flag definitions.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/block/nvme.h | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 65e68a82c89..079f884a2d3 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -491,6 +491,11 @@ typedef struct QEMU_PACKED NvmeCreateCq {
>  #define NVME_CQ_FLAGS_PC(cq_flags)  (cq_flags & 0x1)
>  #define NVME_CQ_FLAGS_IEN(cq_flags) ((cq_flags >> 1) & 0x1)
>  
> +enum NvmeFlagsCq {
> +NVME_CQ_PC  = 1,
> +NVME_CQ_IEN = 2,
> +};
> +
>  typedef struct QEMU_PACKED NvmeCreateSq {
>  uint8_t opcode;
>  uint8_t flags;
> @@ -508,12 +513,12 @@ typedef struct QEMU_PACKED NvmeCreateSq {
>  #define NVME_SQ_FLAGS_PC(sq_flags)  (sq_flags & 0x1)
>  #define NVME_SQ_FLAGS_QPRIO(sq_flags)   ((sq_flags >> 1) & 0x3)
>  
> -enum NvmeQueueFlags {
> -NVME_Q_PC   = 1,
> -NVME_Q_PRIO_URGENT  = 0,
> -NVME_Q_PRIO_HIGH= 1,
> -NVME_Q_PRIO_NORMAL  = 2,
> -NVME_Q_PRIO_LOW = 3,
> +enum NvmeFlagsSq {
> +NVME_SQ_PC  = 1,
> +NVME_SQ_PRIO_URGENT = 0,
> +NVME_SQ_PRIO_HIGH   = 1,
> +NVME_SQ_PRIO_NORMAL = 2,
> +NVME_SQ_PRIO_LOW= 3,
>  };

 There is also:

   #define NVME_SQ_FLAGS_PC(sq_flags)  (sq_flags & 0x1)
   #define NVME_SQ_FLAGS_QPRIO(sq_flags)   ((sq_flags >> 1) & 0x3)

 These macros should use the new constants.
>>
>> SQ_PC is bit#0, NVME_SQ_PC is "bit SQ_PC set (PC enabled)",
>> SQ_PRIO are bits #1-2 (shift by 1, mask 2 bits),
>> NVME_SQ_PRIO_xxx is the enum of these 2 bits.
>>
>> The NVME_SQ_FLAGS_X() macros extract the flags.
>>
>> So the macros can not use the new constants.
> 
> I'm not sure I understand. Does this mean the header only defines the
> flag values but not the bit shift constants?

Yes.

> 
> It seems like hw/block/nvme.c and block/nvme.c are expressing flags in
> slightly different approaches. Can they be unified instead of
> introducing hw/block/nvme.c- and block/nvme.c-only constants in the
> shared header file?

I suggested the hw/block/nvme.c to unify the style.
Klaus agreed (at least to have a look). Any change will
be for 6.0 anyway.




Re: [PATCH-for-6.0 v2 25/25] block/nvme: Simplify Completion Queue Command Identifier field use

2020-10-30 Thread Philippe Mathieu-Daudé
On 10/30/20 3:00 PM, Stefan Hajnoczi wrote:
> On Thu, Oct 29, 2020 at 10:33:06AM +0100, Philippe Mathieu-Daudé wrote:
>> The "Completion Queue Entry: DW 2" describes it as:
>>
>>   This identifier is assigned by host software when
>>   the command is submitted to the Submission
>>
>> As the is just an opaque cookie, it is pointless to byte-swap it.
>>
>> Suggested-by: Keith Busch 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  block/nvme.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/nvme.c b/block/nvme.c
>> index a06a188d530..e7723c42a6d 100644
>> --- a/block/nvme.c
>> +++ b/block/nvme.c
>> @@ -344,7 +344,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
>>  trace_nvme_error(le32_to_cpu(c->result),
>>   le16_to_cpu(c->sq_head),
>>   le16_to_cpu(c->sq_id),
>> - le16_to_cpu(c->cid),
>> + c->cid,
>>   le16_to_cpu(status));
>>  }
>>  switch (status) {
>> @@ -401,7 +401,7 @@ static bool nvme_process_completion(NVMeQueuePair *q)
>>  if (!q->cq.head) {
>>  q->cq_phase = !q->cq_phase;
>>  }
>> -cid = le16_to_cpu(c->cid);
>> +cid = c->cid;
>>  if (cid == 0 || cid > NVME_QUEUE_SIZE) {
>>  warn_report("NVMe: Unexpected CID in completion queue: 
>> %"PRIu32", "
>>  "queue size: %u", cid, NVME_QUEUE_SIZE);
>> @@ -469,7 +469,7 @@ static void nvme_submit_command(NVMeQueuePair *q, 
>> NVMeRequest *req,
>>  assert(!req->cb);
>>  req->cb = cb;
>>  req->opaque = opaque;
>> -cmd->cid = cpu_to_le16(req->cid);
>> +cmd->cid = req->cid;
>>  
>>  trace_nvme_submit_command(q->s, q->index, req->cid);
>>  nvme_trace_command(cmd);
> 
> Eliminating the byteswap is safe but this patch makes the code
> confusing, as I mentioned previously.
> 
> Please use a comment or macro to mark this field native endian. It's not
> obvious to the reader that we can skip the byteswap here.
> 
> Otherwise it will confuse readers into adding the byteswap back, not
> using byteswapping in other places where it is needed, etc.

OK. (This patch is for 6.0 anyway, I included because it was
following the previous patch in its previous version).

> 
> Thanks,
> Stefan
> 




Re: [PATCH-for-5.2 v2 12/25] block/nvme: Introduce Completion Queue definitions

2020-10-30 Thread Philippe Mathieu-Daudé
On 10/30/20 3:03 PM, Stefan Hajnoczi wrote:
> On Thu, Oct 29, 2020 at 10:32:53AM +0100, Philippe Mathieu-Daudé wrote:
>> Rename Submission Queue flags with 'Sq' to differentiate
>> submission queue flags from command queue flags, and introduce
>> Completion Queue flag definitions.
>>
>> Reviewed-by: Eric Auger 
>> Tested-by: Eric Auger 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  include/block/nvme.h | 18 --
>>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> I mentioned more possible cleanups in the previous revision, but they
> are not a blocker:

Your cleanups haven't been ignored, simply postponed :)

> 
> Reviewed-by: Stefan Hajnoczi 
>

Thanks!




Re: [PULL 0/3] Ide patches

2020-10-30 Thread Peter Maydell
On Tue, 27 Oct 2020 at 14:43, John Snow  wrote:
>
> The following changes since commit a95e0396c805735c491a049b01de6f5a713fb91b:
>
>   Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' 
> into staging (2020-10-26 15:49:11 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/jsnow/qemu.git tags/ide-pull-request
>
> for you to fetch changes up to 1a9925e3390b6adf1125e3abaa17c80ca012bede:
>
>   ide: clear SRST after SRST finishes (2020-10-27 10:39:06 -0400)
>
> 
> IDE Pull request
>
> 
>
> John Snow (3):
>   ide: run diagnostic after SRST
>   ide: perform SRST as early as possible
>   ide: clear SRST after SRST finishes


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2
for any user-visible changes.

-- PMM



Re: [PATCH-for-5.2 v2 16/25] block/nvme: Simplify nvme_cmd_sync()

2020-10-30 Thread Stefan Hajnoczi
On Thu, Oct 29, 2020 at 10:32:57AM +0100, Philippe Mathieu-Daudé wrote:
> As all commands use the ADMIN queue, it is pointless to pass
> it as argument each time. Remove the argument, and rename the
> function as nvme_admin_cmd_sync() to make this new behavior
> clearer.
> 
> Reviewed-by: Eric Auger 
> Tested-by: Eric Auger 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/nvme.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


[PULL 13/15] vhost-blk: set features before setting inflight feature

2020-10-30 Thread Michael S. Tsirkin
From: Jin Yu 

Virtqueue has split and packed, so before setting inflight,
you need to inform the back-end virtqueue format.

Signed-off-by: Jin Yu 
Message-Id: <20200910134851.7817-1-jin...@intel.com>
Acked-by: Raphael Norwitz 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/vhost.h |  1 +
 hw/block/vhost-user-blk.c |  6 ++
 hw/virtio/vhost.c | 18 ++
 3 files changed, 25 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 94585067f7..839bfb153c 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -141,6 +141,7 @@ void vhost_dev_reset_inflight(struct vhost_inflight 
*inflight);
 void vhost_dev_free_inflight(struct vhost_inflight *inflight);
 void vhost_dev_save_inflight(struct vhost_inflight *inflight, QEMUFile *f);
 int vhost_dev_load_inflight(struct vhost_inflight *inflight, QEMUFile *f);
+int vhost_dev_prepare_inflight(struct vhost_dev *hdev);
 int vhost_dev_set_inflight(struct vhost_dev *dev,
struct vhost_inflight *inflight);
 int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index a076b1e54d..f67b29bbf3 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -131,6 +131,12 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
 
 s->dev.acked_features = vdev->guest_features;
 
+ret = vhost_dev_prepare_inflight(>dev);
+if (ret < 0) {
+error_report("Error set inflight format: %d", -ret);
+goto err_guest_notifiers;
+}
+
 if (!s->inflight->addr) {
 ret = vhost_dev_get_inflight(>dev, s->queue_size, s->inflight);
 if (ret < 0) {
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 79b2be20df..f2482378c6 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1645,6 +1645,24 @@ int vhost_dev_load_inflight(struct vhost_inflight 
*inflight, QEMUFile *f)
 return 0;
 }
 
+int vhost_dev_prepare_inflight(struct vhost_dev *hdev)
+{
+int r;
+ 
+if (hdev->vhost_ops->vhost_get_inflight_fd == NULL ||
+hdev->vhost_ops->vhost_set_inflight_fd == NULL) {
+return 0;
+}
+ 
+r = vhost_dev_set_features(hdev, hdev->log_enabled);
+if (r < 0) {
+VHOST_OPS_DEBUG("vhost_dev_prepare_inflight failed");
+return r;
+}
+
+return 0;
+}
+
 int vhost_dev_set_inflight(struct vhost_dev *dev,
struct vhost_inflight *inflight)
 {
-- 
MST




Re: [PATCH-for-5.2 v2 12/25] block/nvme: Introduce Completion Queue definitions

2020-10-30 Thread Stefan Hajnoczi
On Thu, Oct 29, 2020 at 10:32:53AM +0100, Philippe Mathieu-Daudé wrote:
> Rename Submission Queue flags with 'Sq' to differentiate
> submission queue flags from command queue flags, and introduce
> Completion Queue flag definitions.
> 
> Reviewed-by: Eric Auger 
> Tested-by: Eric Auger 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/block/nvme.h | 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)

I mentioned more possible cleanups in the previous revision, but they
are not a blocker:

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH-for-5.2 v2 10/25] block/nvme: Make nvme_identify() return boolean indicating error

2020-10-30 Thread Stefan Hajnoczi
On Thu, Oct 29, 2020 at 10:32:51AM +0100, Philippe Mathieu-Daudé wrote:
> Just for consistency, following the example documented since
> commit e3fe3988d7 ("error: Document Error API usage rules"),
> return a boolean value indicating an error is set or not.
> Directly pass errp as the local_err is not requested in our
> case.
> 
> Tested-by: Eric Auger 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/nvme.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v9 1/8] Introduce yank feature

2020-10-30 Thread Markus Armbruster
Lukas Straub  writes:

> On Thu, 29 Oct 2020 17:36:14 +0100
> Markus Armbruster  wrote:
>
>> Nothing major, looks almost ready to me.
>> 
>> Lukas Straub  writes:
>> 
>> > The yank feature allows to recover from hanging qemu by "yanking"
>> > at various parts. Other qemu systems can register themselves and
>> > multiple yank functions. Then all yank functions for selected
>> > instances can be called by the 'yank' out-of-band qmp command.
>> > Available instances can be queried by a 'query-yank' oob command.
>> >
>> > Signed-off-by: Lukas Straub 
>> > Acked-by: Stefan Hajnoczi 
>> > ---
>> >  include/qemu/yank.h |  95 
>> >  qapi/misc.json  | 106 ++
>> >  util/meson.build|   1 +
>> >  util/yank.c | 213   
>> 
>> checkpatch.pl warns:
>> 
>> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
>> 
>> Can we find a maintainer for the two new files?
>
> Yes, I'm maintaining this for now, see patch 7.

Thanks!  Would it make sense to add the yank stuff to a new QAPI module
yank.json instead of misc.jaon, so the new MAINTAINERS stanza can cover
it?

[...]
>> > diff --git a/qapi/misc.json b/qapi/misc.json
>> > index 40df513856..3b7de02a4d 100644
>> > --- a/qapi/misc.json
>> > +++ b/qapi/misc.json
[...]
>> > +##
>> > +# @YankInstance:
>> > +#
>> > +# A yank instance can be yanked with the "yank" qmp command to recover 
>> > from a
>> > +# hanging qemu.  
>> 
>> QEMU
>> 
>> > +#
>> > +# Currently implemented yank instances:
>> > +#  -nbd block device:
>> > +#   Yanking it will shutdown the connection to the nbd server without
>> > +#   attempting to reconnect.
>> > +#  -socket chardev:
>> > +#   Yanking it will shutdown the connected socket.
>> > +#  -migration:
>> > +#   Yanking it will shutdown all migration connections.  
>> 
>> To my surprise, this is recognized as bullet list markup.  But please
>> put a space between the bullet and the text anyway.
>> 
>> Also: "shutdown" is a noun, the verb is spelled "shut down".
>
> Both changed for the next version.
>
>> In my review of v8, I asked how yanking migration is related to command
>> migrate_cancel.  Daniel explained:
>> 
>> migrate_cancel will do a shutdown() on the primary migration socket only.
>> In addition it will toggle the migration state.
>> 
>> Yanking will do a shutdown on all migration sockets (important for
>> multifd), but won't touch migration state or any other aspect of QEMU
>> code.
>> 
>> Overall yanking has less potential for things to go wrong than the
>> migrate_cancel method, as it doesn't try to do any kind of cleanup
>> or migration.
>> 
>> Would it make sense to work this into the documentation?
>
> How about this?
>
>   - migration:
> Yanking it will shut down all migration connections. Unlike
> @migrate_cancel, it will not notify the migration process,
> so migration will go into @failed state, instead of @cancelled
> state.

Works for me.  Advice on when to use it rather than migrate_cancel would
be nice, though.

>> > +#
>> > +# Since: 5.2
>> > +##
>> > +{ 'union': 'YankInstance',
>> > +  'base': { 'type': 'YankInstanceType' },
>> > +  'discriminator': 'type',
>> > +  'data': {
>> > +  'blockdev': 'YankInstanceBlockdev',
>> > +  'chardev': 'YankInstanceChardev' } }
>> > +
>> > +##
>> > +# @yank:
>> > +#
>> > +# Recover from hanging qemu by yanking the specified instances. See  
>> 
>> QEMU
>> 
>> "Try to recover" would be more precise, I think.
>
> Changed for the next version.
>
>> > +# "YankInstance" for more information.
>> > +#
>> > +# Takes a list of @YankInstance as argument.
>> > +#
>> > +# Returns: nothing.
>> > +#
>> > +# Example:
>> > +#
>> > +# -> { "execute": "yank",
>> > +#  "arguments": {
>> > +#  "instances": [
>> > +#   { "type": "block-node",
>> > +# "node-name": "nbd0" }
>> > +#  ] } }
>> > +# <- { "return": {} }
>> > +#
>> > +# Since: 5.2
>> > +##
>> > +{ 'command': 'yank',
>> > +  'data': { 'instances': ['YankInstance'] },
>> > +  'allow-oob': true }
>> > +
[...]
>> > diff --git a/util/yank.c b/util/yank.c
>> > new file mode 100644
>> > index 00..0b3a816706
>> > --- /dev/null
>> > +++ b/util/yank.c
[...]
>> > +void qmp_yank(YankInstanceList *instances,
>> > +  Error **errp)
>> > +{
>> > +YankInstanceList *tail;
>> > +YankInstanceEntry *entry;
>> > +YankFuncAndParam *func_entry;
>> > +
>> > +qemu_mutex_lock(_lock);
>> > +for (tail = instances; tail; tail = tail->next) {
>> > +entry = yank_find_entry(tail->value);
>> > +if (!entry) {
>> > +error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not 
>> > found");  
>> 
>> Quote error.h:
>> 
>>  * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is
>>  * strongly discouraged.
>> 
>> Any particular reason for ERROR_CLASS_DEVICE_NOT_FOUND?  If not, then
>> 

Re: [PATCH-for-6.0 v2 25/25] block/nvme: Simplify Completion Queue Command Identifier field use

2020-10-30 Thread Stefan Hajnoczi
On Thu, Oct 29, 2020 at 10:33:06AM +0100, Philippe Mathieu-Daudé wrote:
> The "Completion Queue Entry: DW 2" describes it as:
> 
>   This identifier is assigned by host software when
>   the command is submitted to the Submission
> 
> As the is just an opaque cookie, it is pointless to byte-swap it.
> 
> Suggested-by: Keith Busch 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/nvme.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index a06a188d530..e7723c42a6d 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -344,7 +344,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
>  trace_nvme_error(le32_to_cpu(c->result),
>   le16_to_cpu(c->sq_head),
>   le16_to_cpu(c->sq_id),
> - le16_to_cpu(c->cid),
> + c->cid,
>   le16_to_cpu(status));
>  }
>  switch (status) {
> @@ -401,7 +401,7 @@ static bool nvme_process_completion(NVMeQueuePair *q)
>  if (!q->cq.head) {
>  q->cq_phase = !q->cq_phase;
>  }
> -cid = le16_to_cpu(c->cid);
> +cid = c->cid;
>  if (cid == 0 || cid > NVME_QUEUE_SIZE) {
>  warn_report("NVMe: Unexpected CID in completion queue: 
> %"PRIu32", "
>  "queue size: %u", cid, NVME_QUEUE_SIZE);
> @@ -469,7 +469,7 @@ static void nvme_submit_command(NVMeQueuePair *q, 
> NVMeRequest *req,
>  assert(!req->cb);
>  req->cb = cb;
>  req->opaque = opaque;
> -cmd->cid = cpu_to_le16(req->cid);
> +cmd->cid = req->cid;
>  
>  trace_nvme_submit_command(q->s, q->index, req->cid);
>  nvme_trace_command(cmd);

Eliminating the byteswap is safe but this patch makes the code
confusing, as I mentioned previously.

Please use a comment or macro to mark this field native endian. It's not
obvious to the reader that we can skip the byteswap here.

Otherwise it will confuse readers into adding the byteswap back, not
using byteswapping in other places where it is needed, etc.

Thanks,
Stefan


signature.asc
Description: PGP signature


Re: [PATCH-for-5.2 v2 24/25] block/nvme: Fix nvme_submit_command() on big-endian host

2020-10-30 Thread Stefan Hajnoczi
On Thu, Oct 29, 2020 at 10:33:05AM +0100, Philippe Mathieu-Daudé wrote:
> The Completion Queue Command Identifier is a 16-bit value,
> so nvme_submit_command() is unlikely to work on big-endian
> hosts, as the relevant bits are truncated.
> Fix by using the correct byte-swap function.
> 
> Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
> Reported-by: Keith Busch 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v3 0/9] block-backend: Introduce I/O hang

2020-10-30 Thread Stefan Hajnoczi
On Thu, Oct 29, 2020 at 05:42:42PM +0800, cenjiahui wrote:
> 
> On 2020/10/27 0:53, Stefan Hajnoczi wrote:
> > On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
> >> A VM in the cloud environment may use a virutal disk as the backend 
> >> storage,
> >> and there are usually filesystems on the virtual block device. When backend
> >> storage is temporarily down, any I/O issued to the virtual block device 
> >> will
> >> cause an error. For example, an error occurred in ext4 filesystem would 
> >> make
> >> the filesystem readonly. However a cloud backend storage can be soon 
> >> recovered.
> >> For example, an IP-SAN may be down due to network failure and will be 
> >> online
> >> soon after network is recovered. The error in the filesystem may not be
> >> recovered unless a device reattach or system restart. So an I/O rehandle is
> >> in need to implement a self-healing mechanism.
> >>
> >> This patch series propose a feature called I/O hang. It can rehandle AIOs
> >> with EIO error without sending error back to guest. From guest's 
> >> perspective
> >> of view it is just like an IO is hanging and not returned. Guest can get
> >> back running smoothly when I/O is recovred with this feature enabled.
> > 
> > Hi,
> > This feature seems like an extension of the existing -drive
> > rerror=/werror= parameters:
> > 
> >   werror=action,rerror=action
> >   Specify which action to take on write and read errors. Valid
> >   actions are: "ignore" (ignore the error and try to continue),
> >   "stop" (pause QEMU), "report" (report the error to the guest),
> >   "enospc" (pause QEMU only if the host disk is full; report the
> >   error to the guest otherwise).  The default setting is
> >   werror=enospc and rerror=report.
> > 
> > That mechanism already has a list of requests to retry and live
> > migration integration. Using the werror=/rerror= mechanism would avoid
> > code duplication between these features. You could add a
> > werror/rerror=retry error action for this feature.
> > 
> > Does that sound good?
> > 
> > Stefan
> > 
> 
> Hi Stefan,
> 
> Thanks for your reply. Extending the rerror=/werror= mechanism is a feasible
> way for the retry feature.
> 
> However, AFAIK, the rerror=/werror= mechanism in block-backend layer only
> provides ACTION, and the real handler of errors need be implemented several
> times in device layer for different devices. While our I/O Hang mechanism
> directly handles AIO errors no matter which type of devices it is. Is it a
> more common way to implement the feature in block-backend layer? Especially we
> can set retry timeout in a common structure BlockBackend.
> 
> Besides, is there any reason that QEMU implements the rerror=/werror mechansim
> in device layer rather than in block-backend layer?

Yes, it's because failed requests can be live-migrated and retried on
the destination host. In other words, live migration still works even
when there are failed requests.

There may be things that can be refactored so there is less duplication
in devices, but the basic design goal is that the block layer doesn't
keep track of failed requests because they are live migrated together
with the device state.

Maybe Kevin Wolf has more thoughts to share about rerror=/werror=.

Stefan


signature.asc
Description: PGP signature


[PULL 13/15] vhost-blk: set features before setting inflight feature

2020-10-30 Thread Michael S. Tsirkin
From: Jin Yu 

Virtqueue has split and packed, so before setting inflight,
you need to inform the back-end virtqueue format.

Signed-off-by: Jin Yu 
Message-Id: <20200910134851.7817-1-jin...@intel.com>
Acked-by: Raphael Norwitz 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/vhost.h |  1 +
 hw/block/vhost-user-blk.c |  6 ++
 hw/virtio/vhost.c | 18 ++
 3 files changed, 25 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 94585067f7..839bfb153c 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -141,6 +141,7 @@ void vhost_dev_reset_inflight(struct vhost_inflight 
*inflight);
 void vhost_dev_free_inflight(struct vhost_inflight *inflight);
 void vhost_dev_save_inflight(struct vhost_inflight *inflight, QEMUFile *f);
 int vhost_dev_load_inflight(struct vhost_inflight *inflight, QEMUFile *f);
+int vhost_dev_prepare_inflight(struct vhost_dev *hdev);
 int vhost_dev_set_inflight(struct vhost_dev *dev,
struct vhost_inflight *inflight);
 int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index a076b1e54d..f67b29bbf3 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -131,6 +131,12 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
 
 s->dev.acked_features = vdev->guest_features;
 
+ret = vhost_dev_prepare_inflight(>dev);
+if (ret < 0) {
+error_report("Error set inflight format: %d", -ret);
+goto err_guest_notifiers;
+}
+
 if (!s->inflight->addr) {
 ret = vhost_dev_get_inflight(>dev, s->queue_size, s->inflight);
 if (ret < 0) {
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 79b2be20df..f2482378c6 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1645,6 +1645,24 @@ int vhost_dev_load_inflight(struct vhost_inflight 
*inflight, QEMUFile *f)
 return 0;
 }
 
+int vhost_dev_prepare_inflight(struct vhost_dev *hdev)
+{
+int r;
+ 
+if (hdev->vhost_ops->vhost_get_inflight_fd == NULL ||
+hdev->vhost_ops->vhost_set_inflight_fd == NULL) {
+return 0;
+}
+ 
+r = vhost_dev_set_features(hdev, hdev->log_enabled);
+if (r < 0) {
+VHOST_OPS_DEBUG("vhost_dev_prepare_inflight failed");
+return r;
+}
+
+return 0;
+}
+
 int vhost_dev_set_inflight(struct vhost_dev *dev,
struct vhost_inflight *inflight)
 {
-- 
MST




Re: [PATCH 00/12] block/export: vhost-user-blk server cleanups and tests

2020-10-30 Thread Michael S. Tsirkin
On Tue, Oct 27, 2020 at 05:35:16PM +, Stefan Hajnoczi wrote:
> This patch series solves some issues with the new vhost-user-blk-server and
> adds the qtest test case. The test case was not included in the pull request
> that introduced the vhost-user-blk server because of reliability issues that
> are fixed in this patch series.


Fails make check for me:

Running test qtest-i386/qos-test
Broken pipe
../qemu/tests/qtest/libqtest.c:161: kill_qemu() detected QEMU death from signal 
11 (Segmentation fault) (core dumped)
ERROR qtest-i386/qos-test - too few tests run (expected 92, got 65)
make: *** [Makefile.mtest:1857: run-test-230] Error 1


> Coiby Xu (1):
>   test: new qTest case to test the vhost-user-blk-server
> 
> Stefan Hajnoczi (11):
>   libvhost-user: follow QEMU comment style
>   configure: introduce --enable-vhost-user-blk-server
>   block/export: make vhost-user-blk config space little-endian
>   block/export: fix vhost-user-blk get_config() information leak
>   contrib/vhost-user-blk: fix get_config() information leak
>   tests/qtest: add multi-queue test case to vhost-user-blk-test
>   libqtest: add qtest_socket_server()
>   vhost-user-blk-test: rename destroy_drive() to destroy_file()
>   vhost-user-blk-test: close fork child file descriptors
>   vhost-user-blk-test: drop unused return value
>   vhost-user-blk-test: fix races by using fd passing
> 
>  configure   |  15 +
>  contrib/libvhost-user/libvhost-user.h   |  15 +-
>  tests/qtest/libqos/libqtest.h   |  25 +
>  tests/qtest/libqos/vhost-user-blk.h |  48 ++
>  block/export/export.c   |   4 +-
>  block/export/vhost-user-blk-server.c|  28 +-
>  contrib/vhost-user-blk/vhost-user-blk.c |   2 +
>  tests/qtest/libqos/vhost-user-blk.c | 129 
>  tests/qtest/libqtest.c  |  76 ++-
>  tests/qtest/vhost-user-blk-test.c   | 843 
>  block/export/meson.build|   2 +-
>  tests/qtest/libqos/meson.build  |   1 +
>  tests/qtest/meson.build |   2 +
>  util/meson.build|   2 +-
>  14 files changed, 1151 insertions(+), 41 deletions(-)
>  create mode 100644 tests/qtest/libqos/vhost-user-blk.h
>  create mode 100644 tests/qtest/libqos/vhost-user-blk.c
>  create mode 100644 tests/qtest/vhost-user-blk-test.c
> 
> -- 
> 2.26.2
> 




Re: [PATCH 13/25] block/nvme: Introduce Completion Queue definitions

2020-10-30 Thread Stefan Hajnoczi
On Thu, Oct 29, 2020 at 10:02:37AM +0100, Philippe Mathieu-Daudé wrote:
> On 10/28/20 7:24 PM, Philippe Mathieu-Daudé wrote:
> > On 10/28/20 4:16 PM, Stefan Hajnoczi wrote:
> >> On Tue, Oct 27, 2020 at 02:55:35PM +0100, Philippe Mathieu-Daudé wrote:
> >>> Rename Submission Queue flags with 'Sq' and introduce
> >>> Completion Queue flag definitions.
> >>>
> >>> Signed-off-by: Philippe Mathieu-Daudé 
> >>> ---
> >>>  include/block/nvme.h | 17 +++--
> >>>  1 file changed, 11 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/include/block/nvme.h b/include/block/nvme.h
> >>> index 65e68a82c89..079f884a2d3 100644
> >>> --- a/include/block/nvme.h
> >>> +++ b/include/block/nvme.h
> >>> @@ -491,6 +491,11 @@ typedef struct QEMU_PACKED NvmeCreateCq {
> >>>  #define NVME_CQ_FLAGS_PC(cq_flags)  (cq_flags & 0x1)
> >>>  #define NVME_CQ_FLAGS_IEN(cq_flags) ((cq_flags >> 1) & 0x1)
> >>>  
> >>> +enum NvmeFlagsCq {
> >>> +NVME_CQ_PC  = 1,
> >>> +NVME_CQ_IEN = 2,
> >>> +};
> >>> +
> >>>  typedef struct QEMU_PACKED NvmeCreateSq {
> >>>  uint8_t opcode;
> >>>  uint8_t flags;
> >>> @@ -508,12 +513,12 @@ typedef struct QEMU_PACKED NvmeCreateSq {
> >>>  #define NVME_SQ_FLAGS_PC(sq_flags)  (sq_flags & 0x1)
> >>>  #define NVME_SQ_FLAGS_QPRIO(sq_flags)   ((sq_flags >> 1) & 0x3)
> >>>  
> >>> -enum NvmeQueueFlags {
> >>> -NVME_Q_PC   = 1,
> >>> -NVME_Q_PRIO_URGENT  = 0,
> >>> -NVME_Q_PRIO_HIGH= 1,
> >>> -NVME_Q_PRIO_NORMAL  = 2,
> >>> -NVME_Q_PRIO_LOW = 3,
> >>> +enum NvmeFlagsSq {
> >>> +NVME_SQ_PC  = 1,
> >>> +NVME_SQ_PRIO_URGENT = 0,
> >>> +NVME_SQ_PRIO_HIGH   = 1,
> >>> +NVME_SQ_PRIO_NORMAL = 2,
> >>> +NVME_SQ_PRIO_LOW= 3,
> >>>  };
> >>
> >> There is also:
> >>
> >>   #define NVME_SQ_FLAGS_PC(sq_flags)  (sq_flags & 0x1)
> >>   #define NVME_SQ_FLAGS_QPRIO(sq_flags)   ((sq_flags >> 1) & 0x3)
> >>
> >> These macros should use the new constants.
> 
> SQ_PC is bit#0, NVME_SQ_PC is "bit SQ_PC set (PC enabled)",
> SQ_PRIO are bits #1-2 (shift by 1, mask 2 bits),
> NVME_SQ_PRIO_xxx is the enum of these 2 bits.
> 
> The NVME_SQ_FLAGS_X() macros extract the flags.
> 
> So the macros can not use the new constants.

I'm not sure I understand. Does this mean the header only defines the
flag values but not the bit shift constants?

It seems like hw/block/nvme.c and block/nvme.c are expressing flags in
slightly different approaches. Can they be unified instead of
introducing hw/block/nvme.c- and block/nvme.c-only constants in the
shared header file?

Stefan


signature.asc
Description: PGP signature


--enable-xen on gitlab CI? (was Re: [PATCH 09/36] qdev: Make qdev_get_prop_ptr() get Object* arg)

2020-10-30 Thread Eduardo Habkost
On Fri, Oct 30, 2020 at 11:29:25AM +0400, Marc-André Lureau wrote:
> On Fri, Oct 30, 2020 at 2:07 AM Eduardo Habkost  wrote:
> 
> > Make the code more generic and not specific to TYPE_DEVICE.
> >
> > Signed-off-by: Eduardo Habkost 
> >
> 
> Nice cleanup!, but fails to build atm
> 
> ../hw/block/xen-block.c:403:9: error: ‘dev’ undeclared (first use in this
> function); did you mean ‘vdev’?
>   403 | if (dev->realized) {

Thanks for catching it!

What is necessary to make sure we have a CONFIG_XEN=y job in
gitlab CI?  Maybe just including xen-devel in some of the
container images is enough?

-- 
Eduardo




Re: [PATCH v9 1/8] Introduce yank feature

2020-10-30 Thread Lukas Straub
On Thu, 29 Oct 2020 17:36:14 +0100
Markus Armbruster  wrote:

> Nothing major, looks almost ready to me.
> 
> Lukas Straub  writes:
> 
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> >
> > Signed-off-by: Lukas Straub 
> > Acked-by: Stefan Hajnoczi 
> > ---
> >  include/qemu/yank.h |  95 
> >  qapi/misc.json  | 106 ++
> >  util/meson.build|   1 +
> >  util/yank.c | 213   
> 
> checkpatch.pl warns:
> 
> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> 
> Can we find a maintainer for the two new files?

Yes, I'm maintaining this for now, see patch 7.

> >  4 files changed, 415 insertions(+)
> >  create mode 100644 include/qemu/yank.h
> >  create mode 100644 util/yank.c
> >
> > diff --git a/include/qemu/yank.h b/include/qemu/yank.h
> > new file mode 100644
> > index 00..89755e62af
> > --- /dev/null
> > +++ b/include/qemu/yank.h
> > @@ -0,0 +1,95 @@
> > +/*
> > + * QEMU yank feature
> > + *
> > + * Copyright (c) Lukas Straub 
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or 
> > later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef YANK_H
> > +#define YANK_H
> > +
> > +#include "qapi/qapi-types-misc.h"
> > +
> > +typedef void (YankFn)(void *opaque);
> > +
> > +/**
> > + * yank_register_instance: Register a new instance.
> > + *
> > + * This registers a new instance for yanking. Must be called before any 
> > yank
> > + * function is registered for this instance.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + * @errp: Error object.
> > + */
> > +void yank_register_instance(const YankInstance *instance, Error **errp);
> > +
> > +/**
> > + * yank_unregister_instance: Unregister a instance.
> > + *
> > + * This unregisters a instance. Must be called only after every yank 
> > function
> > + * of the instance has been unregistered.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + */
> > +void yank_unregister_instance(const YankInstance *instance);
> > +
> > +/**
> > + * yank_register_function: Register a yank function
> > + *
> > + * This registers a yank function. All limitations of qmp oob commands 
> > apply
> > + * to the yank function as well. See docs/devel/qapi-code-gen.txt under
> > + * "An OOB-capable command handler must satisfy the following conditions".
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + * @func: The yank function.
> > + * @opaque: Will be passed to the yank function.
> > + */
> > +void yank_register_function(const YankInstance *instance,
> > +YankFn *func,
> > +void *opaque);
> > +
> > +/**
> > + * yank_unregister_function: Unregister a yank function
> > + *
> > + * This unregisters a yank function.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance: The instance.
> > + * @func: func that was passed to yank_register_function.
> > + * @opaque: opaque that was passed to yank_register_function.
> > + */
> > +void yank_unregister_function(const YankInstance *instance,
> > +  YankFn *func,
> > +  void *opaque);
> > +
> > +/**
> > + * yank_generic_iochannel: Generic yank function for iochannel
> > + *
> > + * This is a generic yank function which will call qio_channel_shutdown on 
> > the
> > + * provided QIOChannel.
> > + *
> > + * @opaque: QIOChannel to shutdown
> > + */
> > +void yank_generic_iochannel(void *opaque);
> > +
> > +#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \
> > +.type = YANK_INSTANCE_TYPE_BLOCKDEV, \
> > +.u.blockdev.node_name = (the_node_name) })
> > +
> > +#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \
> > +.type = YANK_INSTANCE_TYPE_CHARDEV, \
> > +.u.chardev.id = (the_id) })
> > +
> > +#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \
> > +.type = YANK_INSTANCE_TYPE_MIGRATION })
> > +
> > +#endif
> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index 40df513856..3b7de02a4d 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -568,3 +568,109 @@
> >   'data': { '*option': 'str' },
> >   'returns': ['CommandLineOptionInfo'],
> >   'allow-preconfig': true }
> > +
> > +##
> > +# @YankInstanceType:
> > +#
> > +# An enumeration of yank instance types. See "YankInstance" for more  
> 
> Please write cross-references as @YankInstance.  This gives us a chance
> to turn them into links (seems not to be implemented, yet).  More of the

Re: [PATCH 14/36] qdev: Move dev->realized check to qdev_property_set()

2020-10-30 Thread Marc-André Lureau
On Fri, Oct 30, 2020 at 2:10 AM Eduardo Habkost  wrote:

> Every single qdev property setter function manually checks
> dev->realized.  We can just check dev->realized inside
> qdev_property_set() instead.
>
> The check is being added as a separate function
> (qdev_prop_allow_set()) because it will become a callback later.
>
> Signed-off-by: Eduardo Habkost 
>

nice
Reviewed-by: Marc-André Lureau 

-- 
Marc-André Lureau


Re: [PATCH 09/36] qdev: Make qdev_get_prop_ptr() get Object* arg

2020-10-30 Thread Marc-André Lureau
On Fri, Oct 30, 2020 at 11:29 AM Marc-André Lureau <
marcandre.lur...@gmail.com> wrote:

>
>
> On Fri, Oct 30, 2020 at 2:07 AM Eduardo Habkost 
> wrote:
>
>> Make the code more generic and not specific to TYPE_DEVICE.
>>
>> Signed-off-by: Eduardo Habkost 
>>
>
> Nice cleanup!, but fails to build atm
>
> ../hw/block/xen-block.c:403:9: error: ‘dev’ undeclared (first use in this
> function); did you mean ‘vdev’?
>   403 | if (dev->realized) {
>
>
That seems to be the only issue though, so with that fixed:
Reviewed-by: Marc-André Lureau 

-- 
Marc-André Lureau


Re: [PATCH 09/36] qdev: Make qdev_get_prop_ptr() get Object* arg

2020-10-30 Thread Marc-André Lureau
On Fri, Oct 30, 2020 at 2:07 AM Eduardo Habkost  wrote:

> Make the code more generic and not specific to TYPE_DEVICE.
>
> Signed-off-by: Eduardo Habkost 
>

Nice cleanup!, but fails to build atm

../hw/block/xen-block.c:403:9: error: ‘dev’ undeclared (first use in this
function); did you mean ‘vdev’?
  403 | if (dev->realized) {

-- 
Marc-André Lureau


Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-10-30 Thread Philippe Mathieu-Daudé
Hi Dmitry,

On 10/30/20 3:32 AM, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
> 
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 
> Reviewed-by: Niklas Cassel 
> ---
>  block/nvme.c  |   2 +-
>  hw/block/nvme-ns.c| 173 
>  hw/block/nvme-ns.h|  54 +++
>  hw/block/nvme.c   | 977 +-
>  hw/block/nvme.h   |   8 +
>  hw/block/trace-events |  18 +-
>  include/block/nvme.h  | 113 -

When you start modifying include/ files, it is recommended
to start using scripts/git.orderfile as this makes review
easier (no need to scroll back / up constantly).

As "block/nvme.h" is shared by 2 subsystems, keeping its
changes in a separate patch is preferred.

>  7 files changed, 1322 insertions(+), 23 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 05485fdd11..7a513c9a17 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
>  {
>  uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
>  if (status) {
> -trace_nvme_error(le32_to_cpu(c->result),
> +trace_nvme_error(le32_to_cpu(c->result32),
>   le16_to_cpu(c->sq_head),
>   le16_to_cpu(c->sq_id),
>   le16_to_cpu(c->cid),
...

> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 3653b4aefc..ba8a45edf5 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -489,6 +489,9 @@ enum NvmeIoCommands {
>  NVME_CMD_COMPARE= 0x05,
>  NVME_CMD_WRITE_ZEROES   = 0x08,
>  NVME_CMD_DSM= 0x09,
> +NVME_CMD_ZONE_MGMT_SEND = 0x79,
> +NVME_CMD_ZONE_MGMT_RECV = 0x7a,
> +NVME_CMD_ZONE_APPEND= 0x7d,
>  };
>  
>  typedef struct QEMU_PACKED NvmeDeleteQ {
> @@ -649,8 +652,10 @@ typedef struct QEMU_PACKED NvmeAerResult {
>  } NvmeAerResult;
>  
>  typedef struct QEMU_PACKED NvmeCqe {
> -uint32_tresult;
> -uint32_trsvd;
> +union {
> +uint64_t result64;
> +uint32_t result32;
> +};

When using packed structure you want to define all fields to
avoid alignment confusion (and I'm surprised the compiler doesn't
complain...). So this would be:

   union {
   uint64_t result64;
   struct {
   uint32_tresult32;
   uint32_trsvd32;
   };
   };

But since the ZNS is still a technical proposal and not in the spec,
this doesn't look correct (the spec list this field as 32-bit).

What do you think about adding NvmeCqeZNS?

Maybe:

  typedef struct QEMU_PACKED NvmeCqeZNS {
  uint64_tresult;
  uint16_tsq_head;
  uint16_tsq_id;
  uint16_tcid;
  uint16_tstatus;
  } NvmeCqeZNS;

Or clever:

  typedef union QEMU_PACKED NvmeCqeZNS {
  union {
  struct {
  uint64_t result;
  uint32_t dw2;
  uint32_t dw3;
  };
  NvmeCqe  cqe;
  };
  } NvmeCqeZNS;

I wonder what part could go in hw/block/nvme-ns.h or "block/nvme-zns.h".

>  uint16_tsq_head;
>  uint16_tsq_id;
>  uint16_tcid;
> @@ -678,6 +683,7 @@ enum NvmeStatusCodes {
>  NVME_SGL_DESCR_TYPE_INVALID = 

Re: [PATCH v7 0/3] hw/block/nvme: dulbe and dsm support

2020-10-30 Thread Klaus Jensen
On Oct 27 18:57, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> This adds support for the Deallocated or Unwritten Logical Block error
> recovery feature as well as the Dataset Management command.
> 
> v7:
>   - Handle negative return value from bdrv_block_status.
>   - bdrv_get_info may not be supported on all block drivers, so do not
> consider it a fatal error.
> 
> v6:
>   - Skip the allocation of the discards integer and just use the opaque
> value directly (Philippe)
>   - Split changes to include/block/nvme.h into a separate patch
> (Philippe)
>   - Clean up some convoluted checks on the discards value (Philippe)
>   - Use unambiguous units in the commit messages (Philippe)
>   - Stack allocate the range array (Keith)
> 
> v5:
>   - Restore status code from callback (Keith)
> 
> v4:
>   - Removed mixed declaration and code (Keith)
>   - Set NPDG and NPDA and account for the blockdev cluster size.
> 
> Klaus Jensen (3):
>   hw/block/nvme: add dulbe support
>   nvme: add namespace I/O optimization fields to shared header
>   hw/block/nvme: add the dataset management command
> 
>  hw/block/nvme-ns.h|   4 +
>  hw/block/nvme.h   |   2 +
>  include/block/nvme.h  |  12 ++-
>  hw/block/nvme-ns.c|  34 ++--
>  hw/block/nvme.c   | 193 +-
>  hw/block/trace-events |   4 +
>  6 files changed, 240 insertions(+), 9 deletions(-)
> 
> -- 
> 2.29.1
> 

Keith, I cleared your R-b's from both patch 1 and 3 - please re-review.
The diff from v6 is very small, but it does include functional changes.


signature.asc
Description: PGP signature


[PATCH v2] block: Remove unused BlockDeviceMapEntry

2020-10-30 Thread Markus Armbruster
BlockDeviceMapEntry has never been used.  It was added in commit
facd6e2 "so that it is published through the introspection mechanism."
What exactly introspecting types that aren't used for anything could
accomplish isn't clear.  What "introspection mechanism" to use is also
nebulous.  To the best of my knowledge, there has never been one that
covered this type.  Certainly not query-qmp-schema, which includes
only types that are actually used in QMP.

Not being able to introspect BlockDeviceMapEntry hasn't bothered
anyone enough to complain in almost four years.  Get rid of it.

Cc: Paolo Bonzini 
Cc: Eric Blake 
Reviewed-by: Eric Blake 
Signed-off-by: Markus Armbruster 
---
I found an old patch I neglected to merge.

Max replied to a remark in Eric's review of v1:

Max Reitz  writes:

> On 2017-07-28 20:10, Eric Blake wrote:
>> This type is the schema for 'qemu-img map --output=json'.  And I had a
>> patch once (that I need to revive) that added a JSON Output visitor; at
>> which point I fixed qemu-img to convert from QAPI to JSON instead of
>> open-coding its construction of its output string, at which point the
>> QAPI generated code for this type is useful.
> (Very late reply, I know, I just stumbled over *MapEntry when looking
> over block-core.json what we might want to deprecate in 3.0)
>
> We already use MapEntry there -- why don't we output just that instead?
> The only difference seems to be an additional @filename parameter which
> would probably be actually nice to include in the output.
>
> Except that BlockDeviceMapEntry's documentation is better, so we should
> merge that into MapEntry before removing the former.
>
> Max

https://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg02933.html

Me doing the doc update Max suggested could take more than one
iteration, as I know nothing about this stuff.  Max, could you give it
a try?  Feel free to take over my patch.

 qapi/block-core.json | 29 -
 1 file changed, 29 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index e00fc27b5e..2aa499a72e 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -418,35 +418,6 @@
 ##
 { 'enum': 'BlockDeviceIoStatus', 'data': [ 'ok', 'failed', 'nospace' ] }
 
-##
-# @BlockDeviceMapEntry:
-#
-# Entry in the metadata map of the device (returned by "qemu-img map")
-#
-# @start: Offset in the image of the first byte described by this entry
-# (in bytes)
-#
-# @length: Length of the range described by this entry (in bytes)
-#
-# @depth: Number of layers (0 = top image, 1 = top image's backing file, etc.)
-# before reaching one for which the range is allocated.  The value is
-# in the range 0 to the depth of the image chain - 1.
-#
-# @zero: the sectors in this range read as zeros
-#
-# @data: reading the image will actually read data from a file (in particular,
-#if @offset is present this means that the sectors are not simply
-#preallocated, but contain actual data in raw format)
-#
-# @offset: if present, the image file stores the data for this range in
-#  raw format at the given offset.
-#
-# Since: 1.7
-##
-{ 'struct': 'BlockDeviceMapEntry',
-  'data': { 'start': 'int', 'length': 'int', 'depth': 'int', 'zero': 'bool',
-'data': 'bool', '*offset': 'int' } }
-
 ##
 # @DirtyBitmapStatus:
 #
-- 
2.26.2