Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-03-29 Thread Daniel P . Berrangé
On Fri, Mar 29, 2024 at 11:28:54AM +0100, Philippe Mathieu-Daudé wrote:
> Hi Zhijian,
> 
> On 29/3/24 02:53, Zhijian Li (Fujitsu) wrote:
> > 
> > 
> > On 28/03/2024 23:01, Peter Xu wrote:
> > > On Thu, Mar 28, 2024 at 11:18:04AM -0300, Fabiano Rosas wrote:
> > > > Philippe Mathieu-Daudé  writes:
> > > > 
> > > > > The whole RDMA subsystem was deprecated in commit e9a54265f5
> > > > > ("hw/rdma: Deprecate the pvrdma device and the rdma subsystem")
> > > > > released in v8.2.
> > > > > 
> > > > > Remove:
> > > > >- RDMA handling from migration
> > > > >- dependencies on libibumad, libibverbs and librdmacm
> > > > > 
> > > > > Keep the RAM_SAVE_FLAG_HOOK definition since it might appears
> > > > > in old migration streams.
> > > > > 
> > > > > Cc: Peter Xu 
> > > > > Cc: Li Zhijian 
> > > > > Acked-by: Fabiano Rosas 
> > > > > Signed-off-by: Philippe Mathieu-Daudé 
> > > > 
> > > > Just to be clear, because people raised the point in the last version,
> > > > the first link in the deprecation commit links to a thread comprising
> > > > entirely of rdma migration patches. I don't see any ambiguity on whether
> > > > the deprecation was intended to include migration. There's even an ack
> > > > from Juan.
> > > 
> > > Yes I remember that's the plan.
> > > 
> > > > 
> > > > So on the basis of not reverting the previous maintainer's decision, my
> > > > Ack stands here.
> > > > 
> > > > We also had pretty obvious bugs ([1], [2]) in the past that would have
> > > > been caught if we had any kind of testing for the feature, so I can't
> > > > even say this thing works currently.
> > > > 
> > > > @Peter Xu, @Li Zhijian, what are your thoughts on this?
> > > 
> > > Generally I definitely agree with such a removal sooner or later, as 
> > > that's
> > > how deprecation works, and even after Juan's left I'm not aware of any
> > > other new RDMA users.  Personally, I'd slightly prefer postponing it one
> > > more release which might help a bit of our downstream maintenance, however
> > > I assume that's not a blocker either, as I think we can also manage it.
> > > 
> > > IMHO it's more important to know whether there are still users and whether
> > > they would still like to see it around. That's also one thing I notice 
> > > that
> > > e9a54265f533f didn't yet get acks from RDMA users that we are aware, even
> > > if they're rare. According to [2] it could be that such user may only rely
> > > on the release versions of QEMU when it broke things.
> > > 
> > > So I'm copying Yu too (while Zhijian is already in the loop), just in case
> > > someone would like to stand up and speak.
> > 
> > 
> > I admit RDMA migration was lack of testing(unit/CI test), which led to the 
> > a few
> > obvious bugs being noticed too late.
> > However I was a bit surprised when I saw the removal of the RDMA migration. 
> > I wasn't
> > aware that this feature has not been marked as deprecated(at least there is 
> > no
> > prompt to end-user).
> > 
> > 
> > > IMHO it's more important to know whether there are still users and whether
> > > they would still like to see it around.
> > 
> > Agree.
> > I didn't immediately express my opinion in V1 because I'm also consulting 
> > our
> > customers for this feature in the future.
> > 
> > Personally, I agree with Perter's idea that "I'd slightly prefer postponing 
> > it one
> > more release which might help a bit of our downstream maintenance"
> 
> Do you mind posting a deprecation patch to clarify the situation?

The key thing the first deprecation patch missed was that it failed
to issue a warning message when RDMA migration was actually used.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH for-9.1] migration: Add Error** argument to add_bitmaps_to_list()

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

On 29.03.24 13:56, Cédric Le Goater wrote:

This allows to report more precise errors in the migration handler
dirty_bitmap_save_setup().

Suggested-by Vladimir Sementsov-Ogievskiy  
Signed-off-by: Cédric Le Goater


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir




[PATCH v3 2/5] qdev-monitor: fix error message in find_device_state()

2024-03-29 Thread Vladimir Sementsov-Ogievskiy
This "hotpluggable" here is misleading. Actually we check is object a
device or not. Let's drop the word.

Suggested-by: Markus Armbruster 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Markus Armbruster 
---
 system/qdev-monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index c1243891c3..840177d19f 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -891,7 +891,7 @@ static DeviceState *find_device_state(const char *id, Error 
**errp)
 
 dev = (DeviceState *)object_dynamic_cast(obj, TYPE_DEVICE);
 if (!dev) {
-error_setg(errp, "%s is not a hotpluggable device", id);
+error_setg(errp, "%s is not a device", id);
 return NULL;
 }
 
-- 
2.34.1




[PATCH v3 0/5] vhost-user-blk: live resize additional APIs

2024-03-29 Thread Vladimir Sementsov-Ogievskiy
v3:
02: add r-b by Markus
03: improve commit message
04: improve documentation, merge race-fix here (which was v2:05),
rebase on master (migration_is_running() now without arguments)
05: improve documentation

Vladimir Sementsov-Ogievskiy (5):
  vhost-user-blk: simplify and fix vhost_user_blk_handle_config_change
  qdev-monitor: fix error message in find_device_state()
  qdev-monitor: add option to report GenericError from find_device_state
  qapi: introduce device-sync-config
  qapi: introduce CONFIG_READ event

 hw/block/vhost-user-blk.c | 32 +++---
 hw/virtio/virtio-pci.c| 18 ++
 include/hw/qdev-core.h|  3 ++
 include/monitor/qdev.h|  2 ++
 include/sysemu/runstate.h |  1 +
 monitor/monitor.c |  1 +
 qapi/qdev.json| 54 ++
 stubs/qdev.c  |  6 
 system/qdev-monitor.c | 70 ---
 system/runstate.c |  5 +++
 10 files changed, 175 insertions(+), 17 deletions(-)

-- 
2.34.1




[PATCH v3 3/5] qdev-monitor: add option to report GenericError from find_device_state

2024-03-29 Thread Vladimir Sementsov-Ogievskiy
Here we just prepare for the following patch, making possible to report
GenericError as recommended.

This patch doesn't aim to prevent further use of DeviceNotFound by
future interfaces:

 - find_device_state() is used in blk_by_qdev_id() and qmp_get_blk()
   functions, which may lead to spread of DeviceNotFound anyway
 - also, nothing prevent simply copy-pasting find_device_state() calls
   with false argument

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 system/qdev-monitor.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 840177d19f..7e075d91c1 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -878,13 +878,20 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, 
Error **errp)
 object_unref(OBJECT(dev));
 }
 
-static DeviceState *find_device_state(const char *id, Error **errp)
+/*
+ * Note that creating new APIs using error classes other than GenericError is
+ * not recommended. Set use_generic_error=true for new interfaces.
+ */
+static DeviceState *find_device_state(const char *id, bool use_generic_error,
+  Error **errp)
 {
 Object *obj = object_resolve_path_at(qdev_get_peripheral(), id);
 DeviceState *dev;
 
 if (!obj) {
-error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+error_set(errp,
+  (use_generic_error ?
+   ERROR_CLASS_GENERIC_ERROR : ERROR_CLASS_DEVICE_NOT_FOUND),
   "Device '%s' not found", id);
 return NULL;
 }
@@ -948,7 +955,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 
 void qmp_device_del(const char *id, Error **errp)
 {
-DeviceState *dev = find_device_state(id, errp);
+DeviceState *dev = find_device_state(id, false, errp);
 if (dev != NULL) {
 if (dev->pending_deleted_event &&
 (dev->pending_deleted_expires_ms == 0 ||
@@ -1068,7 +1075,7 @@ BlockBackend *blk_by_qdev_id(const char *id, Error **errp)
 
 GLOBAL_STATE_CODE();
 
-dev = find_device_state(id, errp);
+dev = find_device_state(id, false, errp);
 if (dev == NULL) {
 return NULL;
 }
-- 
2.34.1




[PATCH v3 5/5] qapi: introduce CONFIG_READ event

2024-03-29 Thread Vladimir Sementsov-Ogievskiy
Send a new event when guest reads virtio-pci config after
virtio_notify_config() call.

That's useful to check that guest fetched modified config, for example
after resizing disk backend.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 hw/virtio/virtio-pci.c |  9 +
 include/monitor/qdev.h |  2 ++
 monitor/monitor.c  |  1 +
 qapi/qdev.json | 33 +
 stubs/qdev.c   |  6 ++
 system/qdev-monitor.c  |  6 ++
 6 files changed, 57 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 92afbae71c..c0c158dae2 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -23,6 +23,7 @@
 #include "hw/boards.h"
 #include "hw/virtio/virtio.h"
 #include "migration/qemu-file-types.h"
+#include "monitor/qdev.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
@@ -530,6 +531,10 @@ static uint64_t virtio_pci_config_read(void *opaque, 
hwaddr addr,
 }
 addr -= config;
 
+if (vdev->generation > 0) {
+qdev_virtio_config_read_event(DEVICE(proxy));
+}
+
 switch (size) {
 case 1:
 val = virtio_config_readb(vdev, addr);
@@ -1884,6 +1889,10 @@ static uint64_t virtio_pci_device_read(void *opaque, 
hwaddr addr,
 return UINT64_MAX;
 }
 
+if (vdev->generation > 0) {
+qdev_virtio_config_read_event(DEVICE(proxy));
+}
+
 switch (size) {
 case 1:
 val = virtio_config_modern_readb(vdev, addr);
diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index 1d57bf6577..fc9a834dca 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -36,4 +36,6 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
  */
 const char *qdev_set_id(DeviceState *dev, char *id, Error **errp);
 
+void qdev_virtio_config_read_event(DeviceState *dev);
+
 #endif
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 01ede1babd..5b06146503 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -316,6 +316,7 @@ static MonitorQAPIEventConf 
monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
 [QAPI_EVENT_VSERPORT_CHANGE]   = { 1000 * SCALE_MS },
 [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
 [QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },
+[QAPI_EVENT_VIRTIO_CONFIG_READ] = { 300 * SCALE_MS },
 };
 
 /*
diff --git a/qapi/qdev.json b/qapi/qdev.json
index e8be79c3d5..29a4f47360 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -182,3 +182,36 @@
 { 'command': 'device-sync-config',
   'features': [ 'unstable' ],
   'data': {'id': 'str'} }
+
+##
+# @VIRTIO_CONFIG_READ:
+#
+# Emitted whenever guest reads virtio device configuration after
+# configuration change.
+#
+# The event may be used in pair with device-sync-config. It shows
+# that guest has re-read updated configuration. It doesn't
+# guarantee that guest successfully handled it and updated the
+# view of the device for the user, but still it's a kind of
+# success indicator.
+#
+# @device: device name
+#
+# @path: device path
+#
+# Features:
+#
+# @unstable: The event is experimental.
+#
+# Since: 9.1
+#
+# Example:
+#
+# <- { "event": "VIRTIO_CONFIG_READ",
+#  "data": { "device": "virtio-net-pci-0",
+#"path": "/machine/peripheral/virtio-net-pci-0" },
+#  "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+##
+{ 'event': 'VIRTIO_CONFIG_READ',
+  'features': [ 'unstable' ],
+  'data': { '*device': 'str', 'path': 'str' } }
diff --git a/stubs/qdev.c b/stubs/qdev.c
index 6869f6f90a..ab6c4afe0b 100644
--- a/stubs/qdev.c
+++ b/stubs/qdev.c
@@ -26,3 +26,9 @@ void qapi_event_send_device_unplug_guest_error(const char 
*device,
 {
 /* Nothing to do. */
 }
+
+void qapi_event_send_virtio_config_read(const char *device,
+const char *path)
+{
+/* Nothing to do. */
+}
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index cb35ea0b86..8a2ca77fde 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -26,6 +26,7 @@
 #include "sysemu/runstate.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-qdev.h"
+#include "qapi/qapi-events-qdev.h"
 #include "qapi/qmp/dispatch.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
@@ -1206,3 +1207,8 @@ bool qmp_command_available(const QmpCommand *cmd, Error 
**errp)
 }
 return true;
 }
+
+void qdev_virtio_config_read_event(DeviceState *dev)
+{
+qapi_event_send_virtio_config_read(dev->id, dev->canonical_path);
+}
-- 
2.34.1




[PATCH v3 4/5] qapi: introduce device-sync-config

2024-03-29 Thread Vladimir Sementsov-Ogievskiy
Add command to sync config from vhost-user backend to the device. It
may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not
triggered interrupt to the guest or just not available (not supported
by vhost-user server).

Command result is racy if allow it during migration. Let's allow the
sync only in RUNNING state.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 hw/block/vhost-user-blk.c | 27 --
 hw/virtio/virtio-pci.c|  9 
 include/hw/qdev-core.h|  3 +++
 include/sysemu/runstate.h |  1 +
 qapi/qdev.json| 21 +
 system/qdev-monitor.c | 47 +++
 system/runstate.c |  5 +
 7 files changed, 106 insertions(+), 7 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9e6bbc6950..2f301f380c 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -88,27 +88,39 @@ static void vhost_user_blk_set_config(VirtIODevice *vdev, 
const uint8_t *config)
 s->blkcfg.wce = blkcfg->wce;
 }
 
+static int vhost_user_blk_sync_config(DeviceState *dev, Error **errp)
+{
+int ret;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+ret = vhost_dev_get_config(>dev, (uint8_t *)>blkcfg,
+   vdev->config_len, errp);
+if (ret < 0) {
+return ret;
+}
+
+memcpy(vdev->config, >blkcfg, vdev->config_len);
+virtio_notify_config(vdev);
+
+return 0;
+}
+
 static int vhost_user_blk_handle_config_change(struct vhost_dev *dev)
 {
 int ret;
-VirtIODevice *vdev = dev->vdev;
-VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);
 Error *local_err = NULL;
 
 if (!dev->started) {
 return 0;
 }
 
-ret = vhost_dev_get_config(dev, (uint8_t *)>blkcfg,
-   vdev->config_len, _err);
+ret = vhost_user_blk_sync_config(DEVICE(dev->vdev), _err);
 if (ret < 0) {
 error_report_err(local_err);
 return ret;
 }
 
-memcpy(dev->vdev->config, >blkcfg, vdev->config_len);
-virtio_notify_config(dev->vdev);
-
 return 0;
 }
 
@@ -576,6 +588,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, 
void *data)
 
 device_class_set_props(dc, vhost_user_blk_properties);
 dc->vmsd = _vhost_user_blk;
+dc->sync_config = vhost_user_blk_sync_config;
 set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
 vdc->realize = vhost_user_blk_device_realize;
 vdc->unrealize = vhost_user_blk_device_unrealize;
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index eaaf86402c..92afbae71c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2501,6 +2501,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, 
Error **errp)
 vpciklass->parent_dc_realize(qdev, errp);
 }
 
+static int virtio_pci_sync_config(DeviceState *dev, Error **errp)
+{
+VirtIOPCIProxy *proxy = VIRTIO_PCI(dev);
+VirtIODevice *vdev = virtio_bus_get_device(>bus);
+
+return qdev_sync_config(DEVICE(vdev), errp);
+}
+
 static void virtio_pci_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -2517,6 +2525,7 @@ static void virtio_pci_class_init(ObjectClass *klass, 
void *data)
 device_class_set_parent_realize(dc, virtio_pci_dc_realize,
 >parent_dc_realize);
 rc->phases.hold = virtio_pci_bus_reset_hold;
+dc->sync_config = virtio_pci_sync_config;
 }
 
 static const TypeInfo virtio_pci_info = {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 9228e96c87..87135bdcdf 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -95,6 +95,7 @@ typedef void (*DeviceUnrealize)(DeviceState *dev);
 typedef void (*DeviceReset)(DeviceState *dev);
 typedef void (*BusRealize)(BusState *bus, Error **errp);
 typedef void (*BusUnrealize)(BusState *bus);
+typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp);
 
 /**
  * struct DeviceClass - The base class for all devices.
@@ -162,6 +163,7 @@ struct DeviceClass {
 DeviceReset reset;
 DeviceRealize realize;
 DeviceUnrealize unrealize;
+DeviceSyncConfig sync_config;
 
 /**
  * @vmsd: device state serialisation description for
@@ -546,6 +548,7 @@ bool qdev_hotplug_allowed(DeviceState *dev, Error **errp);
  */
 HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev);
 void qdev_unplug(DeviceState *dev, Error **errp);
+int qdev_sync_config(DeviceState *dev, Error **errp);
 void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp);
 void qdev_machine_creation_done(void);
diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 0117d243c4..296af52322 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -5,6 +5,7 @@
 #include "qemu/notify.h"
 
 bool runstate_check(RunState state);
+const char 

[PATCH v3 1/5] vhost-user-blk: simplify and fix vhost_user_blk_handle_config_change

2024-03-29 Thread Vladimir Sementsov-Ogievskiy
Let's not care about what was changed and update the whole config,
reasons:

1. config->geometry should be updated together with capacity, so we fix
   a bug.

2. Vhost-user protocol doesn't say anything about config change
   limitation. Silent ignore of changes doesn't seem to be correct.

3. vhost-user-vsock reads the whole config

4. on realize we don't do any checks on retrieved config, so no reason
   to care here

Comment "valid for resize only" exists since introduction the whole
hw/block/vhost-user-blk.c in commit
   00343e4b54ba0685e9ebe928ec5713b0cf7f1d1c
"vhost-user-blk: introduce a new vhost-user-blk host device",
seems it was just an extra limitation.

Also, let's notify guest unconditionally:

1. So does vhost-user-vsock

2. We are going to reuse the functionality in new cases when we do want
   to notify the guest unconditionally. So, no reason to create extra
   branches in the logic.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Acked-by: Raphael Norwitz 
---
 hw/block/vhost-user-blk.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..9e6bbc6950 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -91,7 +91,6 @@ static void vhost_user_blk_set_config(VirtIODevice *vdev, 
const uint8_t *config)
 static int vhost_user_blk_handle_config_change(struct vhost_dev *dev)
 {
 int ret;
-struct virtio_blk_config blkcfg;
 VirtIODevice *vdev = dev->vdev;
 VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);
 Error *local_err = NULL;
@@ -100,19 +99,15 @@ static int vhost_user_blk_handle_config_change(struct 
vhost_dev *dev)
 return 0;
 }
 
-ret = vhost_dev_get_config(dev, (uint8_t *),
+ret = vhost_dev_get_config(dev, (uint8_t *)>blkcfg,
vdev->config_len, _err);
 if (ret < 0) {
 error_report_err(local_err);
 return ret;
 }
 
-/* valid for resize only */
-if (blkcfg.capacity != s->blkcfg.capacity) {
-s->blkcfg.capacity = blkcfg.capacity;
-memcpy(dev->vdev->config, >blkcfg, vdev->config_len);
-virtio_notify_config(dev->vdev);
-}
+memcpy(dev->vdev->config, >blkcfg, vdev->config_len);
+virtio_notify_config(dev->vdev);
 
 return 0;
 }
-- 
2.34.1




Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queue_wakeup

2024-03-29 Thread Zhu Yangyang via
On Thu, 28 Mar 2024 07:40:14 -0500, Eric Blake wrote:
> On Mon, Mar 25, 2024 at 05:18:50PM +0800, zhuyangyang via wrote:
> > If g_main_loop_run()/aio_poll() is called in the coroutine context,
> > the pending coroutine may be woken up repeatedly, and the co_queue_wakeup
> > may be disordered.
> > 
> > When the poll() syscall exited in g_main_loop_run()/aio_poll(), it means
> > some listened events is completed. Therefore, the completion callback
> > function is dispatched.
> > 
> > If this callback function needs to invoke aio_co_enter(), it will only
> > wake up the coroutine (because we are already in coroutine context),
> > which may cause that the data on this listening event_fd/socket_fd
> > is not read/cleared. When the next poll () exits, it will be woken up again
> > and inserted into the wakeup queue again.
> > 
> > For example, if TLS is enabled in NBD, the server will call 
> > g_main_loop_run()
> > in the coroutine, and repeatedly wake up the io_read event on a socket.
> > The call stack is as follows:
> > 
> > aio_co_enter()
> > aio_co_wake()
> > qio_channel_restart_read()
> > aio_dispatch_handler()
> > aio_dispatch_handlers()
> > aio_dispatch()
> > aio_ctx_dispatch()
> > g_main_context_dispatch()
> > g_main_loop_run()
> > nbd_negotiate_handle_starttls()
> > nbd_negotiate_options()
> > nbd_negotiate()
> > nbd_co_client_start()
> > coroutine_trampoline()
> 
> zhuyangyang, do you have a reliable reproduction setup for how you
> were able to trigger this?  Obviously, it only happens when TLS is
> enabled (we aren't creating a g_main_loop_run for any other NBD
> command), and only when the server is first starting to serve a
> client; is this a case where you were hammering a long-running qemu
> process running an NBD server with multiple clients trying to
> reconnect to the server all near the same time?

I'm sorry I didn't make the background of the problem clear before,
this problem is not on the NBD command, but on the VM live migration
with qemu TLS.

Next, I'll detail how to reproduce the issue.

1. Make the problem more obvious.

When TLS is enabled during live migration, the migration progress
may be suspended because some I/O are not returned during the mirror job
on target host.

Now we know that the reason is that some coroutines are lost.
The entry function of these lost coroutines are nbd_trip().

Add an assertion on the target host side to make the problem
show up quickly.

$ git diff util/async.c
diff --git a/util/async.c b/util/async.c
index 0467890052..4e3547c3ea 100644
--- a/util/async.c
+++ b/util/async.c
@@ -705,6 +705,7 @@ void aio_co_enter(AioContext *ctx, Coroutine *co)
 if (qemu_in_coroutine()) {
 Coroutine *self = qemu_coroutine_self();
 assert(self != co);
+assert(!co->co_queue_next.sqe_next);
 QSIMPLEQ_INSERT_TAIL(>co_queue_wakeup, co, co_queue_next);
 } else {
 qemu_aio_coroutine_enter(ctx, co);

2. Reproduce the issue

1) start vm on the origin host

Note: Configuring multiple disks for a VM 
(It is recommended to configure more than 6 disks)

These disk tasks(nbd_trip) will be performed simultaneously 
with nbd_negotiate_handle_starttls() on the main thread during migrate.


  centos7.3_64_server
  4194304
  4194304
  4
  
/machine
  
  
hvm

  
  



  
  
  



  
  destroy
  restart
  restart
  
/usr/bin/qemu-kvm

  
  
  
  
  
  


  
  
  
  
  
  


  
  
  
  
  
  


  
  
  
  
  
  


  
  
  
  
  
  


  
  
  
  
  
  


  

  


$ virsh create vm_x86.xml
Domain 'centos7.3_64_server' created from /home/vm_x86.xml

2) migrate the vm to target host
virsh migrate --live --p2p \
   --migrateuri tcp:10.91.xxx.xxx centos7.3_64_server 
qemu+tcp://10.91.xxx.xxx/system \
   --copy-storage-all \
   --tls

Than, An error is reported on the peer host
qemu-kvm: ../util/async.c:705: aio_co_enter: Assertion 
`!co->co_queue_next.sqe_next' failed.

> 
> If we can come up with a reliable formula for reproducing the
> corrupted coroutine list, it would make a great iotest addition
> alongside the existing qemu-iotests 233 for ensuring that NBD TLS
> traffic is handled correctly in both server and client.

I'm not sure if this can be used for testing of qemu-nbd

> 
> > 
> > Signed-off-by: zhuyangyang 
> 
> Side note: this appears to be your first qemu contribution (based on
> 'git shortlog --author zhuyangyang').  While I am not in a position to
> presume how you would like your name Anglicized, I will point out that
> the prevailing style is to separate given name from family name (just
> because your username at work has no spaces does not mean that your
> S-o-b has to follow suit).  It is also permissible to list your name
> in native characters alongside or in place of the 

Re: [PATCH for-9.1] migration: Add Error** argument to add_bitmaps_to_list()

2024-03-29 Thread Philippe Mathieu-Daudé

On 29/3/24 11:56, Cédric Le Goater wrote:

This allows to report more precise errors in the migration handler
dirty_bitmap_save_setup().

Suggested-by Vladimir Sementsov-Ogievskiy 
Signed-off-by: Cédric Le Goater 
---

  To apply on top of :
  https://lore.kernel.org/qemu-devel/20240320064911.545001-1-...@redhat.com/
  
  migration/block-dirty-bitmap.c | 34 --

  1 file changed, 16 insertions(+), 18 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH for-9.1 v5 07/14] migration: Add Error** argument to .save_setup() handler

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

On 29.03.24 13:53, Cédric Le Goater wrote:

Hello Vladimir,

On 3/29/24 10:32, Vladimir Sementsov-Ogievskiy wrote:

On 20.03.24 09:49, Cédric Le Goater wrote:

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 
2708abf3d762de774ed294d3fdb8e56690d2974c..542a8c297b329abc30d1b3a205d29340fa59a961
 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1213,12 +1213,14 @@ fail:
  return ret;
  }
-static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
+static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  DBMSaveState *s = &((DBMState *)opaque)->save;
  SaveBitmapState *dbms = NULL;
  if (init_dirty_bitmap_migration(s) < 0) {
+    error_setg(errp,
+   "Failed to initialize dirty tracking bitmap for blocks");


No, that's not about initializing a bitmap. This all is about migration of 
block-dirty-bitmaps themselves.

So correct would be say "Failed to initialize migration of block dirty bitmaps".

with this, for block dirty bitmap migration:
Acked-by: Vladimir Sementsov-Ogievskiy 


I had kept your previous R-b.


Oh, I missed that)



Should we remove it ? or is it ok if I address your comments below in a
followup patch, in which case the error message above would be removed.
 
Yes of course, you can keep my old r-b. Followup patch is appreciated





Still, a lot better is add errp to init_dirty_bitmap_migration() and to 
add_bitmaps_to_list() too: look,

init_dirty_bitmap_migration() fails only if add_bitmaps_to_list() fails

in turn,

add_bitmaps_to_list() have several clear failure points, where it always does 
error_report (or error_report_err), which would be better to pass-through to 
the user.


Good idea. Will do.

Thanks,

C.




--
Best regards,
Vladimir




[PATCH for-9.1] migration: Add Error** argument to add_bitmaps_to_list()

2024-03-29 Thread Cédric Le Goater
This allows to report more precise errors in the migration handler
dirty_bitmap_save_setup().

Suggested-by Vladimir Sementsov-Ogievskiy  
Signed-off-by: Cédric Le Goater 
---

 To apply on top of : 
 https://lore.kernel.org/qemu-devel/20240320064911.545001-1-...@redhat.com/
 
 migration/block-dirty-bitmap.c | 34 --
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 
542a8c297b329abc30d1b3a205d29340fa59a961..a7d55048c23505fde565ca784cec3c917dca37e5
 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -481,13 +481,13 @@ static void dirty_bitmap_do_save_cleanup(DBMSaveState *s)
 
 /* Called with the BQL taken. */
 static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
-   const char *bs_name, GHashTable *alias_map)
+   const char *bs_name, GHashTable *alias_map,
+   Error **errp)
 {
 BdrvDirtyBitmap *bitmap;
 SaveBitmapState *dbms;
 GHashTable *bitmap_aliases;
 const char *node_alias, *bitmap_name, *bitmap_alias;
-Error *local_err = NULL;
 
 /* When an alias map is given, @bs_name must be @bs's node name */
 assert(!alias_map || !strcmp(bs_name, bdrv_get_node_name(bs)));
@@ -504,8 +504,8 @@ static int add_bitmaps_to_list(DBMSaveState *s, 
BlockDriverState *bs,
 bitmap_name = bdrv_dirty_bitmap_name(bitmap);
 
 if (!bs_name || strcmp(bs_name, "") == 0) {
-error_report("Bitmap '%s' in unnamed node can't be migrated",
- bitmap_name);
+error_setg(errp, "Bitmap '%s' in unnamed node can't be migrated",
+   bitmap_name);
 return -1;
 }
 
@@ -525,9 +525,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, 
BlockDriverState *bs,
 }
 
 if (node_alias[0] == '#') {
-error_report("Bitmap '%s' in a node with auto-generated "
- "name '%s' can't be migrated",
- bitmap_name, node_alias);
+error_setg(errp, "Bitmap '%s' in a node with auto-generated "
+   "name '%s' can't be migrated",
+   bitmap_name, node_alias);
 return -1;
 }
 
@@ -538,8 +538,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, 
BlockDriverState *bs,
 continue;
 }
 
-if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, _err)) {
-error_report_err(local_err);
+if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, errp)) {
 return -1;
 }
 
@@ -558,9 +557,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, 
BlockDriverState *bs,
 }
 } else {
 if (strlen(bitmap_name) > UINT8_MAX) {
-error_report("Cannot migrate bitmap '%s' on node '%s': "
- "Name is longer than %u bytes",
- bitmap_name, bs_name, UINT8_MAX);
+error_setg(errp, "Cannot migrate bitmap '%s' on node '%s': "
+   "Name is longer than %u bytes",
+   bitmap_name, bs_name, UINT8_MAX);
 return -1;
 }
 bitmap_alias = bitmap_name;
@@ -599,7 +598,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, 
BlockDriverState *bs,
 }
 
 /* Called with the BQL taken. */
-static int init_dirty_bitmap_migration(DBMSaveState *s)
+static int init_dirty_bitmap_migration(DBMSaveState *s, Error **errp)
 {
 BlockDriverState *bs;
 SaveBitmapState *dbms;
@@ -643,7 +642,7 @@ static int init_dirty_bitmap_migration(DBMSaveState *s)
 }
 
 if (bs && bs->drv && !bs->drv->is_filter) {
-if (add_bitmaps_to_list(s, bs, name, NULL)) {
+if (add_bitmaps_to_list(s, bs, name, NULL, errp)) {
 goto fail;
 }
 g_hash_table_add(handled_by_blk, bs);
@@ -656,7 +655,8 @@ static int init_dirty_bitmap_migration(DBMSaveState *s)
 continue;
 }
 
-if (add_bitmaps_to_list(s, bs, bdrv_get_node_name(bs), alias_map)) {
+if (add_bitmaps_to_list(s, bs, bdrv_get_node_name(bs), alias_map,
+errp)) {
 goto fail;
 }
 }
@@ -1218,9 +1218,7 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void 
*opaque, Error **errp)
 DBMSaveState *s = &((DBMState *)opaque)->save;
 SaveBitmapState *dbms = NULL;
 
-if (init_dirty_bitmap_migration(s) < 0) {
-error_setg(errp,
-   "Failed to initialize dirty tracking bitmap for blocks");
+if (init_dirty_bitmap_migration(s, errp) < 0) {
 return -1;
 }
 
-- 
2.44.0




Re: [PATCH for-9.1 v5 07/14] migration: Add Error** argument to .save_setup() handler

2024-03-29 Thread Cédric Le Goater

Hello Vladimir,

On 3/29/24 10:32, Vladimir Sementsov-Ogievskiy wrote:

On 20.03.24 09:49, Cédric Le Goater wrote:

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 
2708abf3d762de774ed294d3fdb8e56690d2974c..542a8c297b329abc30d1b3a205d29340fa59a961
 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1213,12 +1213,14 @@ fail:
  return ret;
  }
-static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
+static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  DBMSaveState *s = &((DBMState *)opaque)->save;
  SaveBitmapState *dbms = NULL;
  if (init_dirty_bitmap_migration(s) < 0) {
+    error_setg(errp,
+   "Failed to initialize dirty tracking bitmap for blocks");


No, that's not about initializing a bitmap. This all is about migration of 
block-dirty-bitmaps themselves.

So correct would be say "Failed to initialize migration of block dirty bitmaps".

with this, for block dirty bitmap migration:
Acked-by: Vladimir Sementsov-Ogievskiy 


I had kept your previous R-b.

Should we remove it ? or is it ok if I address your comments below in a
followup patch, in which case the error message above would be removed.


Still, a lot better is add errp to init_dirty_bitmap_migration() and to 
add_bitmaps_to_list() too: look,

init_dirty_bitmap_migration() fails only if add_bitmaps_to_list() fails

in turn,

add_bitmaps_to_list() have several clear failure points, where it always does 
error_report (or error_report_err), which would be better to pass-through to 
the user.


Good idea. Will do.

Thanks,

C.





Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-03-29 Thread Philippe Mathieu-Daudé

Hi Zhijian,

On 29/3/24 02:53, Zhijian Li (Fujitsu) wrote:



On 28/03/2024 23:01, Peter Xu wrote:

On Thu, Mar 28, 2024 at 11:18:04AM -0300, Fabiano Rosas wrote:

Philippe Mathieu-Daudé  writes:


The whole RDMA subsystem was deprecated in commit e9a54265f5
("hw/rdma: Deprecate the pvrdma device and the rdma subsystem")
released in v8.2.

Remove:
   - RDMA handling from migration
   - dependencies on libibumad, libibverbs and librdmacm

Keep the RAM_SAVE_FLAG_HOOK definition since it might appears
in old migration streams.

Cc: Peter Xu 
Cc: Li Zhijian 
Acked-by: Fabiano Rosas 
Signed-off-by: Philippe Mathieu-Daudé 


Just to be clear, because people raised the point in the last version,
the first link in the deprecation commit links to a thread comprising
entirely of rdma migration patches. I don't see any ambiguity on whether
the deprecation was intended to include migration. There's even an ack
from Juan.


Yes I remember that's the plan.



So on the basis of not reverting the previous maintainer's decision, my
Ack stands here.

We also had pretty obvious bugs ([1], [2]) in the past that would have
been caught if we had any kind of testing for the feature, so I can't
even say this thing works currently.

@Peter Xu, @Li Zhijian, what are your thoughts on this?


Generally I definitely agree with such a removal sooner or later, as that's
how deprecation works, and even after Juan's left I'm not aware of any
other new RDMA users.  Personally, I'd slightly prefer postponing it one
more release which might help a bit of our downstream maintenance, however
I assume that's not a blocker either, as I think we can also manage it.

IMHO it's more important to know whether there are still users and whether
they would still like to see it around. That's also one thing I notice that
e9a54265f533f didn't yet get acks from RDMA users that we are aware, even
if they're rare. According to [2] it could be that such user may only rely
on the release versions of QEMU when it broke things.

So I'm copying Yu too (while Zhijian is already in the loop), just in case
someone would like to stand up and speak.



I admit RDMA migration was lack of testing(unit/CI test), which led to the a few
obvious bugs being noticed too late.
However I was a bit surprised when I saw the removal of the RDMA migration. I 
wasn't
aware that this feature has not been marked as deprecated(at least there is no
prompt to end-user).



IMHO it's more important to know whether there are still users and whether
they would still like to see it around.


Agree.
I didn't immediately express my opinion in V1 because I'm also consulting our
customers for this feature in the future.

Personally, I agree with Perter's idea that "I'd slightly prefer postponing it 
one
more release which might help a bit of our downstream maintenance"


Do you mind posting a deprecation patch to clarify the situation?

Thanks,

Phil.



Thanks
Zhijian



Thanks,



1- https://lore.kernel.org/r/20230920090412.726725-1-lizhij...@fujitsu.com
2- 
https://lore.kernel.org/r/cahecvy7hxswn4ow_kog+q+tn6f_kmeichevz1qgm-fbxbpp...@mail.gmail.com






Re: [PATCH 2/2] backup: add minimum cluster size to performance options

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

On 08.03.24 18:51, Fiona Ebner wrote:

Useful to make discard-source work in the context of backup fleecing
when the fleecing image has a larger granularity than the backup
target.

Backup/block-copy will use at least this granularity for copy operations
and in particular, discard requests to the backup source will too. If
the granularity is too small, they will just be aligned down in
cbw_co_pdiscard_snapshot() and thus effectively ignored.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Fiona Ebner 
---
  block/backup.c| 2 +-
  block/copy-before-write.c | 2 ++
  block/copy-before-write.h | 1 +
  blockdev.c| 3 +++
  qapi/block-core.json  | 9 +++--
  5 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 3dd2e229d2..a1292c01ec 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -458,7 +458,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
  }
  
  cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,

-  , errp);
+  perf->min_cluster_size, , errp);
  if (!cbw) {
  goto error;
  }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index f9896c6c1e..55a9272485 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -545,6 +545,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
bool discard_source,
+  int64_t min_cluster_size,


same note, suggest uint32_t


BlockCopyState **bcs,
Error **errp)
  {
@@ -563,6 +564,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
  }
  qdict_put_str(opts, "file", bdrv_get_node_name(source));
  qdict_put_str(opts, "target", bdrv_get_node_name(target));
+qdict_put_int(opts, "min-cluster-size", min_cluster_size);
  
  top = bdrv_insert_node(source, opts, flags, errp);

  if (!top) {
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 01af0cd3c4..dc6cafe7fa 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -40,6 +40,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
bool discard_source,
+  int64_t min_cluster_size,
BlockCopyState **bcs,
Error **errp);
  void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index daceb50460..8e6bdbc94a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2653,6 +2653,9 @@ static BlockJob *do_backup_common(BackupCommon *backup,
  if (backup->x_perf->has_max_chunk) {
  perf.max_chunk = backup->x_perf->max_chunk;
  }
+if (backup->x_perf->has_min_cluster_size) {
+perf.min_cluster_size = backup->x_perf->min_cluster_size;
+}
  }
  
  if ((backup->sync == MIRROR_SYNC_MODE_BITMAP) ||

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 85c8f88f6e..ba0836892f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1551,11 +1551,16 @@
  # it should not be less than job cluster size which is calculated
  # as maximum of target image cluster size and 64k.  Default 0.
  #
+# @min-cluster-size: Minimum size of blocks used by copy-before-write
+# and background copy operations.  Has to be a power of 2.  No
+# effect if smaller than the maximum of the target's cluster size
+# and 64 KiB.  Default 0.  (Since 9.0)


9.1


+#
  # Since: 6.0
  ##
  { 'struct': 'BackupPerf',
-  'data': { '*use-copy-range': 'bool',
-'*max-workers': 'int', '*max-chunk': 'int64' } }
+  'data': { '*use-copy-range': 'bool', '*max-workers': 'int',
+'*max-chunk': 'int64', '*min-cluster-size': 'uint32' } }
  
  ##

  # @BackupCommon:




--
Best regards,
Vladimir




Re: [PATCH 1/2] copy-before-write: allow specifying minimum cluster size

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

On 08.03.24 18:51, Fiona Ebner wrote:

Useful to make discard-source work in the context of backup fleecing
when the fleecing image has a larger granularity than the backup
target.

Copy-before-write operations will use at least this granularity and in
particular, discard requests to the source node will too. If the
granularity is too small, they will just be aligned down in
cbw_co_pdiscard_snapshot() and thus effectively ignored.

The QAPI uses uint32 so the value will be non-negative, but still fit
into a uint64_t.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Fiona Ebner 
---
  block/block-copy.c | 17 +
  block/copy-before-write.c  |  3 ++-
  include/block/block-copy.h |  1 +
  qapi/block-core.json   |  8 +++-
  4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 7e3b378528..adb1cbb440 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -310,6 +310,7 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool 
use_copy_range,
  }
  
  static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,

+ int64_t min_cluster_size,


Maybe better use uint32_t here as well.


   Error **errp)
  {
  int ret;
@@ -335,7 +336,7 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
  "used. If the actual block size of the target exceeds "
  "this default, the backup may be unusable",
  BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
-return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+return MAX(min_cluster_size, BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
  } else if (ret < 0 && !target_does_cow) {
  error_setg_errno(errp, -ret,
  "Couldn't determine the cluster size of the target image, "
@@ -345,16 +346,18 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
  return ret;
  } else if (ret < 0 && target_does_cow) {
  /* Not fatal; just trudge on ahead. */
-return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+return MAX(min_cluster_size, BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
  }
  
-return MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);

+return MAX(min_cluster_size,
+   MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size));
  }
  
  BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,

   BlockDriverState *copy_bitmap_bs,
   const BdrvDirtyBitmap *bitmap,
   bool discard_source,
+ int64_t min_cluster_size,


and here why not uint32_t


   Error **errp)
  {
  ERRP_GUARD();
@@ -365,7 +368,13 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
  
  GLOBAL_STATE_CODE();
  
-cluster_size = block_copy_calculate_cluster_size(target->bs, errp);

+if (min_cluster_size && !is_power_of_2(min_cluster_size)) {
+error_setg(errp, "min-cluster-size needs to be a power of 2");
+return NULL;
+}
+
+cluster_size = block_copy_calculate_cluster_size(target->bs,
+ min_cluster_size, errp);
  if (cluster_size < 0) {
  return NULL;
  }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index dac57481c5..f9896c6c1e 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -476,7 +476,8 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
  
  s->discard_source = flags & BDRV_O_CBW_DISCARD_SOURCE;

  s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap,
-  flags & BDRV_O_CBW_DISCARD_SOURCE, errp);
+  flags & BDRV_O_CBW_DISCARD_SOURCE,
+  opts->min_cluster_size, errp);


I assume it is guaranteed to be 0 when not specified by user.


  if (!s->bcs) {
  error_prepend(errp, "Cannot create block-copy-state: ");
  return -EINVAL;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index bdc703bacd..77857c6c68 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -28,6 +28,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
   BlockDriverState *copy_bitmap_bs,
   const BdrvDirtyBitmap *bitmap,
   bool discard_source,
+ int64_t min_cluster_size,
   Error **errp);
  
  /* Function should be called prior any actual copy request */

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 

Re: [PATCH for-9.1 v5 07/14] migration: Add Error** argument to .save_setup() handler

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

On 20.03.24 09:49, Cédric Le Goater wrote:

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 
2708abf3d762de774ed294d3fdb8e56690d2974c..542a8c297b329abc30d1b3a205d29340fa59a961
 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1213,12 +1213,14 @@ fail:
  return ret;
  }
  
-static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)

+static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  DBMSaveState *s = &((DBMState *)opaque)->save;
  SaveBitmapState *dbms = NULL;
  
  if (init_dirty_bitmap_migration(s) < 0) {

+error_setg(errp,
+   "Failed to initialize dirty tracking bitmap for blocks");


No, that's not about initializing a bitmap. This all is about migration of 
block-dirty-bitmaps themselves.

So correct would be say "Failed to initialize migration of block dirty bitmaps".

with this, for block dirty bitmap migration:
Acked-by: Vladimir Sementsov-Ogievskiy 


Still, a lot better is add errp to init_dirty_bitmap_migration() and to 
add_bitmaps_to_list() too: look,

init_dirty_bitmap_migration() fails only if add_bitmaps_to_list() fails

in turn,

add_bitmaps_to_list() have several clear failure points, where it always does 
error_report (or error_report_err), which would be better to pass-through to 
the user.


  return -1;
  }
  
diff --git a/migrat


--
Best regards,
Vladimir




Re: [PATCH v4] blockcommit: Reopen base image as RO after abort

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

On 28.03.24 12:16, Alexander Ivanov wrote:

If a blockcommit is aborted the base image remains in RW mode, that leads
to a fail of subsequent live migration.

How to reproduce:
   $ virsh snapshot-create-as vm snp1 --disk-only

   *** write something to the disk inside the guest ***

   $ virsh blockcommit vm vda --active --shallow && virsh blockjob vm vda 
--abort
   $ lsof /vzt/vm.qcow2
   COMMAND  PID USER   FD   TYPE DEVICE   SIZE/OFF NODE NAME
   qemu-syst 433203 root   45u   REG  253,0 1724776448  133 /vzt/vm.qcow2
   $ cat /proc/433203/fdinfo/45
   pos:0
   flags:  02140002 < The last 2 means RW mode

If the base image is in RW mode at the end of blockcommit and was in RO
mode before blockcommit, reopen the base BDS in RO.

Signed-off-by: Alexander Ivanov 
---
  block/mirror.c | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 1bdce3b657..d23be57255 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -93,6 +93,7 @@ typedef struct MirrorBlockJob {
  int64_t active_write_bytes_in_flight;
  bool prepared;
  bool in_drain;
+bool base_ro;
  } MirrorBlockJob;
  
  typedef struct MirrorBDSOpaque {

@@ -797,6 +798,10 @@ static int mirror_exit_common(Job *job)
  bdrv_drained_end(target_bs);
  bdrv_unref(target_bs);
  
+if (abort && s->base_ro && !bdrv_is_read_only(target_bs)) {

+bdrv_reopen_set_read_only(target_bs, true, NULL);
+}
+


All looks good to me except this: seems it is safer to place this "if" block before 
"bdrv_drained_end(); bdrv_unref();" above. With it moved:
Reviewed-by: Vladimir Sementsov-Ogievskiy 


  bs_opaque->job = NULL;
  
  bdrv_drained_end(src);

@@ -1717,6 +1722,7 @@ static BlockJob *mirror_start_job(
   bool is_none_mode, BlockDriverState *base,
   bool auto_complete, const char *filter_node_name,
   bool is_mirror, MirrorCopyMode copy_mode,
+ bool base_ro,
   Error **errp)
  {
  MirrorBlockJob *s;
@@ -1800,6 +1806,7 @@ static BlockJob *mirror_start_job(
  bdrv_unref(mirror_top_bs);
  
  s->mirror_top_bs = mirror_top_bs;

+s->base_ro = base_ro;
  
  /* No resize for the target either; while the mirror is still running, a

   * consistent read isn't necessarily possible. We could possibly allow
@@ -2029,7 +2036,7 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
   speed, granularity, buf_size, backing_mode, zero_target,
   on_source_error, on_target_error, unmap, NULL, NULL,
   _job_driver, is_none_mode, base, false,
- filter_node_name, true, copy_mode, errp);
+ filter_node_name, true, copy_mode, false, errp);
  }
  
  BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,

@@ -2058,7 +2065,7 @@ BlockJob *commit_active_start(const char *job_id, 
BlockDriverState *bs,
   on_error, on_error, true, cb, opaque,
   _active_job_driver, false, base, auto_complete,
   filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
- errp);
+ base_read_only, errp);
  if (!job) {
  goto error_restore_flags;
  }


--
Best regards,
Vladimir




Re: [PATCH-for-9.1 v2 0/3] rdma: Remove RDMA subsystem and pvrdma device

2024-03-29 Thread Michael S. Tsirkin
On Thu, Mar 28, 2024 at 02:02:52PM +0100, Philippe Mathieu-Daudé wrote:
> Since v1:
> - split in 3 (Thomas)
> - justify gluster removal


Reviewed-by: Michael S. Tsirkin 

> Philippe Mathieu-Daudé (3):
>   hw/rdma: Remove pvrdma device and rdmacm-mux helper
>   migration: Remove RDMA protocol handling
>   block/gluster: Remove RDMA protocol handling
> 
>  MAINTAINERS   |   17 -
>  docs/about/deprecated.rst |9 -
>  docs/about/removed-features.rst   |4 +
>  docs/devel/migration/main.rst |6 -
>  docs/pvrdma.txt   |  345 --
>  docs/rdma.txt |  420 --
>  docs/system/device-url-syntax.rst.inc |4 +-
>  docs/system/loongarch/virt.rst|2 +-
>  docs/system/qemu-block-drivers.rst.inc|1 -
>  meson.build   |   59 -
>  qapi/machine.json |   17 -
>  qapi/migration.json   |   31 +-
>  qapi/qapi-schema.json |1 -
>  qapi/rdma.json|   38 -
>  contrib/rdmacm-mux/rdmacm-mux.h   |   61 -
>  hw/rdma/rdma_backend.h|  129 -
>  hw/rdma/rdma_backend_defs.h   |   76 -
>  hw/rdma/rdma_rm.h |   97 -
>  hw/rdma/rdma_rm_defs.h|  146 -
>  hw/rdma/rdma_utils.h  |   63 -
>  hw/rdma/trace.h   |1 -
>  hw/rdma/vmw/pvrdma.h  |  144 -
>  hw/rdma/vmw/pvrdma_dev_ring.h |   46 -
>  hw/rdma/vmw/pvrdma_qp_ops.h   |   28 -
>  hw/rdma/vmw/trace.h   |1 -
>  include/hw/rdma/rdma.h|   37 -
>  include/monitor/hmp.h |1 -
>  .../infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h |  685 ---
>  .../infiniband/hw/vmw_pvrdma/pvrdma_verbs.h   |  348 --
>  .../standard-headers/rdma/vmw_pvrdma-abi.h|  310 --
>  migration/migration-stats.h   |6 +-
>  migration/migration.h |9 -
>  migration/options.h   |2 -
>  migration/rdma.h  |   69 -
>  block/gluster.c   |   39 -
>  contrib/rdmacm-mux/main.c |  831 
>  hw/core/machine-qmp-cmds.c|   32 -
>  hw/rdma/rdma.c|   30 -
>  hw/rdma/rdma_backend.c| 1401 --
>  hw/rdma/rdma_rm.c |  812 
>  hw/rdma/rdma_utils.c  |  126 -
>  hw/rdma/vmw/pvrdma_cmd.c  |  815 
>  hw/rdma/vmw/pvrdma_dev_ring.c |  141 -
>  hw/rdma/vmw/pvrdma_main.c |  735 ---
>  hw/rdma/vmw/pvrdma_qp_ops.c   |  298 --
>  migration/migration-stats.c   |5 +-
>  migration/migration.c |   31 -
>  migration/options.c   |   16 -
>  migration/qemu-file.c |1 -
>  migration/ram.c   |   86 +-
>  migration/rdma.c  | 4184 -
>  migration/savevm.c|2 +-
>  monitor/qmp-cmds.c|1 -
>  Kconfig.host  |3 -
>  contrib/rdmacm-mux/meson.build|7 -
>  hmp-commands-info.hx  |   13 -
>  hw/Kconfig|1 -
>  hw/meson.build|1 -
>  hw/rdma/Kconfig   |3 -
>  hw/rdma/meson.build   |   12 -
>  hw/rdma/trace-events  |   31 -
>  hw/rdma/vmw/trace-events  |   17 -
>  meson_options.txt |4 -
>  migration/meson.build |1 -
>  migration/trace-events|   68 +-
>  qapi/meson.build  |1 -
>  qemu-options.hx   |6 -
>  .../org.centos/stream/8/build-environment.yml |1 -
>  .../ci/org.centos/stream/8/x86_64/configure   |3 -
>  scripts/ci/setup/build-environment.yml|4 -
>  scripts/coverity-scan/run-coverity-scan   |2 +-
>  scripts/meson-buildoptions.sh |6 -
>  scripts/update-linux-headers.sh   |   27 -
>  tests/lcitool/projects/qemu.yml   |3 -
>  tests/migration/guestperf/engine.py   |4 +-
>  75 files changed, 20 insertions(+), 12997 deletions(-)
>  delete mode 100644 docs/pvrdma.txt
>  delete mode 100644 docs/rdma.txt
>  delete mode 100644 qapi/rdma.json
>  delete mode 100644 contrib/rdmacm-mux/rdmacm-mux.h
>  delete mode 100644 

Re: [PATCH 06/19] block/stream: fix -Werror=maybe-uninitialized false-positives

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

On 28.03.24 13:20, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

../block/stream.c:193:19: error: ‘unfiltered_bs’ may be used uninitialized 
[-Werror=maybe-uninitialized]
../block/stream.c:176:5: error: ‘len’ may be used uninitialized 
[-Werror=maybe-uninitialized]
trace/trace-block.h:906:9: error: ‘ret’ may be used uninitialized 
[-Werror=maybe-uninitialized]

Signed-off-by: Marc-André Lureau 


Again, same false-positives, because of WITH_GRAPH_RDLOCK_GUARD()..

Didn't you try to change WITH_ macros somehow, so that compiler believe in our 
good intentions?

Actually, "unused variable initialization" is bad thing too.

Anyway, if no better solution for now:
Acked-by: Vladimir Sementsov-Ogievskiy 


---
  block/stream.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 7031eef12b..9076203193 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -155,8 +155,8 @@ static void stream_clean(Job *job)
  static int coroutine_fn stream_run(Job *job, Error **errp)
  {
  StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-BlockDriverState *unfiltered_bs;
-int64_t len;
+BlockDriverState *unfiltered_bs = NULL;
+int64_t len = -1;
  int64_t offset = 0;
  int error = 0;
  int64_t n = 0; /* bytes */
@@ -177,7 +177,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
  
  for ( ; offset < len; offset += n) {

  bool copy;
-int ret;
+int ret = -1;
  
  /* Note that even when no rate limit is applied we need to yield

   * with no pending I/O here so that bdrv_drain_all() returns.


--
Best regards,
Vladimir




Re: [PATCH 05/19] block/mirror: fix -Werror=maybe-uninitialized false-positive

2024-03-29 Thread Vladimir Sementsov-Ogievskiy

On 28.03.24 13:20, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

../block/mirror.c:1066:22: error: ‘iostatus’ may be used uninitialized 
[-Werror=maybe-uninitialized]


Actually that's a false-positive.. Compiler can't believe that body of 
WITH_JOB_LOCK_GUARD() will be executed unconditionally. Probably we should 
mention this in a comment.



Signed-off-by: Marc-André Lureau 


Reviewed-by: Vladimir Sementsov-Ogievskiy 


---
  block/mirror.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 1bdce3b657..53dd7332ee 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -926,7 +926,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  MirrorBDSOpaque *mirror_top_opaque = s->mirror_top_bs->opaque;
  BlockDriverState *target_bs = blk_bs(s->target);
  bool need_drain = true;
-BlockDeviceIoStatus iostatus;
+BlockDeviceIoStatus iostatus = BLOCK_DEVICE_IO_STATUS__MAX;
  int64_t length;
  int64_t target_length;
  BlockDriverInfo bdi;


--
Best regards,
Vladimir




Re: [PATCH 3/3] ffvat: Fix reading files with non-continuous clusters

2024-03-29 Thread Amjad Alsharafi
I noticed the issue in the commit message 'ffvat' should be 'vvfat',
I'll fix it in the next version.

On Thu, Mar 28, 2024 at 04:11:27AM +0800, Amjad Alsharafi wrote:
> When reading with `read_cluster` we get the `mapping` with
> `find_mapping_for_cluster` and then we call `open_file` for this
> mapping.
> The issue appear when its the same file, but a second cluster that is
> not immediately after it, imagine clusters `500 -> 503`, this will give
> us 2 mappings one has the range `500..501` and another `503..504`, both
> point to the same file, but different offsets.
> 
> When we don't open the file since the path is the same, we won't assign
> `s->current_mapping` and thus accessing way out of bound of the file.
> 
> From our example above, after `open_file` (that didn't open anything) we
> will get the offset into the file with
> `s->cluster_size*(cluster_num-s->current_mapping->begin)`, which will
> give us `0x2000 * (504-500)`, which is out of bound for this mapping and
> will produce some issues.
> 
> Signed-off-by: Amjad Alsharafi 
> ---
>  block/vvfat.c | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/block/vvfat.c b/block/vvfat.c
> index cb3ab81e29..87165abc26 100644
> --- a/block/vvfat.c
> +++ b/block/vvfat.c
> @@ -1360,15 +1360,22 @@ static int open_file(BDRVVVFATState* s,mapping_t* 
> mapping)
>  {
>  if(!mapping)
>  return -1;
> +int new_path = 1;
>  if(!s->current_mapping ||
> -strcmp(s->current_mapping->path,mapping->path)) {
> -/* open file */
> -int fd = qemu_open_old(mapping->path,
> +
> s->current_mapping->first_mapping_index!=mapping->first_mapping_index ||
> +(new_path = strcmp(s->current_mapping->path,mapping->path))) {
> +
> +if (new_path) {
> +/* open file */
> +int fd = qemu_open_old(mapping->path,
> O_RDONLY | O_BINARY | O_LARGEFILE);
> -if(fd<0)
> -return -1;
> -vvfat_close_current_file(s);
> -s->current_fd = fd;
> +if(fd<0)
> +return -1;
> +vvfat_close_current_file(s);
> +
> +s->current_fd = fd;
> +}
> +assert(s->current_fd);
>  s->current_mapping = mapping;
>  }
>  return 0;
> -- 
> 2.44.0
>