Re: [PATCH 02/12] hw/vfio/pci: Replace sprintf() by g_strdup_printf()
On 4/12/24 17:25, Alex Williamson wrote: On Wed, 10 Apr 2024 18:06:03 +0200 Philippe Mathieu-Daudé wrote: sprintf() is deprecated on Darwin since macOS 13.0 / XCode 14.1, resulting in painful developper experience. Use g_strdup_printf() instead. Isn't this code only compiled for Linux hosts? It is not. Maybe still a valid change, but the rationale seems irrelevant. I agree the commit log should be rephrased. There is also a v2 doing a different change : https://lore.kernel.org/qemu-devel/20240411101550.99392-1-phi...@linaro.org/ This is a bit confusing. Thanks, C.
[PATCH for-9.1] migration: Add Error** argument to add_bitmaps_to_list()
This allows to report more precise errors in the migration handler dirty_bitmap_save_setup(). Suggested-by Vladimir Sementsov-Ogievskiy Signed-off-by: Cédric Le Goater --- To apply on top of : https://lore.kernel.org/qemu-devel/20240320064911.545001-1-...@redhat.com/ migration/block-dirty-bitmap.c | 34 -- 1 file changed, 16 insertions(+), 18 deletions(-) diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c index 542a8c297b329abc30d1b3a205d29340fa59a961..a7d55048c23505fde565ca784cec3c917dca37e5 100644 --- a/migration/block-dirty-bitmap.c +++ b/migration/block-dirty-bitmap.c @@ -481,13 +481,13 @@ static void dirty_bitmap_do_save_cleanup(DBMSaveState *s) /* Called with the BQL taken. */ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs, - const char *bs_name, GHashTable *alias_map) + const char *bs_name, GHashTable *alias_map, + Error **errp) { BdrvDirtyBitmap *bitmap; SaveBitmapState *dbms; GHashTable *bitmap_aliases; const char *node_alias, *bitmap_name, *bitmap_alias; -Error *local_err = NULL; /* When an alias map is given, @bs_name must be @bs's node name */ assert(!alias_map || !strcmp(bs_name, bdrv_get_node_name(bs))); @@ -504,8 +504,8 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs, bitmap_name = bdrv_dirty_bitmap_name(bitmap); if (!bs_name || strcmp(bs_name, "") == 0) { -error_report("Bitmap '%s' in unnamed node can't be migrated", - bitmap_name); +error_setg(errp, "Bitmap '%s' in unnamed node can't be migrated", + bitmap_name); return -1; } @@ -525,9 +525,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs, } if (node_alias[0] == '#') { -error_report("Bitmap '%s' in a node with auto-generated " - "name '%s' can't be migrated", - bitmap_name, node_alias); +error_setg(errp, "Bitmap '%s' in a node with auto-generated " + "name '%s' can't be migrated", + bitmap_name, node_alias); return -1; } @@ -538,8 +538,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs, continue; } -if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, _err)) { -error_report_err(local_err); +if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, errp)) { return -1; } @@ -558,9 +557,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs, } } else { if (strlen(bitmap_name) > UINT8_MAX) { -error_report("Cannot migrate bitmap '%s' on node '%s': " - "Name is longer than %u bytes", - bitmap_name, bs_name, UINT8_MAX); +error_setg(errp, "Cannot migrate bitmap '%s' on node '%s': " + "Name is longer than %u bytes", + bitmap_name, bs_name, UINT8_MAX); return -1; } bitmap_alias = bitmap_name; @@ -599,7 +598,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs, } /* Called with the BQL taken. */ -static int init_dirty_bitmap_migration(DBMSaveState *s) +static int init_dirty_bitmap_migration(DBMSaveState *s, Error **errp) { BlockDriverState *bs; SaveBitmapState *dbms; @@ -643,7 +642,7 @@ static int init_dirty_bitmap_migration(DBMSaveState *s) } if (bs && bs->drv && !bs->drv->is_filter) { -if (add_bitmaps_to_list(s, bs, name, NULL)) { +if (add_bitmaps_to_list(s, bs, name, NULL, errp)) { goto fail; } g_hash_table_add(handled_by_blk, bs); @@ -656,7 +655,8 @@ static int init_dirty_bitmap_migration(DBMSaveState *s) continue; } -if (add_bitmaps_to_list(s, bs, bdrv_get_node_name(bs), alias_map)) { +if (add_bitmaps_to_list(s, bs, bdrv_get_node_name(bs), alias_map, +errp)) { goto fail; } } @@ -1218,9 +1218,7 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque, Error **errp) DBMSaveState *s = &((DBMState *)opaque)->save; SaveBitmapState *dbms = NULL; -if (init_dirty_bitmap_migration(s) < 0) { -error_setg(errp, - "Failed to initialize dirty tracking bitmap for blocks"); +if (init_dirty_bitmap_migration(s, errp) < 0) { return -1; } -- 2.44.0
Re: [PATCH for-9.1 v5 07/14] migration: Add Error** argument to .save_setup() handler
Hello Vladimir, On 3/29/24 10:32, Vladimir Sementsov-Ogievskiy wrote: On 20.03.24 09:49, Cédric Le Goater wrote: diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c index 2708abf3d762de774ed294d3fdb8e56690d2974c..542a8c297b329abc30d1b3a205d29340fa59a961 100644 --- a/migration/block-dirty-bitmap.c +++ b/migration/block-dirty-bitmap.c @@ -1213,12 +1213,14 @@ fail: return ret; } -static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque) +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque, Error **errp) { DBMSaveState *s = &((DBMState *)opaque)->save; SaveBitmapState *dbms = NULL; if (init_dirty_bitmap_migration(s) < 0) { + error_setg(errp, + "Failed to initialize dirty tracking bitmap for blocks"); No, that's not about initializing a bitmap. This all is about migration of block-dirty-bitmaps themselves. So correct would be say "Failed to initialize migration of block dirty bitmaps". with this, for block dirty bitmap migration: Acked-by: Vladimir Sementsov-Ogievskiy I had kept your previous R-b. Should we remove it ? or is it ok if I address your comments below in a followup patch, in which case the error message above would be removed. Still, a lot better is add errp to init_dirty_bitmap_migration() and to add_bitmaps_to_list() too: look, init_dirty_bitmap_migration() fails only if add_bitmaps_to_list() fails in turn, add_bitmaps_to_list() have several clear failure points, where it always does error_report (or error_report_err), which would be better to pass-through to the user. Good idea. Will do. Thanks, C.
[PATCH for-9.1 v5 07/14] migration: Add Error** argument to .save_setup() handler
The purpose is to record a potential error in the migration stream if qemu_savevm_state_setup() fails. Most of the current .save_setup() handlers can be modified to use the Error argument instead of managing their own and calling locally error_report(). Cc: Nicholas Piggin Cc: Harsh Prateek Bora Cc: Halil Pasic Cc: Thomas Huth Cc: Eric Blake Cc: Vladimir Sementsov-Ogievskiy Cc: John Snow Cc: Stefan Hajnoczi Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Peter Xu Reviewed-by: Thomas Huth Reviewed-by: Vladimir Sementsov-Ogievskiy Signed-off-by: Cédric Le Goater --- include/migration/register.h | 3 ++- hw/ppc/spapr.c | 2 +- hw/s390x/s390-stattrib.c | 6 ++ hw/vfio/migration.c| 17 - migration/block-dirty-bitmap.c | 4 +++- migration/block.c | 13 - migration/ram.c| 15 --- migration/savevm.c | 4 +--- 8 files changed, 29 insertions(+), 35 deletions(-) diff --git a/include/migration/register.h b/include/migration/register.h index d7b70a8be68c9df47c7843bda7d430989d7ca384..64fc7c11036c82edd6d69513e56a0216d36c17aa 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -60,10 +60,11 @@ typedef struct SaveVMHandlers { * * @f: QEMUFile where to send the data * @opaque: data pointer passed to register_savevm_live() + * @errp: pointer to Error*, to store an error if it happens. * * Returns zero to indicate success and negative for error */ -int (*save_setup)(QEMUFile *f, void *opaque); +int (*save_setup)(QEMUFile *f, void *opaque, Error **errp); /** * @save_cleanup diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index c417f9dd523547eabf6d66a8f505093758e80461..144a3f2b604872e09268b509b9b79ee5b2226136 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2171,7 +2171,7 @@ static const VMStateDescription vmstate_spapr = { } }; -static int htab_save_setup(QEMUFile *f, void *opaque) +static int htab_save_setup(QEMUFile *f, void *opaque, Error **errp) { SpaprMachineState *spapr = opaque; diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c index b743e8a2fee84c7374460ccea6df1cf447cda44b..bc04187b2b69226db80219da1a964a87428adc0c 100644 --- a/hw/s390x/s390-stattrib.c +++ b/hw/s390x/s390-stattrib.c @@ -168,19 +168,17 @@ static int cmma_load(QEMUFile *f, void *opaque, int version_id) return ret; } -static int cmma_save_setup(QEMUFile *f, void *opaque) +static int cmma_save_setup(QEMUFile *f, void *opaque, Error **errp) { S390StAttribState *sas = S390_STATTRIB(opaque); S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas); -Error *local_err = NULL; int res; /* * Signal that we want to start a migration, thus needing PGSTE dirty * tracking. */ -res = sac->set_migrationmode(sas, true, _err); +res = sac->set_migrationmode(sas, true, errp); if (res) { -error_report_err(local_err); return res; } qemu_put_be64(f, STATTR_FLAG_EOS); diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index bf5a29ddc15b0dbc7ae9c44f289539dd0cdddb0d..5763c0b68376b1e24ef3e77c3d19fcd406922c79 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -376,7 +376,7 @@ static int vfio_save_prepare(void *opaque, Error **errp) return 0; } -static int vfio_save_setup(QEMUFile *f, void *opaque) +static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp) { VFIODevice *vbasedev = opaque; VFIOMigration *migration = vbasedev->migration; @@ -390,8 +390,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque) stop_copy_size); migration->data_buffer = g_try_malloc0(migration->data_buffer_size); if (!migration->data_buffer) { -error_report("%s: Failed to allocate migration data buffer", - vbasedev->name); +error_setg(errp, "%s: Failed to allocate migration data buffer", + vbasedev->name); return -ENOMEM; } @@ -401,8 +401,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque) ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY, VFIO_DEVICE_STATE_RUNNING); if (ret) { -error_report("%s: Failed to set new PRE_COPY state", - vbasedev->name); +error_setg(errp, "%s: Failed to set new PRE_COPY state", + vbasedev->name); return ret; } @@ -413,8 +413,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque) /* vfio_save_complete_precopy() will go to STOP_COPY */ break; default: -error_report("%s: Invalid device state %d", vbasedev->name, -
[PATCH for-9.1 v5 03/14] migration: Always report an error in block_save_setup()
This will prepare ground for future changes adding an Error** argument to the save_setup() handler. We need to make sure that on failure, block_save_setup() always sets a new error. Cc: Stefan Hajnoczi Reviewed-by: Fabiano Rosas Signed-off-by: Cédric Le Goater --- Changes in v5: - Rebased on 2e128776dc56 ("migration: Skip only empty block devices") migration/block.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/migration/block.c b/migration/block.c index 2b9054889ad2ba739828594c50cf047703757e96..f8a11beb37dac3df5c2cc654db6440509d1181ea 100644 --- a/migration/block.c +++ b/migration/block.c @@ -367,7 +367,7 @@ static void unset_dirty_tracking(void) } } -static int init_blk_migration(QEMUFile *f) +static int init_blk_migration(QEMUFile *f, Error **errp) { BlockDriverState *bs; BlkMigDevState *bmds; @@ -378,7 +378,6 @@ static int init_blk_migration(QEMUFile *f) BlkMigDevState *bmds; BlockDriverState *bs; } *bmds_bs; -Error *local_err = NULL; int ret; GRAPH_RDLOCK_GUARD_MAINLOOP(); @@ -406,6 +405,8 @@ static int init_blk_migration(QEMUFile *f) continue; } if (sectors < 0) { +error_setg(errp, "Error getting length of block device %s", + bdrv_get_device_name(bs)); ret = sectors; bdrv_next_cleanup(); goto out; @@ -442,9 +443,8 @@ static int init_blk_migration(QEMUFile *f) bs = bmds_bs[i].bs; if (bmds) { -ret = blk_insert_bs(bmds->blk, bs, _err); +ret = blk_insert_bs(bmds->blk, bs, errp); if (ret < 0) { -error_report_err(local_err); goto out; } @@ -714,6 +714,7 @@ static void block_migration_cleanup(void *opaque) static int block_save_setup(QEMUFile *f, void *opaque) { int ret; +Error *local_err = NULL; trace_migration_block_save("setup", block_mig_state.submitted, block_mig_state.transferred); @@ -721,18 +722,27 @@ static int block_save_setup(QEMUFile *f, void *opaque) warn_report("block migration is deprecated;" " use blockdev-mirror with NBD instead"); -ret = init_blk_migration(f); +ret = init_blk_migration(f, _err); if (ret < 0) { +error_report_err(local_err); return ret; } /* start track dirty blocks */ ret = set_dirty_tracking(); if (ret) { +error_setg_errno(_err, -ret, + "Failed to start block dirty tracking"); +error_report_err(local_err); return ret; } ret = flush_blks(f); +if (ret) { +error_setg_errno(_err, -ret, "Flushing block failed"); +error_report_err(local_err); +return ret; +} blk_mig_reset_dirty_cursor(); qemu_put_be64(f, BLK_MIG_FLAG_EOS); -- 2.44.0
[PATCH] aspeed/smc: Only wire flash devices at reset
The Aspeed machines have many Static Memory Controllers (SMC), up to 8, which can only drive flash memory devices. Commit 27a2c66c92ec ("aspeed/smc: Wire CS lines at reset") tried to ease the definitions of these devices by allowing flash devices from the command line to be attached to a SSI bus. For that, the wiring of the CS lines of the Aspeed SMC controller was moved at reset. Two assumptions are made though, first that the device has a SSI_GPIO_CS GPIO line, which is not always the case, and second that it is flash device. Correct this problem by ensuring that the devices attached to the bus are the correct flash type. This fixes a QEMU abort when devices without a CS line, such as the max111x, are passed on the command line. While at it, export TYPE_M25P80 used in the Xilinx Versal Virtual machine. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2228 Fixes: 27a2c66c92ec ("aspeed/smc: Wire CS lines at reset") Reported-by: Thomas Huth Signed-off-by: Cédric Le Goater --- include/hw/block/flash.h | 2 ++ hw/arm/xlnx-versal-virt.c | 3 ++- hw/block/m25p80.c | 1 - hw/ssi/aspeed_smc.c | 9 + 4 files changed, 13 insertions(+), 2 deletions(-) diff --git a/include/hw/block/flash.h b/include/hw/block/flash.h index de93756cbe8f..2b5ccd92f463 100644 --- a/include/hw/block/flash.h +++ b/include/hw/block/flash.h @@ -78,6 +78,8 @@ extern const VMStateDescription vmstate_ecc_state; /* m25p80.c */ +#define TYPE_M25P80 "m25p80-generic" + BlockBackend *m25p80_get_blk(DeviceState *dev); #endif diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c index bfaed1aebfc6..962f98fee2ea 100644 --- a/hw/arm/xlnx-versal-virt.c +++ b/hw/arm/xlnx-versal-virt.c @@ -13,6 +13,7 @@ #include "qemu/error-report.h" #include "qapi/error.h" #include "sysemu/device_tree.h" +#include "hw/block/flash.h" #include "hw/boards.h" #include "hw/sysbus.h" #include "hw/arm/fdt.h" @@ -759,7 +760,7 @@ static void versal_virt_init(MachineState *machine) flash_klass = object_class_by_name(s->ospi_model); if (!flash_klass || object_class_is_abstract(flash_klass) || -!object_class_dynamic_cast(flash_klass, "m25p80-generic")) { +!object_class_dynamic_cast(flash_klass, TYPE_M25P80)) { error_setg(_fatal, "'%s' is either abstract or" " not a subtype of m25p80", s->ospi_model); return; diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c index 08a00a6d9b89..8dec134832a1 100644 --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -515,7 +515,6 @@ struct M25P80Class { FlashPartInfo *pi; }; -#define TYPE_M25P80 "m25p80-generic" OBJECT_DECLARE_TYPE(Flash, M25P80Class, M25P80) static inline Manufacturer get_man(Flash *s) diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c index 0dff1d910778..de57e690e124 100644 --- a/hw/ssi/aspeed_smc.c +++ b/hw/ssi/aspeed_smc.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "hw/block/flash.h" #include "hw/sysbus.h" #include "migration/vmstate.h" #include "qemu/log.h" @@ -711,6 +712,14 @@ static void aspeed_smc_reset(DeviceState *d) for (i = 0; i < asc->cs_num_max; i++) { DeviceState *dev = ssi_get_cs(s->spi, i); if (dev) { +Object *o = OBJECT(dev); + +if (!object_dynamic_cast(o, TYPE_M25P80)) { +warn_report("Aspeed SMC %s.%d : Invalid %s device type", +BUS(s->spi)->name, i, object_get_typename(o)); +continue; +} + qemu_irq cs_line = qdev_get_gpio_in_named(dev, SSI_GPIO_CS, 0); qdev_connect_gpio_out_named(DEVICE(s), "cs", i, cs_line); } -- 2.44.0
Re: [PATCH for 9.0] migration: Skip only empty block devices
On 3/12/24 19:41, Stefan Hajnoczi wrote: On Tue, Mar 12, 2024 at 01:04:31PM +0100, Cédric Le Goater wrote: The block .save_setup() handler calls a helper routine init_blk_migration() which builds a list of block devices to take into account for migration. When one device is found to be empty (sectors == 0), the loop exits and all the remaining devices are ignored. This is a regression introduced when bdrv_iterate() was removed. Change that by skipping only empty devices. Cc: Markus Armbruster Suggested: Kevin Wolf Fixes: fea68bb6e9fa ("block: Eliminate bdrv_iterate(), use bdrv_next()") It's not clear to me that fea68bb6e9fa introduced the bug. The code is still <= 0 there and I don't see anything else that skips empty devices. Can you explain the bug in fea68bb6e9fa? Let me try. Initially, the code was : static void init_blk_migration_it(void *opaque, BlockDriverState *bs) { BlockDriverState *bs; BlkMigDevState *bmds; int64_t sectors; if (!bdrv_is_read_only(bs)) { sectors = bdrv_nb_sectors(bs); if (sectors <= 0) { return; ... } static void init_blk_migration(QEMUFile *f) { block_mig_state.submitted = 0; block_mig_state.read_done = 0; block_mig_state.transferred = 0; block_mig_state.total_sector_sum = 0; block_mig_state.prev_progress = -1; block_mig_state.bulk_completed = 0; block_mig_state.zero_blocks = migrate_zero_blocks(); bdrv_iterate(init_blk_migration_it, NULL); } bdrv_iterate() was iterating on all devices and exiting one iteration early if the device was empty or an error detected. The loop applied on all devices always, only empty devices and the ones with a failure were skipped. It was replaced by : static void init_blk_migration(QEMUFile *f) { BlockDriverState *bs; BlkMigDevState *bmds; int64_t sectors; block_mig_state.submitted = 0; block_mig_state.read_done = 0; block_mig_state.transferred = 0; block_mig_state.total_sector_sum = 0; block_mig_state.prev_progress = -1; block_mig_state.bulk_completed = 0; block_mig_state.zero_blocks = migrate_zero_blocks(); for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) { if (bdrv_is_read_only(bs)) { continue; } sectors = bdrv_nb_sectors(bs); if (sectors <= 0) { return; ... } The loop and function exit at first failure or first empty device, skipping the remaining devices. This is a different behavior. What we would want today is ignore empty devices, as it done for the read only ones, return at first failure and let the caller of init_blk_migration() report. This is a short term fix for 9.0 because the block migration code is deprecated and should be removed in 9.1. Thanks, C.
Re: [PATCH for 9.0] migration: Skip only empty block devices
On 3/12/24 13:04, Cédric Le Goater wrote: The block .save_setup() handler calls a helper routine init_blk_migration() which builds a list of block devices to take into account for migration. When one device is found to be empty (sectors == 0), the loop exits and all the remaining devices are ignored. This is a regression introduced when bdrv_iterate() was removed. Change that by skipping only empty devices. Cc: Markus Armbruster Suggested: Kevin Wolf That's better : Suggested-by: Kevin Wolf Sorry for the noise, C. Fixes: fea68bb6e9fa ("block: Eliminate bdrv_iterate(), use bdrv_next()") Signed-off-by: Cédric Le Goater --- migration/block.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/migration/block.c b/migration/block.c index 8c6ebafacc1ffe930d1d4f19d968817b14852c69..2b9054889ad2ba739828594c50cf047703757e96 100644 --- a/migration/block.c +++ b/migration/block.c @@ -402,7 +402,10 @@ static int init_blk_migration(QEMUFile *f) } sectors = bdrv_nb_sectors(bs); -if (sectors <= 0) { +if (sectors == 0) { +continue; +} +if (sectors < 0) { ret = sectors; bdrv_next_cleanup(); goto out;
[PATCH for 9.0] migration: Skip only empty block devices
The block .save_setup() handler calls a helper routine init_blk_migration() which builds a list of block devices to take into account for migration. When one device is found to be empty (sectors == 0), the loop exits and all the remaining devices are ignored. This is a regression introduced when bdrv_iterate() was removed. Change that by skipping only empty devices. Cc: Markus Armbruster Suggested: Kevin Wolf Fixes: fea68bb6e9fa ("block: Eliminate bdrv_iterate(), use bdrv_next()") Signed-off-by: Cédric Le Goater --- migration/block.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/migration/block.c b/migration/block.c index 8c6ebafacc1ffe930d1d4f19d968817b14852c69..2b9054889ad2ba739828594c50cf047703757e96 100644 --- a/migration/block.c +++ b/migration/block.c @@ -402,7 +402,10 @@ static int init_blk_migration(QEMUFile *f) } sectors = bdrv_nb_sectors(bs); -if (sectors <= 0) { +if (sectors == 0) { +continue; +} +if (sectors < 0) { ret = sectors; bdrv_next_cleanup(); goto out; -- 2.44.0
Re: [PATCH v3 03/26] migration: Always report an error in block_save_setup()
On 3/4/24 22:04, Fabiano Rosas wrote: Cédric Le Goater writes: This will prepare ground for futur changes adding an Error** argument to the save_setup() handler. We need to make sure that on failure, block_save_setup() always sets a new error. Cc: Stefan Hajnoczi Signed-off-by: Cédric Le Goater --- migration/block.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/migration/block.c b/migration/block.c index 8c6ebafacc1ffe930d1d4f19d968817b14852c69..06f5857cf049df3261d2cf1d7c3607ab92350ac6 100644 --- a/migration/block.c +++ b/migration/block.c @@ -367,7 +367,7 @@ static void unset_dirty_tracking(void) } } -static int init_blk_migration(QEMUFile *f) +static int init_blk_migration(QEMUFile *f, Error **errp) { BlockDriverState *bs; BlkMigDevState *bmds; @@ -378,7 +378,6 @@ static int init_blk_migration(QEMUFile *f) BlkMigDevState *bmds; BlockDriverState *bs; } *bmds_bs; -Error *local_err = NULL; int ret; GRAPH_RDLOCK_GUARD_MAINLOOP(); There's a negative return below at: for (i = 0, bs = bdrv_first(); bs; bs = bdrv_next(), i++) { if (bdrv_is_read_only(bs)) { continue; } sectors = bdrv_nb_sectors(bs); if (sectors <= 0) { ret = sectors; ^ bdrv_next_cleanup(); goto out; } ... Indeed. I understand that the bdrv_nb_sectors() == 0 case only breaks the loop but it is not considered as an error. Could the block folks confirm please ? Thanks, C. I presume that must be covered by an error as well. A similar function somewhere else reports: total_sectors = blk_nb_sectors(blk); if (total_sectors <= 0) { error_report("Error getting length of block device %s", device_name); return -EINVAL; } @@ -439,9 +438,8 @@ static int init_blk_migration(QEMUFile *f) bs = bmds_bs[i].bs; if (bmds) { -ret = blk_insert_bs(bmds->blk, bs, _err); +ret = blk_insert_bs(bmds->blk, bs, errp); if (ret < 0) { -error_report_err(local_err); goto out; } @@ -711,6 +709,7 @@ static void block_migration_cleanup(void *opaque) static int block_save_setup(QEMUFile *f, void *opaque) { int ret; +Error *local_err = NULL; trace_migration_block_save("setup", block_mig_state.submitted, block_mig_state.transferred); @@ -718,18 +717,27 @@ static int block_save_setup(QEMUFile *f, void *opaque) warn_report("block migration is deprecated;" " use blockdev-mirror with NBD instead"); -ret = init_blk_migration(f); +ret = init_blk_migration(f, _err); if (ret < 0) { +error_report_err(local_err); return ret; } /* start track dirty blocks */ ret = set_dirty_tracking(); if (ret) { +error_setg_errno(_err, -ret, + "Failed to start block dirty tracking"); +error_report_err(local_err); return ret; } ret = flush_blks(f); +if (ret) { +error_setg_errno(_err, -ret, "Flushing block failed"); +error_report_err(local_err); +return ret; +} blk_mig_reset_dirty_cursor(); qemu_put_be64(f, BLK_MIG_FLAG_EOS);
Re: [PATCH 05/21] hw/ppc/pnv_bmc: Use qdev_new() instead of QOM API
On 2/16/24 12:02, Philippe Mathieu-Daudé wrote: Prefer QDev API for QDev objects, avoid the underlying QOM layer. Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Cédric Le Goater Thanks, C. --- hw/ppc/pnv_bmc.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/ppc/pnv_bmc.c b/hw/ppc/pnv_bmc.c index 99f1e8d7f9..0c1274df21 100644 --- a/hw/ppc/pnv_bmc.c +++ b/hw/ppc/pnv_bmc.c @@ -269,13 +269,13 @@ void pnv_bmc_set_pnor(IPMIBmc *bmc, PnvPnor *pnor) */ IPMIBmc *pnv_bmc_create(PnvPnor *pnor) { -Object *obj; +DeviceState *dev; -obj = object_new(TYPE_IPMI_BMC_SIMULATOR); -qdev_realize(DEVICE(obj), NULL, _fatal); -pnv_bmc_set_pnor(IPMI_BMC(obj), pnor); +dev = qdev_new(TYPE_IPMI_BMC_SIMULATOR); +qdev_realize(dev, NULL, _fatal); +pnv_bmc_set_pnor(IPMI_BMC(dev), pnor); -return IPMI_BMC(obj); +return IPMI_BMC(dev); } typedef struct ForeachArgs {
Re: [PATCH v2 2/6] vfio: Avoid inspecting option QDict for rombar
On 2/12/24 09:04, Philippe Mathieu-Daudé wrote: On 10/2/24 11:24, Akihiko Odaki wrote: Use pci_rom_bar_explicitly_enabled() to determine if rombar is explicitly enabled. Signed-off-by: Akihiko Odaki --- hw/vfio/pci.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index d7fe06715c4b..44178ac9355f 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -1010,7 +1010,6 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev) { uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK); off_t offset = vdev->config_offset + PCI_ROM_ADDRESS; - DeviceState *dev = DEVICE(vdev); char *name; int fd = vdev->vbasedev.fd; @@ -1044,7 +1043,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev) } if (vfio_opt_rom_in_denylist(vdev)) { - if (dev->opts && qdict_haskey(dev->opts, "rombar")) { + if (pci_rom_bar_explicitly_enabled(>pdev)) { "pdev" is considered internal field, please use the DEVICE() macro to access it. Yes. I was just looking at vfio_pci_size_rom(). There is a test at the beginning of this routine which should be changed to use DEVICE() if (vdev->pdev.romfile || !vdev->pdev.rom_bar) { /* Since pci handles romfile, just print a message and return */ if (vfio_opt_rom_in_denylist(vdev) && vdev->pdev.romfile) { ... Thanks, C.
Re: [PATCH 1/2] target/sh4: Deprecate the shix machine
On 1/8/24 18:15, Samuel Tardieu wrote: The shix machine has been designed and used at Télécom Paris from 2003 to 2010. It had been added to QEMU in 2005 and has not been maintained since. Since nobody is using the physical board anymore nor interested in maintaining the QEMU port, it is time to deprecate it. Signed-off-by: Samuel Tardieu Reviewed-by: Cédric Le Goater Thanks, C. --- docs/about/deprecated.rst | 5 + hw/sh4/shix.c | 1 + 2 files changed, 6 insertions(+) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index 2e15040246..e6a12c9077 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -269,6 +269,11 @@ Nios II ``10m50-ghrd`` and ``nios2-generic-nommu`` machines (since 8.2) The Nios II architecture is orphan. +``shix`` (since 9.0) + + +The machine is no longer in existence and has been long unmaintained +in QEMU. Backend options --- diff --git a/hw/sh4/shix.c b/hw/sh4/shix.c index aa812512f0..58530b8ede 100644 --- a/hw/sh4/shix.c +++ b/hw/sh4/shix.c @@ -80,6 +80,7 @@ static void shix_machine_init(MachineClass *mc) mc->init = shix_init; mc->is_default = true; mc->default_cpu_type = TYPE_SH7750R_CPU; +mc->deprecation_reason = "old and unmaintained - use a newer machine instead"; } DEFINE_MACHINE("shix", shix_machine_init)
Re: [PATCH 2/2] hw/block: Deprecate the TC58128 block device
On 1/8/24 18:15, Samuel Tardieu wrote: The 16MiB flash device is only used by the deprecated shix machine. Its code it old and unmaintained, and has never been adapted to the QOM architecture. It still contains debug statements and uses global variables. It is time to deprecate it. Signed-off-by: Samuel Tardieu Reviewed-by: Cédric Le Goater Thanks, C. --- docs/about/deprecated.rst | 2 +- hw/block/tc58128.c| 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index e6a12c9077..15e39f8bbb 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -273,7 +273,7 @@ The Nios II architecture is orphan. The machine is no longer in existence and has been long unmaintained -in QEMU. +in QEMU. This also holds for the TC51828 16MiB flash that it uses. Backend options --- diff --git a/hw/block/tc58128.c b/hw/block/tc58128.c index d350126b27..354c13ccf0 100644 --- a/hw/block/tc58128.c +++ b/hw/block/tc58128.c @@ -202,6 +202,7 @@ static sh7750_io_device tc58128 = { int tc58128_init(struct SH7750State *s, const char *zone1, const char *zone2) { +warn_report_once("The TC58128 flash device is deprecated - use a newer component"); init_dev(_devs[0], zone1); init_dev(_devs[1], zone2); return sh7750_register_io_device(s, );
Re: [PATCH v3 5/5] Rename "QEMU global mutex" to "BQL" in comments and docs
On 1/2/24 16:35, Stefan Hajnoczi wrote: The term "QEMU global mutex" is identical to the more widely used Big QEMU Lock ("BQL"). Update the code comments and documentation to use "BQL" instead of "QEMU global mutex". Signed-off-by: Stefan Hajnoczi Acked-by: Markus Armbruster Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Cédric Le Goater Thanks, C.
Re: [PATCH v3 4/5] Replace "iothread lock" with "BQL" in comments
On 1/2/24 16:35, Stefan Hajnoczi wrote: The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL) in their names. Update the code comments to use "BQL" instead of "iothread lock". Signed-off-by: Stefan Hajnoczi Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Cédric Le Goater Thanks, C.
Re: [PATCH 4/6] system/cpus: rename qemu_global_mutex to qemu_bql
On 11/29/23 22:26, Stefan Hajnoczi wrote: The APIs using qemu_global_mutex now follow the Big QEMU Lock (BQL) nomenclature. It's a little strange that the actual QemuMutex variable that embodies the BQL is called qemu_global_mutex instead of qemu_bql. Rename it for consistency. Signed-off-by: Stefan Hajnoczi Reviewed-by: Cédric Le Goater Thanks, C.
Re: [PATCH 3/6] qemu/main-loop: rename qemu_cond_wait_iothread() to qemu_cond_wait_bql()
On 11/29/23 22:26, Stefan Hajnoczi wrote: The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL) instead, it is already widely used and unambiguous. Signed-off-by: Stefan Hajnoczi Reviewed-by: Cédric Le Goater Thanks, C.
Re: [PATCH 2/6] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to QEMU_BQL_LOCK_GUARD
On 11/29/23 22:26, Stefan Hajnoczi wrote: The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL) instead, it is already widely used and unambiguous. Signed-off-by: Stefan Hajnoczi Reviewed-by: Cédric Le Goater Thanks, C.
Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()
On 11/29/23 22:26, Stefan Hajnoczi wrote: The Big QEMU Lock (BQL) has many names and they are confusing. The actual QemuMutex variable is called qemu_global_mutex but it's commonly referred to as the BQL in discussions and some code comments. The locking APIs, however, are called qemu_mutex_lock_iothread() and qemu_mutex_unlock_iothread(). The "iothread" name is historic and comes from when the main thread was split into into KVM vcpu threads and the "iothread" (now called the main loop thread). I have contributed to the confusion myself by introducing a separate --object iothread, a separate concept unrelated to the BQL. The "iothread" name is no longer appropriate for the BQL. Rename the locking APIs to: - void qemu_bql_lock(void) - void qemu_bql_unlock(void) - bool qemu_bql_locked(void) There are more APIs with "iothread" in their names. Subsequent patches will rename them. There are also comments and documentation that will be updated in later patches. Signed-off-by: Stefan Hajnoczi Reviewed-by: Cédric Le Goater Thanks, C.
Re: [PATCH v2] block/snapshot: Fix compiler warning with -Wshadow=local
On 10/24/23 13:51, Thomas Huth wrote: On 24/10/2023 12.37, Markus Armbruster wrote: Markus Armbruster writes: Thomas Huth writes: No need to declare a new variable in the the inner code block here, we can re-use the "ret" variable that has been declared at the beginning of the function. With this change, the code can now be successfully compiled with -Wshadow=local again. Fixes: a32e781838 ("Mark bdrv_snapshot_fallback() and callers GRAPH_RDLOCK") Signed-off-by: Thomas Huth --- v2: Assign "ret" only in one spot block/snapshot.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/block/snapshot.c b/block/snapshot.c index 6e16eb803a..55974273ae 100644 --- a/block/snapshot.c +++ b/block/snapshot.c @@ -629,7 +629,6 @@ int bdrv_all_goto_snapshot(const char *name, while (iterbdrvs) { BlockDriverState *bs = iterbdrvs->data; AioContext *ctx = bdrv_get_aio_context(bs); - int ret = 0; bool all_snapshots_includes_bs; aio_context_acquire(ctx); @@ -637,9 +636,8 @@ int bdrv_all_goto_snapshot(const char *name, all_snapshots_includes_bs = bdrv_all_snapshots_includes_bs(bs); bdrv_graph_rdunlock_main_loop(); - if (devices || all_snapshots_includes_bs) { - ret = bdrv_snapshot_goto(bs, name, errp); - } + ret = (devices || all_snapshots_includes_bs) ? + bdrv_snapshot_goto(bs, name, errp) : 0; aio_context_release(ctx); if (ret < 0) { bdrv_graph_rdlock_main_loop(); Better. Unconditional assignment to @ret right before it's checked is how we should use @ret. Reviewed-by: Markus Armbruster And queued. Thanks! Mind if I drop the Fixes: tag? Nothing broken until we enable -Wshadow=local... Fine for me! This looks like the last remaining warning. Are we going to activate -Wshadow=local next ? Thanks, C.
Re: [PATCH 1/1] hw/sd: Declare CPU QOM types using DEFINE_TYPES() macro
On 10/31/23 09:06, Philippe Mathieu-Daudé wrote: When multiple QOM types are registered in the same file, it is simpler to use the the DEFINE_TYPES() macro. In particular because type array declared with such macro are easier to review. Mechanical transformation using the following comby script: [pattern-x1] match=''' static const TypeInfo :[i1~.*_info] = { :[body] }; static void :[rt1~.*_register_type.](void) { type_register_static(&:[i2~.*_info]); } type_init(:[rt2~.*_register_type.]) ''' rewrite=''' static const TypeInfo :[i1][] = { { :[body] }, }; DEFINE_TYPES(:[i1]) ''' rule='where :[i1] == :[i2], :[rt1] == :[rt2]' [pattern-x2] match=''' static const TypeInfo :[i1a~.*_info] = { :[body1] }; ... static const TypeInfo :[i2a~.*_info] = { :[body2] }; static void :[rt1~.*_register_type.](void) { type_register_static(&:[i1b~.*_info]); type_register_static(&:[i2b~.*_info]); } type_init(:[rt2~.*_register_type.]) ''' rewrite=''' static const TypeInfo :[i1a][] = { { :[body1] }, { :[body2] }, }; DEFINE_TYPES(:[i1a]) ''' rule=''' where :[i1a] == :[i1b], :[i2a] == :[i2b], :[rt1] == :[rt2] ''' and re-indented manually. Signed-off-by: Philippe Mathieu-Daudé I checked the aspeed part. Reviewed-by: Cédric Le Goater Thanks, C. --- hw/sd/aspeed_sdhci.c | 19 --- hw/sd/bcm2835_sdhost.c | 33 ++--- hw/sd/cadence_sdhci.c | 21 + hw/sd/core.c | 19 --- hw/sd/npcm7xx_sdhci.c | 21 + hw/sd/pl181.c | 35 +++ hw/sd/pxa2xx_mmci.c| 35 +++ hw/sd/sd.c | 37 - hw/sd/sdhci-pci.c | 25 +++-- hw/sd/ssi-sd.c | 19 --- 10 files changed, 113 insertions(+), 151 deletions(-) diff --git a/hw/sd/aspeed_sdhci.c b/hw/sd/aspeed_sdhci.c index be8cafd65f..e53206d959 100644 --- a/hw/sd/aspeed_sdhci.c +++ b/hw/sd/aspeed_sdhci.c @@ -198,16 +198,13 @@ static void aspeed_sdhci_class_init(ObjectClass *classp, void *data) device_class_set_props(dc, aspeed_sdhci_properties); } -static const TypeInfo aspeed_sdhci_info = { -.name = TYPE_ASPEED_SDHCI, -.parent= TYPE_SYS_BUS_DEVICE, -.instance_size = sizeof(AspeedSDHCIState), -.class_init= aspeed_sdhci_class_init, +static const TypeInfo aspeed_sdhci_types[] = { +{ +.name = TYPE_ASPEED_SDHCI, +.parent = TYPE_SYS_BUS_DEVICE, +.instance_size = sizeof(AspeedSDHCIState), +.class_init = aspeed_sdhci_class_init, +}, }; -static void aspeed_sdhci_register_types(void) -{ -type_register_static(_sdhci_info); -} - -type_init(aspeed_sdhci_register_types) +DEFINE_TYPES(aspeed_sdhci_types) diff --git a/hw/sd/bcm2835_sdhost.c b/hw/sd/bcm2835_sdhost.c index 9431c35914..a600cf39e2 100644 --- a/hw/sd/bcm2835_sdhost.c +++ b/hw/sd/bcm2835_sdhost.c @@ -436,24 +436,19 @@ static void bcm2835_sdhost_class_init(ObjectClass *klass, void *data) dc->vmsd = _bcm2835_sdhost; } -static const TypeInfo bcm2835_sdhost_info = { -.name = TYPE_BCM2835_SDHOST, -.parent= TYPE_SYS_BUS_DEVICE, -.instance_size = sizeof(BCM2835SDHostState), -.class_init= bcm2835_sdhost_class_init, -.instance_init = bcm2835_sdhost_init, +static const TypeInfo bcm2835_sdhost_types[] = { +{ +.name = TYPE_BCM2835_SDHOST, +.parent = TYPE_SYS_BUS_DEVICE, +.instance_size = sizeof(BCM2835SDHostState), +.class_init = bcm2835_sdhost_class_init, +.instance_init = bcm2835_sdhost_init, +}, +{ +.name = TYPE_BCM2835_SDHOST_BUS, +.parent = TYPE_SD_BUS, +.instance_size = sizeof(SDBus), +}, }; -static const TypeInfo bcm2835_sdhost_bus_info = { -.name = TYPE_BCM2835_SDHOST_BUS, -.parent = TYPE_SD_BUS, -.instance_size = sizeof(SDBus), -}; - -static void bcm2835_sdhost_register_types(void) -{ -type_register_static(_sdhost_info); -type_register_static(_sdhost_bus_info); -} - -type_init(bcm2835_sdhost_register_types) +DEFINE_TYPES(bcm2835_sdhost_types) diff --git a/hw/sd/cadence_sdhci.c b/hw/sd/cadence_sdhci.c index 75db34befe..ef4e0d74e3 100644 --- a/hw/sd/cadence_sdhci.c +++ b/hw/sd/cadence_sdhci.c @@ -175,17 +175,14 @@ static void cadence_sdhci_class_init(ObjectClass *classp, void *data) dc->vmsd = _cadence_sdhci; } -static const TypeInfo cadence_sdhci_info = { -.name = TYPE_CADENCE_SDHCI, -.parent= TYPE_SYS_BUS_DEVICE, -.instance_size = sizeof(CadenceSDHCIState), -
Re: [PATCH 07/13] RFC migration: icp/server is a mess
On 10/20/23 07:10, Thomas Huth wrote: On 19/10/2023 23.15, Cédric Le Goater wrote: On 10/19/23 22:49, Greg Kurz wrote: Hi Juan, On Thu, 19 Oct 2023 21:08:25 +0200 Juan Quintela wrote: Current code does: - register pre_2_10_vmstate_dummy_icp with "icp/server" and instance dependinfg on cpu number - for newer machines, it register vmstate_icp with "icp/server" name and instance 0 - now it unregisters "icp/server" for the 1st instance. Heh I remember about this hack... it was caused by some rework in the interrupt controller that broke migration. This is wrong at many levels: - we shouldn't have two VMSTATEDescriptions with the same name I don't know how bad it is. The idea here is to send extra state in the stream because older QEMU expect it (but won't use it), so it made sense to keep the same name. - In case this is the only solution that we can came with, it needs to be: * register pre_2_10_vmstate_dummy_icp * unregister pre_2_10_vmstate_dummy_icp * register real vmstate_icp As the initialization of this machine is already complex enough, I need help from PPC maintainers to fix this. What about dropping all this code, i.e. basically reverting 46f7afa37096 ("spapr: fix migration of ICPState objects from/to older QEMU") ? I'd vote for removing the dummy ICP states for pre-2.10 pseries machines. Migration compatibility would be broken for these old versions but, with a clear error report, it should be more than enough. I doubt anyone will need such a feature now days. In that case: Please also put the pseries-2.1 machine up to pseries-2.9 onto the deprecation list, so that they can finally get removed after two releases. It does not make sense to keep compat machines around if the compatibility is not available anymore. This would be a really good cleanup for PPC to deprecate pseries-2.1/2.9. We did a few workarounds in that time frame which wouldn't be necessary anymore. Thanks, C.
Re: [PATCH 07/13] RFC migration: icp/server is a mess
On 10/19/23 22:49, Greg Kurz wrote: Hi Juan, On Thu, 19 Oct 2023 21:08:25 +0200 Juan Quintela wrote: Current code does: - register pre_2_10_vmstate_dummy_icp with "icp/server" and instance dependinfg on cpu number - for newer machines, it register vmstate_icp with "icp/server" name and instance 0 - now it unregisters "icp/server" for the 1st instance. Heh I remember about this hack... it was caused by some rework in the interrupt controller that broke migration. This is wrong at many levels: - we shouldn't have two VMSTATEDescriptions with the same name I don't know how bad it is. The idea here is to send extra state in the stream because older QEMU expect it (but won't use it), so it made sense to keep the same name. - In case this is the only solution that we can came with, it needs to be: * register pre_2_10_vmstate_dummy_icp * unregister pre_2_10_vmstate_dummy_icp * register real vmstate_icp As the initialization of this machine is already complex enough, I need help from PPC maintainers to fix this. What about dropping all this code, i.e. basically reverting 46f7afa37096 ("spapr: fix migration of ICPState objects from/to older QEMU") ? I'd vote for removing the dummy ICP states for pre-2.10 pseries machines. Migration compatibility would be broken for these old versions but, with a clear error report, it should be more than enough. I doubt anyone will need such a feature now days. C. Unless we still care to migrate pseries machine types from 2017 of course... Volunteers? Not working on PPC anymore since almost two years, I certainly don't have time, nor motivation to fix this. I might be able to answer some questions or to review someone else's patch that gets rid of the offending code, at best. Cheers, -- Greg CC: Cedric Le Goater CC: Daniel Henrique Barboza CC: David Gibson CC: Greg Kurz Signed-off-by: Juan Quintela --- hw/ppc/spapr.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index cb840676d3..8531d13492 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -143,7 +143,12 @@ static bool pre_2_10_vmstate_dummy_icp_needed(void *opaque) } static const VMStateDescription pre_2_10_vmstate_dummy_icp = { -.name = "icp/server", +/* + * Hack ahead. We can't have two devices with the same name and + * instance id. So I rename this to pass make check. + * Real help from people who knows the hardware is needed. + */ +.name = "pre-2.10-icp/server", .version_id = 1, .minimum_version_id = 1, .needed = pre_2_10_vmstate_dummy_icp_needed,
Re: [RFC PATCH 22/78] target/ppc: add fallthrough pseudo-keyword
On 10/13/23 09:47, Emmanouil Pitsidianakis wrote: In preparation of raising -Wimplicit-fallthrough to 5, replace all fall-through comments with the fallthrough attribute pseudo-keyword. Signed-off-by: Emmanouil Pitsidianakis --- target/ppc/cpu_init.c| 8 target/ppc/excp_helper.c | 6 +++--- target/ppc/mmu-radix64.c | 6 +++--- target/ppc/mmu_common.c | 12 ++-- target/ppc/translate.c | 6 +++--- 5 files changed, 19 insertions(+), 19 deletions(-) Reviewed-by: Cédric Le Goater Thanks, C.
Re: [RFC PATCH 42/75] hw/misc: add fallthrough pseudo-keyword
On 10/13/23 09:47, Emmanouil Pitsidianakis wrote: Signed-off-by: Emmanouil Pitsidianakis --- hw/misc/a9scu.c| 2 ++ hw/misc/aspeed_scu.c | 2 +- hw/misc/bcm2835_property.c | 12 ++-- hw/misc/mos6522.c | 4 ++-- 4 files changed, 11 insertions(+), 9 deletions(-) For aspeed, Reviewed-by: Cédric Le Goater Thanks, C. diff --git a/hw/misc/a9scu.c b/hw/misc/a9scu.c index a375ebc987..b422bec3c4 100644 --- a/hw/misc/a9scu.c +++ b/hw/misc/a9scu.c @@ -21,26 +21,27 @@ static uint64_t a9_scu_read(void *opaque, hwaddr offset, unsigned size) { A9SCUState *s = (A9SCUState *)opaque; switch (offset) { case 0x00: /* Control */ return s->control; case 0x04: /* Configuration */ return (((1 << s->num_cpu) - 1) << 4) | (s->num_cpu - 1); case 0x08: /* CPU Power Status */ return s->status; case 0x0c: /* Invalidate All Registers In Secure State */ return 0; case 0x40: /* Filtering Start Address Register */ case 0x44: /* Filtering End Address Register */ /* RAZ/WI, like an implementation with only one AXI master */ return 0; case 0x50: /* SCU Access Control Register */ case 0x54: /* SCU Non-secure Access Control Register */ /* unimplemented, fall through */ +fallthrough; default: qemu_log_mask(LOG_UNIMP, "%s: Unsupported offset 0x%"HWADDR_PRIx"\n", __func__, offset); return 0; } } @@ -48,31 +49,32 @@ static uint64_t a9_scu_read(void *opaque, hwaddr offset, static void a9_scu_write(void *opaque, hwaddr offset, uint64_t value, unsigned size) { A9SCUState *s = (A9SCUState *)opaque; switch (offset) { case 0x00: /* Control */ s->control = value & 1; break; case 0x4: /* Configuration: RO */ break; case 0x08: case 0x09: case 0x0A: case 0x0B: /* Power Control */ s->status = value; break; case 0x0c: /* Invalidate All Registers In Secure State */ /* no-op as we do not implement caches */ break; case 0x40: /* Filtering Start Address Register */ case 0x44: /* Filtering End Address Register */ /* RAZ/WI, like an implementation with only one AXI master */ break; case 0x50: /* SCU Access Control Register */ case 0x54: /* SCU Non-secure Access Control Register */ /* unimplemented, fall through */ +fallthrough; default: qemu_log_mask(LOG_UNIMP, "%s: Unsupported offset 0x%"HWADDR_PRIx " value 0x%"PRIx64"\n", __func__, offset, value); break; } } diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c index 8335364906..4a1ea2fa21 100644 --- a/hw/misc/aspeed_scu.c +++ b/hw/misc/aspeed_scu.c @@ -645,65 +645,65 @@ static uint64_t aspeed_ast2600_scu_read(void *opaque, hwaddr offset, static void aspeed_ast2600_scu_write(void *opaque, hwaddr offset, uint64_t data64, unsigned size) { AspeedSCUState *s = ASPEED_SCU(opaque); int reg = TO_REG(offset); /* Truncate here so bitwise operations below behave as expected */ uint32_t data = data64; if (reg >= ASPEED_AST2600_SCU_NR_REGS) { qemu_log_mask(LOG_GUEST_ERROR, "%s: Out-of-bounds write at offset 0x%" HWADDR_PRIx "\n", __func__, offset); return; } if (reg > PROT_KEY && !s->regs[PROT_KEY]) { qemu_log_mask(LOG_GUEST_ERROR, "%s: SCU is locked!\n", __func__); } trace_aspeed_scu_write(offset, size, data); switch (reg) { case AST2600_PROT_KEY: s->regs[reg] = (data == ASPEED_SCU_PROT_KEY) ? 1 : 0; return; case AST2600_HW_STRAP1: case AST2600_HW_STRAP2: if (s->regs[reg + 2]) { return; } -/* fall through */ +fallthrough; case AST2600_SYS_RST_CTRL: case AST2600_SYS_RST_CTRL2: case AST2600_CLK_STOP_CTRL: case AST2600_CLK_STOP_CTRL2: /* W1S (Write 1 to set) registers */ s->regs[reg] |= data; return; case AST2600_SYS_RST_CTRL_CLR: case AST2600_SYS_RST_CTRL2_CLR: case AST2600_CLK_STOP_CTRL_CLR: case AST2600_CLK_STOP_CTRL2_CLR: case AST2600_HW_STRAP1_CLR: case AST2600_HW_STRAP2_CLR: /* * W1C (Write 1 to clear) registers are offset by one address from * the data register */ s->regs[reg - 1] &= ~data; return; case AST2600_RNG_DATA: case AST260
Re: [RFC PATCH 13/78] hw/adc: add fallthrough pseudo-keyword
On 10/13/23 09:47, Emmanouil Pitsidianakis wrote: In preparation of raising -Wimplicit-fallthrough to 5, replace all fall-through comments with the fallthrough attribute pseudo-keyword. Signed-off-by: Emmanouil Pitsidianakis --- hw/adc/aspeed_adc.c | 12 ++-- hw/adc/zynq-xadc.c | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) For aspeed, Reviewed-by: Cédric Le Goater Thanks, C.
Re: [PATCH v7 05/15] python/qemu: rename command() to cmd()
On 10/6/23 17:41, Vladimir Sementsov-Ogievskiy wrote: Use a shorter name. We are going to move in iotests from qmp() to command() where possible. But command() is longer than qmp() and don't look better. Let's rename. You can simply grep for '\.command(' and for 'def command(' to check that everything is updated (command() in tests/docker/docker.py is unrelated). Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Daniel P. Berrangé Reviewed-by: Eric Blake [vsementsov: also update three occurrences in tests/avocado/machine_aspeed.py and keep r-b] For aspeed, Reviewed-by: Cédric Le Goater Thanks, C. --- docs/devel/testing.rst| 10 +- python/qemu/machine/machine.py| 8 +- python/qemu/qmp/legacy.py | 2 +- python/qemu/qmp/qmp_shell.py | 2 +- python/qemu/utils/qemu_ga_client.py | 2 +- python/qemu/utils/qom.py | 8 +- python/qemu/utils/qom_common.py | 2 +- python/qemu/utils/qom_fuse.py | 6 +- scripts/cpu-x86-uarch-abi.py | 8 +- scripts/device-crash-test | 8 +- scripts/render_block_graph.py | 8 +- tests/avocado/avocado_qemu/__init__.py| 4 +- tests/avocado/cpu_queries.py | 5 +- tests/avocado/hotplug_cpu.py | 10 +- tests/avocado/info_usernet.py | 4 +- tests/avocado/machine_arm_integratorcp.py | 6 +- tests/avocado/machine_aspeed.py | 12 +- tests/avocado/machine_m68k_nextcube.py| 4 +- tests/avocado/machine_mips_malta.py | 6 +- tests/avocado/machine_s390_ccw_virtio.py | 28 ++-- tests/avocado/migration.py| 10 +- tests/avocado/pc_cpu_hotplug_props.py | 2 +- tests/avocado/version.py | 4 +- tests/avocado/virtio_check_params.py | 6 +- tests/avocado/virtio_version.py | 5 +- tests/avocado/x86_cpu_model_versions.py | 13 +- tests/migration/guestperf/engine.py | 150 +++--- tests/qemu-iotests/256| 34 ++--- tests/qemu-iotests/257| 36 +++--- 29 files changed, 204 insertions(+), 199 deletions(-) diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst index 5d1fc0aa95..21525e9aae 100644 --- a/docs/devel/testing.rst +++ b/docs/devel/testing.rst @@ -1014,8 +1014,8 @@ class. Here's a simple usage example: """ def test_qmp_human_info_version(self): self.vm.launch() - res = self.vm.command('human-monitor-command', -command_line='info version') + res = self.vm.cmd('human-monitor-command', +command_line='info version') self.assertRegexpMatches(res, r'^(\d+\.\d+\.\d)') To execute your test, run: @@ -1065,15 +1065,15 @@ and hypothetical example follows: first_machine.launch() second_machine.launch() - first_res = first_machine.command( + first_res = first_machine.cmd( 'human-monitor-command', command_line='info version') - second_res = second_machine.command( + second_res = second_machine.cmd( 'human-monitor-command', command_line='info version') - third_res = self.get_vm(name='third_machine').command( + third_res = self.get_vm(name='third_machine').cmd( 'human-monitor-command', command_line='info version') diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py index dd1a79cb37..c4e80544bd 100644 --- a/python/qemu/machine/machine.py +++ b/python/qemu/machine/machine.py @@ -697,16 +697,16 @@ def qmp(self, cmd: str, self._quit_issued = True return ret -def command(self, cmd: str, -conv_keys: bool = True, -**args: Any) -> QMPReturnValue: +def cmd(self, cmd: str, +conv_keys: bool = True, +**args: Any) -> QMPReturnValue: """ Invoke a QMP command. On success return the response dict. On failure raise an exception. """ qmp_args = self._qmp_args(conv_keys, args) -ret = self._qmp.command(cmd, **qmp_args) +ret = self._qmp.cmd(cmd, **qmp_args) if cmd == 'quit': self._quit_issued = True return ret diff --git a/python/qemu/qmp/legacy.py b/python/qemu/qmp/legacy.py index e5fa1ce9c4..22a2b5616e 100644 --- a/python/qemu/qmp/legacy.py +++ b/python/qemu/qmp/legacy.py @@ -207,7 +207,7 @@ def cmd_raw(self, name: str, qmp_cmd['arguments'] = args return self.cmd_obj(qmp_cmd) -def command(self, cmd: str, **kwds: object) -> QMPReturnValue: +def cmd(self, cmd: str, **kwds: object) -> QMP
Re: [PATCH v2 01/22] qapi: Inline and remove QERR_BUS_NO_HOTPLUG definition
On 10/5/23 06:50, Philippe Mathieu-Daudé wrote: Address the comment added in commit 4629ed1e98 ("qerror: Finally unused, clean up"), from 2015: /* * These macros will go away, please don't use * in new code, and do not add new ones! */ Mechanical transformation using sed, manually removing the definition in include/qapi/qmp/qerror.h. Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Cédric Le Goater Thanks, C. --- include/qapi/qmp/qerror.h | 3 --- hw/ppc/spapr_pci.c| 4 ++-- softmmu/qdev-monitor.c| 8 +--- 3 files changed, 7 insertions(+), 8 deletions(-) diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h index 8dd9fcb071..1a9c2d3502 100644 --- a/include/qapi/qmp/qerror.h +++ b/include/qapi/qmp/qerror.h @@ -17,9 +17,6 @@ * add new ones! */ -#define QERR_BUS_NO_HOTPLUG \ -"Bus '%s' does not support hotplugging" - #define QERR_DEVICE_HAS_NO_MEDIUM \ "Device '%s' has no medium" diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index 370c5a90f2..7f063f5852 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -1550,7 +1550,7 @@ static void spapr_pci_pre_plug(HotplugHandler *plug_handler, * we need to let them know it's not enabled */ if (plugged_dev->hotplugged) { -error_setg(errp, QERR_BUS_NO_HOTPLUG, +error_setg(errp, "Bus '%s' does not support hotplugging", object_get_typename(OBJECT(phb))); return; } @@ -1671,7 +1671,7 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler, SpaprDrc *drc = drc_from_dev(phb, pdev); if (!phb->dr_enabled) { -error_setg(errp, QERR_BUS_NO_HOTPLUG, +error_setg(errp, "Bus '%s' does not support hotplugging", object_get_typename(OBJECT(phb))); return; } diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c index 74f4e41338..3a9740dcbd 100644 --- a/softmmu/qdev-monitor.c +++ b/softmmu/qdev-monitor.c @@ -656,7 +656,8 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts, if (qdev_should_hide_device(opts, from_json, errp)) { if (bus && !qbus_is_hotpluggable(bus)) { -error_setg(errp, QERR_BUS_NO_HOTPLUG, bus->name); +error_setg(errp, "Bus '%s' does not support hotplugging", + bus->name); } return NULL; } else if (*errp) { @@ -664,7 +665,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts, } if (phase_check(PHASE_MACHINE_READY) && bus && !qbus_is_hotpluggable(bus)) { -error_setg(errp, QERR_BUS_NO_HOTPLUG, bus->name); +error_setg(errp, "Bus '%s' does not support hotplugging", bus->name); return NULL; } @@ -904,7 +905,8 @@ void qdev_unplug(DeviceState *dev, Error **errp) } if (dev->parent_bus && !qbus_is_hotpluggable(dev->parent_bus)) { -error_setg(errp, QERR_BUS_NO_HOTPLUG, dev->parent_bus->name); +error_setg(errp, "Bus '%s' does not support hotplugging", + dev->parent_bus->name); return; }
Re: [PATCH 7/7] qobject atomics osdep: Make a few macros more hygienic
On 9/1/23 16:50, Markus Armbruster wrote: Cédric Le Goater writes: On 8/31/23 16:30, Eric Blake wrote: On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote: [This paragraph written last: Bear with my stream of consciousness review below, where I end up duplicating some of the conslusions you reached before the point where I saw where the patch was headed] Variables declared in macros can shadow other variables. Much of the time, this is harmless, e.g.: #define _FDT(exp) \ do { \ int ret = (exp); \ if (ret < 0) { \ error_report("error creating device tree: %s: %s", \ #exp, fdt_strerror(ret)); \ exit(1); \ } \ } while (0) Which is why I've seen some projects require a strict namespace separation: if all macro parameters and any identifiers declared in macros use either a leading or a trailing _ (I prefer a trailing one, to avoid risking conflicts with libc reserved namespace; but leading is usually okay), and all other identifiers avoid that namespace, then you will never have shadowing by calling a macro from normal code. I started fixing the _FDT() macro since it is quite noisy at compile. Same for qemu_fdt_setprop_cells(). So are we ok with names like 'ret_' and 'i_' ? I used a 'local_' prefix for now but I can change. I believe identifiers with a '_' suffix are just fine in macros. We have quite a few of them already. ok I also have a bunch of fixes for ppc. Appreciated! count me on for the ppc files : hw/ppc/pnv_psi.c hw/ppc/spapr.c hw/ppc/spapr_drc.c hw/ppc/spapr_pci.c include/hw/ppc/fdt.h and surely some other files under target/ppc/ This one was taken care of by Phil: include/sysemu/device_tree.h C.
Re: [PATCH 7/7] qobject atomics osdep: Make a few macros more hygienic
On 8/31/23 16:30, Eric Blake wrote: On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote: [This paragraph written last: Bear with my stream of consciousness review below, where I end up duplicating some of the conslusions you reached before the point where I saw where the patch was headed] Variables declared in macros can shadow other variables. Much of the time, this is harmless, e.g.: #define _FDT(exp) \ do { \ int ret = (exp); \ if (ret < 0) { \ error_report("error creating device tree: %s: %s", \ #exp, fdt_strerror(ret)); \ exit(1); \ } \ } while (0) Which is why I've seen some projects require a strict namespace separation: if all macro parameters and any identifiers declared in macros use either a leading or a trailing _ (I prefer a trailing one, to avoid risking conflicts with libc reserved namespace; but leading is usually okay), and all other identifiers avoid that namespace, then you will never have shadowing by calling a macro from normal code. I started fixing the _FDT() macro since it is quite noisy at compile. Same for qemu_fdt_setprop_cells(). So are we ok with names like 'ret_' and 'i_' ? I used a 'local_' prefix for now but I can change. I also have a bunch of fixes for ppc. C. Harmless shadowing in h_client_architecture_support(): target_ulong ret; [...] ret = do_client_architecture_support(cpu, spapr, vec, fdt_bufsize); if (ret == H_SUCCESS) { _FDT((fdt_pack(spapr->fdt_blob))); [...] } return ret; However, we can get in trouble when the shadowed variable is used in a macro argument: #define QOBJECT(obj) ({ \ typeof(obj) o = (obj); \ o ? container_of(&(o)->base, QObject, base) : NULL; \ }) QOBJECT(o) expands into ({ --->typeof(o) o = (o); o ? container_of(&(o)->base, QObject, base) : NULL; }) Unintended variable name capture at --->. We'd be saved by -Winit-self. But I could certainly construct more elaborate death traps that don't trigger it. Indeed, not fully understanding the preprocessor makes for some interesting death traps. To reduce the risk of trapping ourselves, we use variable names in macros that no sane person would use elsewhere. Here's our actual definition of QOBJECT(): #define QOBJECT(obj) ({ \ typeof(obj) _obj = (obj); \ _obj ? container_of(&(_obj)->base, QObject, base) : NULL; \ }) Yep, goes back to the policy I've seen of enforcing naming conventions that ensure macros can't shadow normal code. Works well enough until we nest macro calls. For instance, with #define qobject_ref(obj) ({ \ typeof(obj) _obj = (obj); \ qobject_ref_impl(QOBJECT(_obj));\ _obj; \ }) the expression qobject_ref(obj) expands into ({ typeof(obj) _obj = (obj); qobject_ref_impl( ({ --->typeof(_obj) _obj = (_obj); _obj ? container_of(&(_obj)->base, QObject, base) : NULL; })); _obj; }) Unintended variable name capture at --->. Yep, you've just proven how macros calling macros requires care, as we no longer have the namespace protections. If macro calls can only form a DAG, it is possible to choose unique intermediate declarations for every macro (although remembering to do that, and still keeping the macro definition legible, may not be easy). But if you can have macros that form any sort of nesting loop (and we WANT to allow things like MIN(MIN(a, b), c) - which is such a loop), dynamic naming becomes the only solution. The only reliable way to prevent unintended variable name capture is -Wshadow. Yes, I would love to have that enabled eventually. One blocker for enabling it is shadowing hiding in function-like macros like qdict_put(dict, "name", qobject_ref(...)) qdict_put() wraps its last argument in QOBJECT(), and the last argument here contains another QOBJECT(). Use dark preprocessor sorcery to make the macros that give us this problem use different variable names on every call. Sounds foreboding; hopefully not many macros are affected... Signed-off-by: Markus Armbruster --- include/qapi/qmp/qobject.h | 8 +--- include/qemu/atomic.h | 11 ++-
Re: [PATCH 07/11] hw/arm/aspeed: Clean up local variable shadowing
On 9/1/23 00:56, Philippe Mathieu-Daudé wrote: Fix: hw/arm/aspeed_ast2600.c:391:18: error: declaration shadows a local variable [-Werror,-Wshadow] qemu_irq irq = aspeed_soc_get_irq(s, ASPEED_DEV_TIMER1 + i); ^ hw/arm/aspeed_ast2600.c:283:14: note: previous declaration is here qemu_irq irq; ^ hw/arm/aspeed_ast2600.c:416:18: error: declaration shadows a local variable [-Werror,-Wshadow] qemu_irq irq = qdev_get_gpio_in(DEVICE(>a7mpcore), ^ hw/arm/aspeed_ast2600.c:283:14: note: previous declaration is here qemu_irq irq; ^ Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Cédric Le Goater Thanks, C. --- hw/arm/aspeed_ast2600.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c index a8b3a8065a..f54063445d 100644 --- a/hw/arm/aspeed_ast2600.c +++ b/hw/arm/aspeed_ast2600.c @@ -280,7 +280,6 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp) AspeedSoCState *s = ASPEED_SOC(dev); AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(s); Error *err = NULL; -qemu_irq irq; g_autofree char *sram_name = NULL; /* Default boot region (SPI memory or ROMs) */ @@ -339,6 +338,7 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp) for (i = 0; i < sc->num_cpus; i++) { SysBusDevice *sbd = SYS_BUS_DEVICE(>a7mpcore); DeviceState *d = DEVICE(>cpu[i]); +qemu_irq irq; irq = qdev_get_gpio_in(d, ARM_CPU_IRQ); sysbus_connect_irq(sbd, i, irq);
Re: [PATCH] hw/nvme: use stl/ldl pci dma api
On 7/20/23 11:42, Klaus Jensen wrote: From: Klaus Jensen Use the stl/ldl pci dma api for writing/reading doorbells. This removes the explicit endian conversions. Signed-off-by: Klaus Jensen Reviewed-by: Cédric Le Goater Tested-by: Cédric Le Goater on big and little endian hosts, Thanks, C. --- hw/nvme/ctrl.c | 42 +- 1 file changed, 13 insertions(+), 29 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index dadc2dc7da10..f2e5a2fa737b 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -1468,20 +1468,16 @@ static inline void nvme_blk_write(BlockBackend *blk, int64_t offset, static void nvme_update_cq_eventidx(const NvmeCQueue *cq) { -uint32_t v = cpu_to_le32(cq->head); - trace_pci_nvme_update_cq_eventidx(cq->cqid, cq->head); -pci_dma_write(PCI_DEVICE(cq->ctrl), cq->ei_addr, , sizeof(v)); +stl_le_pci_dma(PCI_DEVICE(cq->ctrl), cq->ei_addr, cq->head, + MEMTXATTRS_UNSPECIFIED); } static void nvme_update_cq_head(NvmeCQueue *cq) { -uint32_t v; - -pci_dma_read(PCI_DEVICE(cq->ctrl), cq->db_addr, , sizeof(v)); - -cq->head = le32_to_cpu(v); +ldl_le_pci_dma(PCI_DEVICE(cq->ctrl), cq->db_addr, >head, + MEMTXATTRS_UNSPECIFIED); trace_pci_nvme_update_cq_head(cq->cqid, cq->head); } @@ -6801,7 +6797,6 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) PCIDevice *pci = PCI_DEVICE(n); uint64_t dbs_addr = le64_to_cpu(req->cmd.dptr.prp1); uint64_t eis_addr = le64_to_cpu(req->cmd.dptr.prp2); -uint32_t v; int i; /* Address should be page aligned */ @@ -6819,8 +6814,6 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) NvmeCQueue *cq = n->cq[i]; if (sq) { -v = cpu_to_le32(sq->tail); - /* * CAP.DSTRD is 0, so offset of ith sq db_addr is (i<<3) * nvme_process_db() uses this hard-coded way to calculate @@ -6828,7 +6821,7 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) */ sq->db_addr = dbs_addr + (i << 3); sq->ei_addr = eis_addr + (i << 3); -pci_dma_write(pci, sq->db_addr, , sizeof(sq->tail)); +stl_le_pci_dma(pci, sq->db_addr, sq->tail, MEMTXATTRS_UNSPECIFIED); if (n->params.ioeventfd && sq->sqid != 0) { if (!nvme_init_sq_ioeventfd(sq)) { @@ -6838,12 +6831,10 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) } if (cq) { -v = cpu_to_le32(cq->head); - /* CAP.DSTRD is 0, so offset of ith cq db_addr is (i<<3)+(1<<2) */ cq->db_addr = dbs_addr + (i << 3) + (1 << 2); cq->ei_addr = eis_addr + (i << 3) + (1 << 2); -pci_dma_write(pci, cq->db_addr, , sizeof(cq->head)); +stl_le_pci_dma(pci, cq->db_addr, cq->head, MEMTXATTRS_UNSPECIFIED); if (n->params.ioeventfd && cq->cqid != 0) { if (!nvme_init_cq_ioeventfd(cq)) { @@ -6974,20 +6965,16 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req) static void nvme_update_sq_eventidx(const NvmeSQueue *sq) { -uint32_t v = cpu_to_le32(sq->tail); - trace_pci_nvme_update_sq_eventidx(sq->sqid, sq->tail); -pci_dma_write(PCI_DEVICE(sq->ctrl), sq->ei_addr, , sizeof(v)); +stl_le_pci_dma(PCI_DEVICE(sq->ctrl), sq->ei_addr, sq->tail, + MEMTXATTRS_UNSPECIFIED); } static void nvme_update_sq_tail(NvmeSQueue *sq) { -uint32_t v; - -pci_dma_read(PCI_DEVICE(sq->ctrl), sq->db_addr, , sizeof(v)); - -sq->tail = le32_to_cpu(v); +ldl_le_pci_dma(PCI_DEVICE(sq->ctrl), sq->db_addr, >tail, + MEMTXATTRS_UNSPECIFIED); trace_pci_nvme_update_sq_tail(sq->sqid, sq->tail); } @@ -7592,7 +7579,7 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size) static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) { PCIDevice *pci = PCI_DEVICE(n); -uint32_t qid, v; +uint32_t qid; if (unlikely(addr & ((1 << 2) - 1))) { NVME_GUEST_ERR(pci_nvme_ub_db_wr_misaligned, @@ -7659,8 +7646,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) start_sqs = nvme_cq_full(cq) ? 1 : 0; cq->head = new_head; if (!qid && n->dbbuf_enabled) { -v = cpu_to_le32(cq->head); -pci_dma_write(pci, cq->db_addr, , sizeof(cq->head)); +stl_le_pci_dma(pci, cq->db_addr, cq->head, MEMTXATTRS_UNSPECIFIED);
Re: [PATCH] hw/nvme: fix endianness issue for shadow doorbells
On 7/18/23 12:35, Klaus Jensen wrote: From: Klaus Jensen In commit 2fda0726e514 ("hw/nvme: fix missing endian conversions for doorbell buffers"), we fixed shadow doorbells for big-endian guests running on little endian hosts. But I did not fix little-endian guests on big-endian hosts. Fix this. Solves issue #1765. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1765 Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support") Cc: qemu-sta...@nongnu.org Reported-by: Thomas Huth Signed-off-by: Klaus Jensen Tested-by: Cédric Le Goater with a PowerNV QEMU machine running on a s390x LPAR. Thanks, C. --- hw/nvme/ctrl.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 8e8e870b9a80..dadc2dc7da10 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -6801,6 +6801,7 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) PCIDevice *pci = PCI_DEVICE(n); uint64_t dbs_addr = le64_to_cpu(req->cmd.dptr.prp1); uint64_t eis_addr = le64_to_cpu(req->cmd.dptr.prp2); +uint32_t v; int i; /* Address should be page aligned */ @@ -6818,6 +6819,8 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) NvmeCQueue *cq = n->cq[i]; if (sq) { +v = cpu_to_le32(sq->tail); + /* * CAP.DSTRD is 0, so offset of ith sq db_addr is (i<<3) * nvme_process_db() uses this hard-coded way to calculate @@ -6825,7 +6828,7 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) */ sq->db_addr = dbs_addr + (i << 3); sq->ei_addr = eis_addr + (i << 3); -pci_dma_write(pci, sq->db_addr, >tail, sizeof(sq->tail)); +pci_dma_write(pci, sq->db_addr, , sizeof(sq->tail)); if (n->params.ioeventfd && sq->sqid != 0) { if (!nvme_init_sq_ioeventfd(sq)) { @@ -6835,10 +6838,12 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) } if (cq) { +v = cpu_to_le32(cq->head); + /* CAP.DSTRD is 0, so offset of ith cq db_addr is (i<<3)+(1<<2) */ cq->db_addr = dbs_addr + (i << 3) + (1 << 2); cq->ei_addr = eis_addr + (i << 3) + (1 << 2); -pci_dma_write(pci, cq->db_addr, >head, sizeof(cq->head)); +pci_dma_write(pci, cq->db_addr, , sizeof(cq->head)); if (n->params.ioeventfd && cq->cqid != 0) { if (!nvme_init_cq_ioeventfd(cq)) { @@ -7587,7 +7592,7 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size) static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) { PCIDevice *pci = PCI_DEVICE(n); -uint32_t qid; +uint32_t qid, v; if (unlikely(addr & ((1 << 2) - 1))) { NVME_GUEST_ERR(pci_nvme_ub_db_wr_misaligned, @@ -7654,7 +7659,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) start_sqs = nvme_cq_full(cq) ? 1 : 0; cq->head = new_head; if (!qid && n->dbbuf_enabled) { -pci_dma_write(pci, cq->db_addr, >head, sizeof(cq->head)); +v = cpu_to_le32(cq->head); +pci_dma_write(pci, cq->db_addr, , sizeof(cq->head)); } if (start_sqs) { NvmeSQueue *sq; @@ -7714,6 +7720,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) sq->tail = new_tail; if (!qid && n->dbbuf_enabled) { +v = cpu_to_le32(sq->tail); + /* * The spec states "the host shall also update the controller's * corresponding doorbell property to match the value of that entry @@ -7727,7 +7735,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) * including ones that run on Linux, are not updating Admin Queues, * so we can't trust reading it for an appropriate sq tail. */ -pci_dma_write(pci, sq->db_addr, >tail, sizeof(sq->tail)); +pci_dma_write(pci, sq->db_addr, , sizeof(sq->tail)); } qemu_bh_schedule(sq->bh);
Re: [PATCH v2 01/16] migration: Don't use INT64_MAX for unlimited rate
On 5/16/23 11:24, Juan Quintela wrote: David Edmondson wrote: Juan Quintela writes: Define and use RATE_LIMIT_MAX instead. Suggest "RATE_LIMIT_MAX_NONE". Then even better RATE_LIMIT_DISABLED? I'd vote for RATE_LIMIT_DISABLED. RATE_LIMIT_NONE? Using MAX and NONE at the same time looks strange. Cheers, C.
Re: [PATCH v2 05/16] migration: Move rate_limit_max and rate_limit_used to migration_stats
On 5/15/23 21:56, Juan Quintela wrote: These way we can make them atomic and use this functions from any place. I also moved all functions that use rate_limit to migration-stats. Functions got renamed, they are not qemu_file anymore. qemu_file_rate_limit -> migration_rate_exceeded qemu_file_set_rate_limit -> migration_rate_set qemu_file_get_rate_limit -> migration_rate_get qemu_file_reset_rate_limit -> migration_rate_reset qemu_file_acct_rate_limit -> migration_rate_account. Signed-off-by: Juan Quintela Reviewed-by: Harsh Prateek Bora Reviewed-by: Cédric Le Goater Thanks, C. --- s/this/these/ (harsh) If you have any good suggestion for better names, I am all ears. Fix missing / XFER_LIMIT_RATIO in migration_rate_set(quintela) --- include/migration/qemu-file-types.h | 12 ++- migration/migration-stats.h | 47 ++ migration/options.h | 7 migration/qemu-file.h | 11 -- hw/ppc/spapr.c | 4 +-- hw/s390x/s390-stattrib.c| 2 +- migration/block-dirty-bitmap.c | 2 +- migration/block.c | 5 +-- migration/migration-stats.c | 44 migration/migration.c | 14 migration/multifd.c | 2 +- migration/options.c | 7 ++-- migration/qemu-file.c | 52 ++--- migration/ram.c | 2 +- migration/savevm.c | 2 +- 15 files changed, 124 insertions(+), 89 deletions(-) diff --git a/include/migration/qemu-file-types.h b/include/migration/qemu-file-types.h index 1436f9ce92..9ba163f333 100644 --- a/include/migration/qemu-file-types.h +++ b/include/migration/qemu-file-types.h @@ -165,6 +165,16 @@ size_t coroutine_mixed_fn qemu_get_counted_string(QEMUFile *f, char buf[256]); void qemu_put_counted_string(QEMUFile *f, const char *name); -int qemu_file_rate_limit(QEMUFile *f); +/** + * migration_rate_exceeded: Check if we have exceeded rate for this interval + * + * Checks if we have already transferred more data that we are allowed + * in the current interval. + * + * @f: QEMUFile used for main migration channel + * + * Returns if we should stop sending data for this interval. + */ +bool migration_rate_exceeded(QEMUFile *f); #endif diff --git a/migration/migration-stats.h b/migration/migration-stats.h index 21402af9e4..e39c083245 100644 --- a/migration/migration-stats.h +++ b/migration/migration-stats.h @@ -15,6 +15,12 @@ #include "qemu/stats64.h" +/* + * Amount of time to allocate to each "chunk" of bandwidth-throttled + * data. + */ +#define BUFFER_DELAY 100 + /* * If rate_limit_max is 0, there is special code to remove the rate * limit. @@ -75,6 +81,14 @@ typedef struct { * Number of bytes sent during precopy stage. */ Stat64 precopy_bytes; +/* + * Maximum amount of data we can send in a cycle. + */ +Stat64 rate_limit_max; +/* + * Amount of data we have sent in the current cycle. + */ +Stat64 rate_limit_used; /* * How long has the setup stage took. */ @@ -100,4 +114,37 @@ extern MigrationAtomicStats mig_stats; * Returns: Nothing. The time is stored in val. */ void migration_time_since(MigrationAtomicStats *stats, int64_t since); + +/** + * migration_rate_account: Increase the number of bytes transferred. + * + * Report on a number of bytes the have been transferred that need to + * be applied to the rate limiting calcuations. + * + * @len: amount of bytes transferred + */ +void migration_rate_account(uint64_t len); + +/** + * migration_rate_get: Get the maximum amount that can be transferred. + * + * Returns the maximum number of bytes that can be transferred in a cycle. + */ +uint64_t migration_rate_get(void); + +/** + * migration_rate_reset: Reset the rate limit counter. + * + * This is called when we know we start a new transfer cycle. + */ +void migration_rate_reset(void); + +/** + * migration_rate_set: Set the maximum amount that can be transferred. + * + * Sets the maximum amount of bytes that can be transferred in one cycle. + * + * @new_rate: new maximum amount + */ +void migration_rate_set(uint64_t new_rate); #endif diff --git a/migration/options.h b/migration/options.h index 5cca3326d6..45991af3c2 100644 --- a/migration/options.h +++ b/migration/options.h @@ -17,13 +17,6 @@ #include "hw/qdev-properties.h" #include "hw/qdev-properties-system.h" -/* constants */ - -/* Amount of time to allocate to each "chunk" of bandwidth-throttled - * data. */ -#define BUFFER_DELAY 100 -#define XFER_LIMIT_RATIO (1000 / BUFFER_DELAY) - /* migration properties */ extern Property migration_properties[]; diff --git a/migration/qemu-file.h b/migration/qemu-file.h index bcc39081f2..e649718492 100644 --- a/migration/
Re: [PATCH 10/21] migration: Move rate_limit_max and rate_limit_used to migration_stats
On 5/15/23 15:09, Juan Quintela wrote: Cédric Le Goater wrote: On 5/8/23 15:08, Juan Quintela wrote: This way we can make them atomic and use this functions from any place. I also moved all functions that use rate_limit to migration-stats. Functions got renamed, they are not qemu_file anymore. qemu_file_rate_limit -> migration_rate_limit_exceeded qemu_file_set_rate_limit -> migration_rate_limit_set qemu_file_get_rate_limit -> migration_rate_limit_get qemu_file_reset_rate_limit -> migration_rate_limit_reset qemu_file_acct_rate_limit -> migration_rate_limit_account. Signed-off-by: Juan Quintela --- If you have any good suggestion for better names, I am all ears. May be : qemu_file_rate_limit -> migration_rate_limit_is_exceeded I try not to put _is_ in function names. If it needs to be there, I think that I need to rename the functino. It is common practice for functions doing a simple test and returning a bool. No big deal anyway. migration_rate_limit_exceeded() seems clear to me. qemu_file_acct_rate_limit -> migration_rate_limit_inc My problem for this one is that we are not increasing the rate_limit, we are "decreasing" the amount of data we have for this period. That is why I thought about _account(), but who knows. Also, migration_rate_limit() would need some prefix to understand what is its purpose. What do you mean here? I am referring to : /* Returns true if the rate limiting was broken by an urgent request */ bool migration_rate_limit(void) { ... return urgent; } which existed prior to the name changes and I thought migration_rate_limit() would suffer the same fate. May be keep the '_limit' suffix for this one if you remove it for the others ? Thanks, C. This is the only rate_limit that I can think in migration. Do we really need "_limit" in the names ? You have a point here. If nobody complains/suggest anything else, I will drop the _limit for the next submission. Thanks very much.
Re: [PATCH 14/21] migration: We don't need the field rate_limit_used anymore
On 5/8/23 15:09, Juan Quintela wrote: Since previous commit, we calculate how much data we have send with migration_transferred_bytes() so no need to maintain this counter and remember to always update it. Signed-off-by: Juan Quintela Reviewed-by: Cédric Le Goater Thanks, C. --- migration/migration-stats.c | 6 -- migration/migration-stats.h | 14 -- migration/multifd.c | 1 - migration/qemu-file.c | 4 4 files changed, 25 deletions(-) diff --git a/migration/migration-stats.c b/migration/migration-stats.c index eb1a2c1ad4..a42b5d953e 100644 --- a/migration/migration-stats.c +++ b/migration/migration-stats.c @@ -59,15 +59,9 @@ void migration_rate_limit_set(uint64_t limit) void migration_rate_limit_reset(QEMUFile *f) { -stat64_set(_stats.rate_limit_used, 0); stat64_set(_stats.rate_limit_start, migration_transferred_bytes(f)); } -void migration_rate_limit_account(uint64_t len) -{ -stat64_add(_stats.rate_limit_used, len); -} - uint64_t migration_transferred_bytes(QEMUFile *f) { uint64_t multifd = stat64_get(_stats.multifd_bytes); diff --git a/migration/migration-stats.h b/migration/migration-stats.h index 4029f1deab..ab4cc15a74 100644 --- a/migration/migration-stats.h +++ b/migration/migration-stats.h @@ -77,10 +77,6 @@ typedef struct { * Maximum amount of data we can send in a cycle. */ Stat64 rate_limit_max; -/* - * Amount of data we have sent in the current cycle. - */ -Stat64 rate_limit_used; /* * How long has the setup stage took. */ @@ -108,16 +104,6 @@ extern MigrationAtomicStats mig_stats; void calculate_time_since(Stat64 *val, int64_t since); -/** - * migration_rate_limit_account: Increase the number of bytes transferred. - * - * Report on a number of bytes the have been transferred that need to - * be applied to the rate limiting calcuations. - * - * @len: amount of bytes transferred - */ -void migration_rate_limit_account(uint64_t len); - /** * migration_rate_limit_get: Get the maximum amount that can be transferred. * diff --git a/migration/multifd.c b/migration/multifd.c index 2efb313be4..9d2ade7abc 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -432,7 +432,6 @@ static int multifd_send_pages(QEMUFile *f) multifd_send_state->pages = p->pages; p->pages = pages; transferred = ((uint64_t) pages->num) * p->page_size + p->packet_len; -migration_rate_limit_account(transferred); qemu_mutex_unlock(>mutex); stat64_add(_stats.transferred, transferred); stat64_add(_stats.multifd_bytes, transferred); diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 3f993e24af..0086d67d83 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -292,7 +292,6 @@ void qemu_fflush(QEMUFile *f) qemu_file_set_error_obj(f, -EIO, local_error); } else { uint64_t size = iov_size(f->iov, f->iovcnt); -migration_rate_limit_account(size); f->total_transferred += size; } @@ -344,9 +343,6 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset, if (f->hooks && f->hooks->save_page) { int ret = f->hooks->save_page(f, block_offset, offset, size, bytes_sent); -if (ret != RAM_SAVE_CONTROL_NOT_SUPP) { -migration_rate_limit_account(size); -} if (ret != RAM_SAVE_CONTROL_DELAYED && ret != RAM_SAVE_CONTROL_NOT_SUPP) {
Re: [PATCH 11/21] migration: Move migration_total_bytes() to migration-stats.c
On 5/8/23 15:08, Juan Quintela wrote: Once there rename it to migration_transferred_bytes() and pass a QEMUFile instead of a migration object. Signed-off-by: Juan Quintela Reviewed-by: Cédric Le Goater C. --- migration/migration-stats.c | 6 ++ migration/migration-stats.h | 9 + migration/migration.c | 13 +++-- 3 files changed, 18 insertions(+), 10 deletions(-) diff --git a/migration/migration-stats.c b/migration/migration-stats.c index e01842cabc..fba66c4577 100644 --- a/migration/migration-stats.c +++ b/migration/migration-stats.c @@ -63,3 +63,9 @@ void migration_rate_limit_account(uint64_t len) { stat64_add(_stats.rate_limit_used, len); } + +uint64_t migration_transferred_bytes(QEMUFile *f) +{ +return qemu_file_transferred(f) + stat64_get(_stats.multifd_bytes); +} + diff --git a/migration/migration-stats.h b/migration/migration-stats.h index 65f11ec7d1..c82fce9608 100644 --- a/migration/migration-stats.h +++ b/migration/migration-stats.h @@ -137,4 +137,13 @@ void migration_rate_limit_reset(void); */ void migration_rate_limit_set(uint64_t new_rate); +/** + * migration_transferred_bytes: Return number of bytes transferred + * + * Returtns how many bytes have we transferred since the beginning of + * the migration. It accounts for bytes sent through any migration + * channel, multifd, qemu_file, rdma, + */ +uint64_t migration_transferred_bytes(QEMUFile *f); + #endif diff --git a/migration/migration.c b/migration/migration.c index 370998600e..e6d262ffe1 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2617,16 +2617,9 @@ static MigThrError migration_detect_error(MigrationState *s) } } -/* How many bytes have we transferred since the beginning of the migration */ -static uint64_t migration_total_bytes(MigrationState *s) -{ -return qemu_file_transferred(s->to_dst_file) + -stat64_get(_stats.multifd_bytes); -} - static void migration_calculate_complete(MigrationState *s) { -uint64_t bytes = migration_total_bytes(s); +uint64_t bytes = migration_transferred_bytes(s->to_dst_file); int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); int64_t transfer_time; @@ -2652,7 +2645,7 @@ static void update_iteration_initial_status(MigrationState *s) * wrong speed calculation. */ s->iteration_start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); -s->iteration_initial_bytes = migration_total_bytes(s); +s->iteration_initial_bytes = migration_transferred_bytes(s->to_dst_file); s->iteration_initial_pages = ram_get_total_transferred_pages(); } @@ -2667,7 +2660,7 @@ static void migration_update_counters(MigrationState *s, return; } -current_bytes = migration_total_bytes(s); +current_bytes = migration_transferred_bytes(s->to_dst_file); transferred = current_bytes - s->iteration_initial_bytes; time_spent = current_time - s->iteration_start_time; bandwidth = (double)transferred / time_spent;
Re: [PATCH 12/21] migration: Add a trace for migration_transferred_bytes
On 5/8/23 15:09, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Cédric Le Goater Thanks, C. --- migration/migration-stats.c | 8 ++-- migration/trace-events | 3 +++ 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/migration/migration-stats.c b/migration/migration-stats.c index fba66c4577..46b2b0d06e 100644 --- a/migration/migration-stats.c +++ b/migration/migration-stats.c @@ -14,6 +14,7 @@ #include "qemu/stats64.h" #include "qemu/timer.h" #include "qemu-file.h" +#include "trace.h" #include "migration-stats.h" MigrationAtomicStats mig_stats; @@ -66,6 +67,9 @@ void migration_rate_limit_account(uint64_t len) uint64_t migration_transferred_bytes(QEMUFile *f) { -return qemu_file_transferred(f) + stat64_get(_stats.multifd_bytes); -} +uint64_t multifd = stat64_get(_stats.multifd_bytes); +uint64_t qemu_file = qemu_file_transferred(f); +trace_migration_transferred_bytes(qemu_file, multifd); +return qemu_file + multifd; +} diff --git a/migration/trace-events b/migration/trace-events index 92161eeac5..4b6e802833 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -186,6 +186,9 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d" process_incoming_migration_co_postcopy_end_main(void) "" postcopy_preempt_enabled(bool value) "%d" +# migration-stats +migration_transferred_bytes(uint64_t qemu_file, uint64_t multifd) "qemu_file %" PRIu64 " multifd %" PRIu64 + # channel.c migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s" migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname, void *err) "ioc=%p ioctype=%s hostname=%s err=%p"
Re: [PATCH 13/21] migration: Use migration_transferred_bytes() to calculate rate_limit
On 5/8/23 15:09, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Cédric Le Goater C. --- migration/migration-stats.c | 7 +-- migration/migration-stats.h | 6 +- migration/migration.c | 2 +- 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/migration/migration-stats.c b/migration/migration-stats.c index 46b2b0d06e..eb1a2c1ad4 100644 --- a/migration/migration-stats.c +++ b/migration/migration-stats.c @@ -31,7 +31,9 @@ bool migration_rate_limit_exceeded(QEMUFile *f) return true; } -uint64_t rate_limit_used = stat64_get(_stats.rate_limit_used); +uint64_t rate_limit_start = stat64_get(_stats.rate_limit_start); +uint64_t rate_limit_current = migration_transferred_bytes(f); +uint64_t rate_limit_used = rate_limit_current - rate_limit_start; uint64_t rate_limit_max = stat64_get(_stats.rate_limit_max); /* * rate_limit_max == 0 means no rate_limit enfoncement. @@ -55,9 +57,10 @@ void migration_rate_limit_set(uint64_t limit) stat64_set(_stats.rate_limit_max, limit); } -void migration_rate_limit_reset(void) +void migration_rate_limit_reset(QEMUFile *f) { stat64_set(_stats.rate_limit_used, 0); +stat64_set(_stats.rate_limit_start, migration_transferred_bytes(f)); } void migration_rate_limit_account(uint64_t len) diff --git a/migration/migration-stats.h b/migration/migration-stats.h index c82fce9608..4029f1deab 100644 --- a/migration/migration-stats.h +++ b/migration/migration-stats.h @@ -69,6 +69,10 @@ typedef struct { * Number of bytes sent during precopy stage. */ Stat64 precopy_bytes; +/* + * Amount of transferred data at the start of current cycle. + */ +Stat64 rate_limit_start; /* * Maximum amount of data we can send in a cycle. */ @@ -126,7 +130,7 @@ uint64_t migration_rate_limit_get(void); * * This is called when we know we start a new transfer cycle. */ -void migration_rate_limit_reset(void); +void migration_rate_limit_reset(QEMUFile *f); /** * migration_rate_limit_set: Set the maximum amount that can be transferred. diff --git a/migration/migration.c b/migration/migration.c index e6d262ffe1..6922c612e4 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2684,7 +2684,7 @@ static void migration_update_counters(MigrationState *s, stat64_get(_stats.dirty_bytes_last_sync) / bandwidth; } -migration_rate_limit_reset(); +migration_rate_limit_reset(s->to_dst_file); update_iteration_initial_status(s);
Re: [PATCH 10/21] migration: Move rate_limit_max and rate_limit_used to migration_stats
On 5/8/23 15:08, Juan Quintela wrote: This way we can make them atomic and use this functions from any place. I also moved all functions that use rate_limit to migration-stats. Functions got renamed, they are not qemu_file anymore. qemu_file_rate_limit -> migration_rate_limit_exceeded qemu_file_set_rate_limit -> migration_rate_limit_set qemu_file_get_rate_limit -> migration_rate_limit_get qemu_file_reset_rate_limit -> migration_rate_limit_reset qemu_file_acct_rate_limit -> migration_rate_limit_account. Signed-off-by: Juan Quintela --- If you have any good suggestion for better names, I am all ears. May be : qemu_file_rate_limit -> migration_rate_limit_is_exceeded qemu_file_acct_rate_limit -> migration_rate_limit_inc Also, migration_rate_limit() would need some prefix to understand what is its purpose. Do we really need "_limit" in the names ? Thanks, C. --- hw/ppc/spapr.c | 5 +-- hw/s390x/s390-stattrib.c| 2 +- include/migration/qemu-file-types.h | 2 +- migration/block-dirty-bitmap.c | 2 +- migration/block.c | 5 +-- migration/migration-stats.c | 41 ++ migration/migration-stats.h | 42 +++ migration/migration.c | 14 migration/multifd.c | 2 +- migration/options.c | 7 ++-- migration/qemu-file.c | 53 ++--- migration/qemu-file.h | 11 -- migration/ram.c | 2 +- migration/savevm.c | 2 +- 14 files changed, 108 insertions(+), 82 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index ddc9c7b1a1..dbd2753278 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2166,7 +2166,7 @@ static void htab_save_first_pass(QEMUFile *f, SpaprMachineState *spapr, break; } } -} while ((index < htabslots) && !qemu_file_rate_limit(f)); +} while ((index < htabslots) && !migration_rate_limit_exceeded(f)); if (index >= htabslots) { assert(index == htabslots); @@ -2237,7 +2237,8 @@ static int htab_save_later_pass(QEMUFile *f, SpaprMachineState *spapr, assert(index == htabslots); index = 0; } -} while ((examined < htabslots) && (!qemu_file_rate_limit(f) || final)); +} while ((examined < htabslots) && + (!migration_rate_limit_exceeded(f) || final)); if (index >= htabslots) { assert(index == htabslots); diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c index aed919ad7d..fb0a20f2e1 100644 --- a/hw/s390x/s390-stattrib.c +++ b/hw/s390x/s390-stattrib.c @@ -209,7 +209,7 @@ static int cmma_save(QEMUFile *f, void *opaque, int final) return -ENOMEM; } -while (final ? 1 : qemu_file_rate_limit(f) == 0) { +while (final ? 1 : migration_rate_limit_exceeded(f) == 0) { reallen = sac->get_stattr(sas, _gfn, buflen, buf); if (reallen < 0) { g_free(buf); diff --git a/include/migration/qemu-file-types.h b/include/migration/qemu-file-types.h index 1436f9ce92..0354f45198 100644 --- a/include/migration/qemu-file-types.h +++ b/include/migration/qemu-file-types.h @@ -165,6 +165,6 @@ size_t coroutine_mixed_fn qemu_get_counted_string(QEMUFile *f, char buf[256]); void qemu_put_counted_string(QEMUFile *f, const char *name); -int qemu_file_rate_limit(QEMUFile *f); +bool migration_rate_limit_exceeded(QEMUFile *f); #endif diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c index 20f36e6bd8..a815678926 100644 --- a/migration/block-dirty-bitmap.c +++ b/migration/block-dirty-bitmap.c @@ -706,7 +706,7 @@ static void bulk_phase(QEMUFile *f, DBMSaveState *s, bool limit) QSIMPLEQ_FOREACH(dbms, >dbms_list, entry) { while (!dbms->bulk_completed) { bulk_phase_send_chunk(f, s, dbms); -if (limit && qemu_file_rate_limit(f)) { +if (limit && migration_rate_limit_exceeded(f)) { return; } } diff --git a/migration/block.c b/migration/block.c index 12617b4152..fc1caa9ca6 100644 --- a/migration/block.c +++ b/migration/block.c @@ -23,6 +23,7 @@ #include "block/dirty-bitmap.h" #include "migration/misc.h" #include "migration.h" +#include "migration-stats.h" #include "migration/register.h" #include "qemu-file.h" #include "migration/vmstate.h" @@ -625,7 +626,7 @@ static int flush_blks(QEMUFile *f) blk_mig_lock(); while ((blk = QSIMPLEQ_FIRST(_mig_state.blk_list)) != NULL) { -if (qemu_file_rate_limit(f)) { +if (migration_rate_limit_exceeded(f)) { break; } if (blk->ret < 0) { @@ -762,7 +763,7 @@ static int block_save_iterate(QEMUFile *f, void *opaque) /* control the rate of transfer */ blk_mig_lock(); while
Re: [PATCH 09/21] qemu-file: Account for rate_limit usage on qemu_fflush()
On 5/8/23 15:08, Juan Quintela wrote: That is the moment we know we have transferred something. Signed-off-by: Juan Quintela Reviewed-by: Cédric Le Goater Thanks, C. --- migration/qemu-file.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 6ebc2bd3ec..8de1ecd082 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -301,7 +301,9 @@ void qemu_fflush(QEMUFile *f) _error) < 0) { qemu_file_set_error_obj(f, -EIO, local_error); } else { -f->total_transferred += iov_size(f->iov, f->iovcnt); +uint64_t size = iov_size(f->iov, f->iovcnt); +qemu_file_acct_rate_limit(f, size); +f->total_transferred += size; } qemu_iovec_release_ram(f); @@ -518,7 +520,6 @@ void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size, return; } -f->rate_limit_used += size; add_to_iovec(f, buf, size, may_free); } @@ -536,7 +537,6 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size) l = size; } memcpy(f->buf + f->buf_index, buf, l); -f->rate_limit_used += l; add_buf_to_iovec(f, l); if (qemu_file_get_error(f)) { break; @@ -553,7 +553,6 @@ void qemu_put_byte(QEMUFile *f, int v) } f->buf[f->buf_index] = v; -f->rate_limit_used++; add_buf_to_iovec(f, 1); }
Re: [PATCH 08/21] migration: Move setup_time to mig_stats
On 5/8/23 15:08, Juan Quintela wrote: It is a time that needs to be cleaned each time cancel migration. Once there ccreate calculate_time_since() to calculate how time since a time in the past. Signed-off-by: Juan Quintela --- migration/migration-stats.c | 7 +++ migration/migration-stats.h | 14 ++ migration/migration.c | 9 - migration/migration.h | 1 - 4 files changed, 25 insertions(+), 6 deletions(-) diff --git a/migration/migration-stats.c b/migration/migration-stats.c index 2f2cea965c..5278c6c821 100644 --- a/migration/migration-stats.c +++ b/migration/migration-stats.c @@ -12,6 +12,13 @@ #include "qemu/osdep.h" #include "qemu/stats64.h" +#include "qemu/timer.h" #include "migration-stats.h" MigrationAtomicStats mig_stats; + +void calculate_time_since(Stat64 *val, int64_t since) +{ +int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST); +stat64_set(val, now - since); +} diff --git a/migration/migration-stats.h b/migration/migration-stats.h index cf8a4f0410..73c73d75b9 100644 --- a/migration/migration-stats.h +++ b/migration/migration-stats.h @@ -69,6 +69,10 @@ typedef struct { * Number of bytes sent during precopy stage. */ Stat64 precopy_bytes; +/* + * How long has the setup stage took. + */ +Stat64 setup_time; /* * Total number of bytes transferred. */ @@ -81,4 +85,14 @@ typedef struct { extern MigrationAtomicStats mig_stats; +/** + * calculate_time_since: Calculate how much time has passed + * + * @val: stat64 where to store the time + * @since: reference time since we want to calculate + * + * Returns: Nothing. The time is stored in val. + */ + +void calculate_time_since(Stat64 *val, int64_t since); Since this routine is in the "migration" namespace, I would rename it to void migration_time_since(Stat64 *val, int64_t since); of even void migration_time_since(MigrationAtomicStats *stat, int64_t since); Do you need it elsewhere than in migration.c ? Thanks, C. #endif diff --git a/migration/migration.c b/migration/migration.c index b1cfb56523..72286de969 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -884,7 +884,7 @@ static void populate_time_info(MigrationInfo *info, MigrationState *s) { info->has_status = true; info->has_setup_time = true; -info->setup_time = s->setup_time; +info->setup_time = stat64_get(_stats.setup_time); if (s->state == MIGRATION_STATUS_COMPLETED) { info->has_total_time = true; @@ -1387,7 +1387,6 @@ void migrate_init(MigrationState *s) s->pages_per_second = 0.0; s->downtime = 0; s->expected_downtime = 0; -s->setup_time = 0; s->start_postcopy = false; s->postcopy_after_devices = false; s->migration_thread_running = false; @@ -2640,7 +2639,7 @@ static void migration_calculate_complete(MigrationState *s) s->downtime = end_time - s->downtime_start; } -transfer_time = s->total_time - s->setup_time; +transfer_time = s->total_time - stat64_get(_stats.setup_time); if (transfer_time) { s->mbps = ((double) bytes * 8.0) / transfer_time / 1000; } @@ -2965,7 +2964,7 @@ static void *migration_thread(void *opaque) qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE); -s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; +calculate_time_since(_stats.setup_time, setup_start); trace_migration_thread_setup_complete(); @@ -3077,7 +3076,7 @@ static void *bg_migration_thread(void *opaque) qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE); -s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; +calculate_time_since(_stats.setup_time, setup_start); trace_migration_thread_setup_complete(); s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); diff --git a/migration/migration.h b/migration/migration.h index 3a918514e7..7f554455ac 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -311,7 +311,6 @@ struct MigrationState { int64_t downtime; int64_t expected_downtime; bool capabilities[MIGRATION_CAPABILITY__MAX]; -int64_t setup_time; /* * Whether guest was running when we enter the completion stage. * If migration is interrupted by any reason, we need to continue
Re: [PATCH 06/21] qemu-file: Remove total from qemu_file_total_transferred_*()
On 5/8/23 15:08, Juan Quintela wrote: Function is already quite long. Signed-off-by: Juan Quintela Reviewed-by: Cédric Le Goater C. --- migration/block.c | 4 ++-- migration/migration.c | 2 +- migration/qemu-file.c | 4 ++-- migration/qemu-file.h | 10 +- migration/savevm.c| 6 +++--- migration/vmstate.c | 5 ++--- 6 files changed, 15 insertions(+), 16 deletions(-) diff --git a/migration/block.c b/migration/block.c index a37678ce95..12617b4152 100644 --- a/migration/block.c +++ b/migration/block.c @@ -747,7 +747,7 @@ static int block_save_setup(QEMUFile *f, void *opaque) static int block_save_iterate(QEMUFile *f, void *opaque) { int ret; -uint64_t last_bytes = qemu_file_total_transferred(f); +uint64_t last_bytes = qemu_file_transferred(f); trace_migration_block_save("iterate", block_mig_state.submitted, block_mig_state.transferred); @@ -799,7 +799,7 @@ static int block_save_iterate(QEMUFile *f, void *opaque) } qemu_put_be64(f, BLK_MIG_FLAG_EOS); -uint64_t delta_bytes = qemu_file_total_transferred(f) - last_bytes; +uint64_t delta_bytes = qemu_file_transferred(f) - last_bytes; return (delta_bytes > 0); } diff --git a/migration/migration.c b/migration/migration.c index e17a6538b4..b1cfb56523 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2621,7 +2621,7 @@ static MigThrError migration_detect_error(MigrationState *s) /* How many bytes have we transferred since the beginning of the migration */ static uint64_t migration_total_bytes(MigrationState *s) { -return qemu_file_total_transferred(s->to_dst_file) + +return qemu_file_transferred(s->to_dst_file) + stat64_get(_stats.multifd_bytes); } diff --git a/migration/qemu-file.c b/migration/qemu-file.c index f3cb0cd94f..6ebc2bd3ec 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -709,7 +709,7 @@ int coroutine_mixed_fn qemu_get_byte(QEMUFile *f) return result; } -uint64_t qemu_file_total_transferred_fast(QEMUFile *f) +uint64_t qemu_file_transferred_fast(QEMUFile *f) { uint64_t ret = f->total_transferred; int i; @@ -721,7 +721,7 @@ uint64_t qemu_file_total_transferred_fast(QEMUFile *f) return ret; } -uint64_t qemu_file_total_transferred(QEMUFile *f) +uint64_t qemu_file_transferred(QEMUFile *f) { qemu_fflush(f); return f->total_transferred; diff --git a/migration/qemu-file.h b/migration/qemu-file.h index d758e7f10b..ab164a58d0 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -68,7 +68,7 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks); int qemu_fclose(QEMUFile *f); /* - * qemu_file_total_transferred: + * qemu_file_transferred: * * Report the total number of bytes transferred with * this file. @@ -83,19 +83,19 @@ int qemu_fclose(QEMUFile *f); * * Returns: the total bytes transferred */ -uint64_t qemu_file_total_transferred(QEMUFile *f); +uint64_t qemu_file_transferred(QEMUFile *f); /* - * qemu_file_total_transferred_fast: + * qemu_file_transferred_fast: * - * As qemu_file_total_transferred except for writable + * As qemu_file_transferred except for writable * files, where no flush is performed and the reported * amount will include the size of any queued buffers, * on top of the amount actually transferred. * * Returns: the total bytes transferred and queued */ -uint64_t qemu_file_total_transferred_fast(QEMUFile *f); +uint64_t qemu_file_transferred_fast(QEMUFile *f); /* * put_buffer without copying the buffer. diff --git a/migration/savevm.c b/migration/savevm.c index 032044b1d5..e33788343a 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -927,9 +927,9 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se) static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se, JSONWriter *vmdesc) { -uint64_t old_offset = qemu_file_total_transferred_fast(f); +uint64_t old_offset = qemu_file_transferred_fast(f); se->ops->save_state(f, se->opaque); -uint64_t size = qemu_file_total_transferred_fast(f) - old_offset; +uint64_t size = qemu_file_transferred_fast(f) - old_offset; if (vmdesc) { json_writer_int64(vmdesc, "size", size); @@ -2956,7 +2956,7 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate, goto the_end; } ret = qemu_savevm_state(f, errp); -vm_state_size = qemu_file_total_transferred(f); +vm_state_size = qemu_file_transferred(f); ret2 = qemu_fclose(f); if (ret < 0) { goto the_end; diff --git a/migration/vmstate.c b/migration/vmstate.c index 351f56104e..af01d54b6f 100644 --- a/migration/vmstate.c +++ b/migration/vmstate.c @@ -361,7 +361,7 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescr
Re: [PATCH 05/21] qemu-file: Make rate_limit_used an uint64_t
On 5/8/23 15:08, Juan Quintela wrote: Change all the functions that use it. It was already passed as uint64_t. Signed-off-by: Juan Quintela Reviewed-by: Daniel P. Berrangé Message-Id: <20230504113841.23130-5-quint...@redhat.com> Reviewed-by: Cédric Le Goater C. --- migration/qemu-file.c | 4 ++-- migration/qemu-file.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 346b683929..f3cb0cd94f 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -49,7 +49,7 @@ struct QEMUFile { * Total amount of data in bytes queued for transfer * during this rate limiting time window */ -int64_t rate_limit_used; +uint64_t rate_limit_used; /* The sum of bytes transferred on the wire */ uint64_t total_transferred; @@ -759,7 +759,7 @@ void qemu_file_reset_rate_limit(QEMUFile *f) f->rate_limit_used = 0; } -void qemu_file_acct_rate_limit(QEMUFile *f, int64_t len) +void qemu_file_acct_rate_limit(QEMUFile *f, uint64_t len) { f->rate_limit_used += len; } diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 04ca48cbef..d758e7f10b 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -137,7 +137,7 @@ void qemu_file_reset_rate_limit(QEMUFile *f); * out of band from the main file object I/O methods, and * need to be applied to the rate limiting calcuations */ -void qemu_file_acct_rate_limit(QEMUFile *f, int64_t len); +void qemu_file_acct_rate_limit(QEMUFile *f, uint64_t len); void qemu_file_set_rate_limit(QEMUFile *f, uint64_t new_rate); uint64_t qemu_file_get_rate_limit(QEMUFile *f); int qemu_file_get_error_obj(QEMUFile *f, Error **errp);
Re: [PATCH 04/21] qemu-file: make qemu_file_[sg]et_rate_limit() use an uint64_t
On 5/8/23 15:08, Juan Quintela wrote: It is really size_t. Everything else uses uint64_t, so move this to uint64_t as well. A size can't be negative anyways. Signed-off-by: Juan Quintela Message-Id: <20230504113841.23130-4-quint...@redhat.com> Reviewed-by: Cédric Le Goater C. * --- Don't drop the check if rate_limit_max is zero. --- migration/qemu-file.c | 6 +++--- migration/qemu-file.h | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 12cf7fb04e..346b683929 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -44,7 +44,7 @@ struct QEMUFile { * Maximum amount of data in bytes to transfer during one * rate limiting time window */ -int64_t rate_limit_max; +uint64_t rate_limit_max; /* * Total amount of data in bytes queued for transfer * during this rate limiting time window @@ -741,12 +741,12 @@ int qemu_file_rate_limit(QEMUFile *f) return 0; } -int64_t qemu_file_get_rate_limit(QEMUFile *f) +uint64_t qemu_file_get_rate_limit(QEMUFile *f) { return f->rate_limit_max; } -void qemu_file_set_rate_limit(QEMUFile *f, int64_t limit) +void qemu_file_set_rate_limit(QEMUFile *f, uint64_t limit) { /* * 'limit' is per second. But we check it each 100 miliseconds. diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 4f26bf6961..04ca48cbef 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -138,8 +138,8 @@ void qemu_file_reset_rate_limit(QEMUFile *f); * need to be applied to the rate limiting calcuations */ void qemu_file_acct_rate_limit(QEMUFile *f, int64_t len); -void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate); -int64_t qemu_file_get_rate_limit(QEMUFile *f); +void qemu_file_set_rate_limit(QEMUFile *f, uint64_t new_rate); +uint64_t qemu_file_get_rate_limit(QEMUFile *f); int qemu_file_get_error_obj(QEMUFile *f, Error **errp); int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp); void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err);
Re: [PATCH 03/21] migration: We set the rate_limit by a second
On 5/8/23 15:08, Juan Quintela wrote: That the implementation does the check every 100 milliseconds is an implementation detail that shouldn't be seen on the interfaz. Si. Pero, "interface" es mejor aqui. Notice that all callers of qemu_file_set_rate_limit() used the division or pass 0, so this change is a NOP. Signed-off-by: Juan Quintela Reviewed-by: Cédric Le Goater C. --- migration/migration.c | 7 +++ migration/options.c | 4 ++-- migration/qemu-file.c | 6 +- 3 files changed, 10 insertions(+), 7 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 3979a98949..e17a6538b4 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2117,7 +2117,7 @@ static int postcopy_start(MigrationState *ms) * will notice we're in POSTCOPY_ACTIVE and not actually * wrap their state up here */ -qemu_file_set_rate_limit(ms->to_dst_file, bandwidth / XFER_LIMIT_RATIO); +qemu_file_set_rate_limit(ms->to_dst_file, bandwidth); if (migrate_postcopy_ram()) { /* Ping just for debugging, helps line traces up */ qemu_savevm_send_ping(ms->to_dst_file, 2); @@ -3207,11 +3207,10 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) if (resume) { /* This is a resumed migration */ -rate_limit = migrate_max_postcopy_bandwidth() / -XFER_LIMIT_RATIO; +rate_limit = migrate_max_postcopy_bandwidth(); } else { /* This is a fresh new migration */ -rate_limit = migrate_max_bandwidth() / XFER_LIMIT_RATIO; +rate_limit = migrate_max_bandwidth(); /* Notify before starting migration thread */ notifier_list_notify(_state_notifiers, s); diff --git a/migration/options.c b/migration/options.c index 2e759cc306..d04b5fbc3a 100644 --- a/migration/options.c +++ b/migration/options.c @@ -1243,7 +1243,7 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp) s->parameters.max_bandwidth = params->max_bandwidth; if (s->to_dst_file && !migration_in_postcopy()) { qemu_file_set_rate_limit(s->to_dst_file, -s->parameters.max_bandwidth / XFER_LIMIT_RATIO); +s->parameters.max_bandwidth); } } @@ -1275,7 +1275,7 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp) s->parameters.max_postcopy_bandwidth = params->max_postcopy_bandwidth; if (s->to_dst_file && migration_in_postcopy()) { qemu_file_set_rate_limit(s->to_dst_file, -s->parameters.max_postcopy_bandwidth / XFER_LIMIT_RATIO); +s->parameters.max_postcopy_bandwidth); } } if (params->has_max_cpu_throttle) { diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 745361d238..12cf7fb04e 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -29,6 +29,7 @@ #include "migration.h" #include "qemu-file.h" #include "trace.h" +#include "options.h" #include "qapi/error.h" #define IO_BUF_SIZE 32768 @@ -747,7 +748,10 @@ int64_t qemu_file_get_rate_limit(QEMUFile *f) void qemu_file_set_rate_limit(QEMUFile *f, int64_t limit) { -f->rate_limit_max = limit; +/* + * 'limit' is per second. But we check it each 100 miliseconds. + */ +f->rate_limit_max = limit / XFER_LIMIT_RATIO; } void qemu_file_reset_rate_limit(QEMUFile *f)
Re: [PATCH 02/21] migration: Don't use INT64_MAX for unlimited rate
On 5/9/23 13:51, Juan Quintela wrote: Harsh Prateek Bora wrote: On 5/8/23 18:38, Juan Quintela wrote: Use 0 instead. Signed-off-by: Juan Quintela --- migration/migration.c | 4 ++-- migration/qemu-file.c | 3 +++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 1192f1ebf1..3979a98949 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2296,7 +2296,7 @@ static void migration_completion(MigrationState *s) } if (ret >= 0) { s->block_inactive = !migrate_colo(); -qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX); +qemu_file_set_rate_limit(s->to_dst_file, 0); #define RATE_LIMIT_MAX 0 How about having a macro and use that which conveys the meaning in all call instances wherever it is getting passed ? I almost preffer the macro. qemu_file_set_rate_limit(s->to_dst_file, RATE_LIMIT_MAX); seems quite explanatory? yep. and I would drop the comment qemu_file_rate_limit(). Thanks, C. Thanks, Juan. ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false, s->block_inactive); } @@ -3044,7 +3044,7 @@ static void *bg_migration_thread(void *opaque) rcu_register_thread(); object_ref(OBJECT(s)); -qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX); +qemu_file_set_rate_limit(s->to_dst_file, 0); setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST); /* diff --git a/migration/qemu-file.c b/migration/qemu-file.c index f4cfd05c67..745361d238 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -731,6 +731,9 @@ int qemu_file_rate_limit(QEMUFile *f) if (qemu_file_get_error(f)) { return 1; } +/* + * rate_limit_max == 0 means no rate_limit enfoncement. + */ if (f->rate_limit_max > 0 && f->rate_limit_used > f->rate_limit_max) { return 1; }
Re: [PATCH 01/21] migration: A rate limit value of 0 is valid
On 5/8/23 15:08, Juan Quintela wrote: And it is the best way to not have rate_limit. Signed-off-by: Juan Quintela Reviewed-by: Cédric Le Goater C. --- migration/migration.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 232e387109..1192f1ebf1 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2117,12 +2117,7 @@ static int postcopy_start(MigrationState *ms) * will notice we're in POSTCOPY_ACTIVE and not actually * wrap their state up here */ -/* 0 max-postcopy-bandwidth means unlimited */ -if (!bandwidth) { -qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX); -} else { -qemu_file_set_rate_limit(ms->to_dst_file, bandwidth / XFER_LIMIT_RATIO); -} +qemu_file_set_rate_limit(ms->to_dst_file, bandwidth / XFER_LIMIT_RATIO); if (migrate_postcopy_ram()) { /* Ping just for debugging, helps line traces up */ qemu_savevm_send_ping(ms->to_dst_file, 2);
Re: [PULL 04/16] aspeed/smc: Cache AspeedSMCClass
Hello, On 5/12/23 06:00, Philippe Mathieu-Daudé wrote: Hi Cédric, On 25/10/22 17:20, Cédric Le Goater wrote: Store a reference on the AspeedSMC class under the flash object and use it when accessing the flash contents. Avoiding the class cast checkers in these hot paths improves performance by 10% when running the aspeed avocado tests. I doubt you still have your benchmark number, but do you remember if you were using --enable-qom-cast-debug ? It is relatively easy to run. Grab : https://github.com/legoater/qemu-aspeed-boot/raw/master/images/ast2500-evb/buildroot-2023.02/flash.img and run : qemu-system-arm -M ast2500-evb,execute-in-place=true -net user -drive file=./flash.img,format=raw,if=mtd -nographic I tried with and without --enable-qom-cast-debug. It doesn't make much difference. C. Message-Id: <20220923084803.498337-7-...@kaod.org> Signed-off-by: Cédric Le Goater --- include/hw/ssi/aspeed_smc.h | 2 ++ hw/ssi/aspeed_smc.c | 9 - 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/include/hw/ssi/aspeed_smc.h b/include/hw/ssi/aspeed_smc.h index 2d5f8f3d8f68..8e1dda556b91 100644 --- a/include/hw/ssi/aspeed_smc.h +++ b/include/hw/ssi/aspeed_smc.h @@ -30,6 +30,7 @@ #include "qom/object.h" struct AspeedSMCState; +struct AspeedSMCClass; #define TYPE_ASPEED_SMC_FLASH "aspeed.smc.flash" OBJECT_DECLARE_SIMPLE_TYPE(AspeedSMCFlash, ASPEED_SMC_FLASH) @@ -37,6 +38,7 @@ struct AspeedSMCFlash { SysBusDevice parent_obj; struct AspeedSMCState *controller; + struct AspeedSMCClass *asc; uint8_t cs; MemoryRegion mmio; diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c index faed7e0cbe17..22df4be528a7 100644 --- a/hw/ssi/aspeed_smc.c +++ b/hw/ssi/aspeed_smc.c @@ -388,7 +388,7 @@ static inline int aspeed_smc_flash_cmd(const AspeedSMCFlash *fl) static inline int aspeed_smc_flash_addr_width(const AspeedSMCFlash *fl) { const AspeedSMCState *s = fl->controller; - AspeedSMCClass *asc = ASPEED_SMC_GET_CLASS(s); + AspeedSMCClass *asc = fl->asc; if (asc->addr_width) { return asc->addr_width(s); @@ -420,7 +420,7 @@ static uint32_t aspeed_smc_check_segment_addr(const AspeedSMCFlash *fl, uint32_t addr) { const AspeedSMCState *s = fl->controller; - AspeedSMCClass *asc = ASPEED_SMC_GET_CLASS(s); + AspeedSMCClass *asc = fl->asc; AspeedSegments seg; asc->reg_to_segment(s, s->regs[R_SEG_ADDR0 + fl->cs], ); @@ -1234,7 +1234,6 @@ static const TypeInfo aspeed_smc_info = { static void aspeed_smc_flash_realize(DeviceState *dev, Error **errp) { AspeedSMCFlash *s = ASPEED_SMC_FLASH(dev); - AspeedSMCClass *asc; g_autofree char *name = g_strdup_printf(TYPE_ASPEED_SMC_FLASH ".%d", s->cs); if (!s->controller) { @@ -1242,14 +1241,14 @@ static void aspeed_smc_flash_realize(DeviceState *dev, Error **errp) return; } - asc = ASPEED_SMC_GET_CLASS(s->controller); + s->asc = ASPEED_SMC_GET_CLASS(s->controller); /* * Use the default segment value to size the memory region. This * can be changed by FW at runtime. */ memory_region_init_io(>mmio, OBJECT(s), _smc_flash_ops, - s, name, asc->segments[s->cs].size); + s, name, s->asc->segments[s->cs].size); sysbus_init_mmio(SYS_BUS_DEVICE(dev), >mmio); }
Re: [PATCH] pflash: Fix blk_pread_nonzeroes()
On 3/7/23 15:02, Kevin Wolf wrote: Commit a4b15a8b introduced a new function blk_pread_nonzeroes(). Instead of reading directly from the root node of the BlockBackend, it reads from its 'file' child node. This can happen to mostly work for raw images (as long as the 'raw' format driver is in use, but not actually doing anything), but it breaks everything else. Fix it to read from the root node instead. Fixes: a4b15a8b9ef25b44fa92a4825312622600c1f37c Reported-by: Cédric Le Goater Signed-off-by: Kevin Wolf I have a couple of extra patches for 8.0. I can include this fix if no one does. Thanks, C. --- hw/block/block.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/hw/block/block.c b/hw/block/block.c index af0710e477..9f52ee6e72 100644 --- a/hw/block/block.c +++ b/hw/block/block.c @@ -39,8 +39,7 @@ static int blk_pread_nonzeroes(BlockBackend *blk, hwaddr size, void *buf) return ret; } if (!(ret & BDRV_BLOCK_ZERO)) { -ret = bdrv_pread(bs->file, offset, bytes, - (uint8_t *) buf + offset, 0); +ret = blk_pread(blk, offset, bytes, (uint8_t *) buf + offset, 0); if (ret < 0) { return ret; }
Re: [PULL 03/38] pflash: Only read non-zero parts of backend image
On 3/7/23 15:00, Kevin Wolf wrote: Am 03.03.2023 um 23:51 hat Maciej S. Szmigiero geschrieben: On 8.02.2023 12:19, Cédric Le Goater wrote: On 2/7/23 13:48, Kevin Wolf wrote: Am 07.02.2023 um 10:19 hat Cédric Le Goater geschrieben: On 2/7/23 09:38, Kevin Wolf wrote: Am 06.02.2023 um 16:54 hat Cédric Le Goater geschrieben: On 1/20/23 13:25, Kevin Wolf wrote: From: Xiang Zheng Currently we fill the VIRT_FLASH memory space with two 64MB NOR images when using persistent UEFI variables on virt board. Actually we only use a very small(non-zero) part of the memory while the rest significant large(zero) part of memory is wasted. So this patch checks the block status and only writes the non-zero part into memory. This requires pflash devices to use sparse files for backends. Signed-off-by: Xiang Zheng [ kraxel: rebased to latest master ] Signed-off-by: Gerd Hoffmann Message-Id: <20221220084246.1984871-1-kra...@redhat.com> Reviewed-by: Daniel P. Berrangé Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf This newly merged patch introduces a "regression" when booting an Aspeed machine. The following extra m25p80 patch (not yet merged) is required for the issue to show: https://lore.kernel.org/qemu-devel/20221115151000.2080833-1-...@kaod.org/ U-Boot fails to find the filesystem in that case. It can be easily reproduced with the witherspoon-bmc machine and seems to be related to the use of a UBI filesystem. Other Aspeed machines not using UBI are not impacted. Here is a tentative fix. I don't know enough the block layer to explain what is happening :/ I was puzzled for a moment, but... @@ -39,7 +39,7 @@ static int blk_pread_nonzeroes(BlockBack return ret; } if (!(ret & BDRV_BLOCK_ZERO)) { - ret = bdrv_pread(bs->file, offset, bytes, 'bs->file' rather than 'bs' really looks wrong. I think replacing that would already fix the bug you're seeing. Just to be sure, how did you configure the block backend? bs->file would happen to work more or less with raw over file-posix (which is probably what Gerd tested), but I think it breaks with anything else. The command is : $ qemu-system-arm -M witherspoon-bmc -net user \ -drive file=/path/to/file.mtd,format=raw,if=mtd \ -nographic -serial mon:stdio -snapshot If I remove '-snapshot', all works fine. Ok, that makes sense then. -snapshot creates a temporary qcow2 overlay, and then what your guest sees with bs->file is not the virtual disk content of the qcow2 image, but the qcow2 file itself. yes. Same symptom with pflash devices, TCG and KVM. The guest hangs with -snapshot. C. qemu-system-aarch64 -M virt -smp 2 -cpu max -accel tcg,thread=multi -nographic -m 2G -drive if=pflash,format=raw,file=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,readonly=on -drive if=pflash,format=raw,file=rhel9-varstore.img -device virtio-net,netdev=net0,bus=pcie.0,addr=0x3 -netdev user,id=net0 -drive file=rhel9-arm64.qcow2,if=none,id=disk,format=qcow2,cache=none -device virtio-blk-device,drive=disk -serial mon:stdio -snapshot +1 here for QEMU + KVM/x86: OVMF CODE file fails to load (is all zeroes) with either "-snapshot" QEMU command line option or even with just "snapshot=on" setting enabled on pflash0. Reverting this patch seems to fix the issue. Hm, so we know the fix, but nobody has submitted it as an actual patch? Sorry. I thought the solution was more complex and got pulled in other tasks ... I'll send one... Thanks, C.
Re: [PATCH 7/8] aspeed: Introduce a spi_boot region under the SoC
On 2/15/23 12:02, Philippe Mathieu-Daudé wrote: On 14/2/23 18:18, Cédric Le Goater wrote: The default boot of the Aspeed SoCs is address 0x0. For this reason, the FMC flash device contents are remapped by HW on the first 256MB of the address space. In QEMU, this is currently done in the machine init with the setup of a region alias. Move this code to the SoC and introduce an extra container to prepare ground for the boot ROM region which will overlap the FMC flash remapping. Signed-off-by: Cédric Le Goater --- include/hw/arm/aspeed_soc.h | 3 +++ hw/arm/aspeed.c | 13 + hw/arm/aspeed_ast2600.c | 13 + hw/arm/aspeed_soc.c | 14 ++ hw/arm/fby35.c | 8 +--- 5 files changed, 32 insertions(+), 19 deletions(-) enum { + ASPEED_DEV_SPI_BOOT, ASPEED_DEV_IOMEM, ASPEED_DEV_UART1, ASPEED_DEV_UART2, #define ASPEED_SOC_DPMCU_SIZE 0x0004 static const hwaddr aspeed_soc_ast2600_memmap[] = { + [ASPEED_DEV_SPI_BOOT] = 0x0, Isn't this a constant address for this Soc family? If so, we can define ASPEED_SOC_RESET_ADDR once ... I will introduce : #define ASPEED_SPI_BOOT_ADDR 0x0 and use it every where it makes sense. This should replace FIRMWARE_ADDR in aspeed.c also. Thanks, C. [ASPEED_DEV_SRAM] = 0x1000, [ASPEED_DEV_DPMCU] = 0x1800, /* 0x1600 0x17FF : AHB BUS do LPC Bus bridge */ @@ -282,6 +283,12 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp) qemu_irq irq; g_autofree char *sram_name = NULL; + /* Default boot region (SPI memory or ROMs) */ + memory_region_init(>spi_boot_container, OBJECT(s), + "aspeed.spi_boot_container", 0x1000); + memory_region_add_subregion(s->memory, sc->memmap[ASPEED_DEV_SPI_BOOT], ... and use it here. + >spi_boot_container);
Re: [PATCH 0/8] aspeed: I2C fixes, -drive removal (first step)
On 2/15/23 11:45, Philippe Mathieu-Daudé wrote: On 14/2/23 18:18, Cédric Le Goater wrote: Hello, This series starts with a first set of patches fixing I2C slave mode in the Aspeed I2C controller, a test device and its associated test in avocado. Follow some cleanups which allow the use of block devices instead of drives. So that, instead of specifying : -drive file=./flash-ast2600-evb,format=raw,if=mtd -drive file=./ast2600-evb.pnor,format=raw,if=mtd ... and guessing from the order which bus the device is attached to, we can use : -blockdev node-name=fmc0,driver=file,filename=./bmc.img -device mx66u51235f,bus=ssi.0,drive=fmc0 -blockdev node-name=fmc1,driver=file,filename=./bmc-alt.img -device mx66u51235f,bus=ssi.0,drive=fmc1 -blockdev node-name=pnor,driver=file,filename=./pnor -device mx66l1g45g,bus=ssi.1,drive=pnor ... It is not perfect, the CS index still depends on the order Quick thoughts here: TYPE_SSI_PERIPHERAL devices have one input SSI_GPIO_CS. TYPE_SSI_BUS could have a "cs-num" property (how many CS line associated with this bus) and create an array of #cs-num output SSI_GPIO_CS. TYPE_SSI_PERIPHERAL could have a "cs" (index) property; if set, upon ssi_peripheral_realize() when the device is plugged on the bus, the GPIO line is wired. yes. I would like to check first the impact on migration compatibility. Thanks, C. So we could set the 'cs=' property from CLI: -blockdev node-name=fmc0,driver=file,filename=./bmc.img -device mx66u51235f,bus=ssi.0,cs=1,drive=fmc0 -blockdev node-name=fmc1,driver=file,filename=./bmc-alt.img -device mx66u51235f,bus=ssi.0,cs=0,drive=fmc1 but it is now possible to run a machine without -drive ...,if=mtd. This lacks the final patch enabling the '-nodefaults' option by not creating the default devices if specified on the command line. It needs some more evaluation of the possible undesired effects. Thanks, C.
Re: [PATCH 3/8] hw/misc: add a toy i2c echo device
On 2/15/23 13:26, Philippe Mathieu-Daudé wrote: On 15/2/23 12:09, Cédric Le Goater wrote: On 2/15/23 11:55, Philippe Mathieu-Daudé wrote: On 14/2/23 18:18, Cédric Le Goater wrote: From: Klaus Jensen Add an example I2C device to demonstrate how a slave may master the bus and send data asynchronously to another slave. What a rebellion... The device will echo whatever it is sent to the device identified by the first byte received. Signed-off-by: Klaus Jensen [ clg: - Changed to build to use CONFIG_ASPEED_SOC since only supported on such SoCs - folded in these fixes : https://lore.kernel.org/qemu-devel/Y3yMKAhOkYGtnkOp@cormorant.local/ ] Message-Id: <20220601210831.67259-7-...@irrelevant.dk> Signed-off-by: Cédric Le Goater --- hw/misc/i2c-echo.c | 156 hw/misc/meson.build | 2 + 2 files changed, 158 insertions(+) create mode 100644 hw/misc/i2c-echo.c diff --git a/hw/misc/meson.build b/hw/misc/meson.build index 448e14b531..3eb1bda710 100644 --- a/hw/misc/meson.build +++ b/hw/misc/meson.build @@ -129,6 +129,8 @@ softmmu_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_rng.c')) softmmu_ss.add(when: 'CONFIG_GRLIB', if_true: files('grlib_ahb_apb_pnp.c')) +softmmu_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files('i2c-echo.c')) s/CONFIG_ASPEED_SOC/CONFIG_I2C/ since this is a generic device. even if only supported by the Aspeed SoC ? I am OK with both. Any machine exposing an i2c bus can use this device, isn't it? -device i2c-echo,bus=bus69,address=0x42 ... Would you have a machine with I2C buses and image to try that on ? Not an aspeed one obvioulsy Thanks, C.
Re: [PATCH 0/8] aspeed: I2C fixes, -drive removal (first step)
The next step would be to get rid of the drive_get(IF_MTD) internal API, which means finding a way to attach block backend devices from the command line to the default flash devices. This should be done at machine init time and the blockdev should have some 'bus@addr' identifier. I lack the knowledge on how this could be done. Possibly interesting for you: commit e0561e60f17 "hw/arm/virt: Support firmware configuration with -blockdev". I see. The mapping device<->blk is moved at the machine level with an option. The same could be done for the Aspeed machines with "fmc0", "spi0", identifiers. I think we should deprecate the "fmc-model" and "spi-model" machine options. They become useless with -nodefaults correctly implemented. Thanks,
Re: [RFC PATCH 5/5] hw/ppc/pnv_bmc: Simplify pnv_bmc_find()
On 2/16/23 20:16, Philippe Mathieu-Daudé wrote: On 16/2/23 19:12, Cédric Le Goater wrote: On 2/16/23 13:25, Philippe Mathieu-Daudé wrote: ForeachArgs::name is only used once as TYPE_IPMI_BMC. Since the penultimate commit, object_child_foreach_recursive()'s handler takes an Error* argument and return a boolean. We can directly pass ForeachArgs::obj as context, removing the ForeachArgs structure. Signed-off-by: Philippe Mathieu-Daudé --- RFC: please double-check... hw/ppc/pnv_bmc.c | 25 + 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/hw/ppc/pnv_bmc.c b/hw/ppc/pnv_bmc.c index 05acc88a55..566284469f 100644 --- a/hw/ppc/pnv_bmc.c +++ b/hw/ppc/pnv_bmc.c @@ -278,36 +278,29 @@ IPMIBmc *pnv_bmc_create(PnvPnor *pnor) return IPMI_BMC(obj); } -typedef struct ForeachArgs { - const char *name; - Object *obj; -} ForeachArgs; - static bool bmc_find(Object *child, void *opaque, Error **errp) { - ForeachArgs *args = opaque; + Object **obj = opaque; - if (object_dynamic_cast(child, args->name)) { - if (args->obj) { - return false; The purpose of this test was to catch multiple bmc devices and it's removed now. Great. But it should be an error ! :) See the error_setg below. + if (object_dynamic_cast(child, TYPE_IPMI_BMC)) { + if (*obj) { + return true; } - args->obj = child; + *obj = child; } return true; } IPMIBmc *pnv_bmc_find(Error **errp) { - ForeachArgs args = { TYPE_IPMI_BMC, NULL }; - int ret; + Object *obj = NULL; - ret = object_child_foreach_recursive(object_get_root(), bmc_find, - , NULL); - if (ret) { + if (!object_child_foreach_recursive(object_get_root(), bmc_find, , + NULL)) { error_setg(errp, "machine should have only one BMC device. " "Use '-nodefaults'");> return NULL; } We don't test obj against NULL any more. This could break the QOM cast below. IIUC QOM cast-macros are NULL-safe, see https://lore.kernel.org/qemu-devel/20210107121304.1db97...@bahia.lan/ even when CONFIG_QOM_CAST_DEBUG is set ? I might have missed something. Thanks, C. If you concur I'll try to update the QOM API doc where relevant. - return args.obj ? IPMI_BMC(args.obj) : NULL; + return IPMI_BMC(obj); }
Re: [RFC PATCH 5/5] hw/ppc/pnv_bmc: Simplify pnv_bmc_find()
On 2/16/23 13:25, Philippe Mathieu-Daudé wrote: ForeachArgs::name is only used once as TYPE_IPMI_BMC. Since the penultimate commit, object_child_foreach_recursive()'s handler takes an Error* argument and return a boolean. We can directly pass ForeachArgs::obj as context, removing the ForeachArgs structure. Signed-off-by: Philippe Mathieu-Daudé --- RFC: please double-check... hw/ppc/pnv_bmc.c | 25 + 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/hw/ppc/pnv_bmc.c b/hw/ppc/pnv_bmc.c index 05acc88a55..566284469f 100644 --- a/hw/ppc/pnv_bmc.c +++ b/hw/ppc/pnv_bmc.c @@ -278,36 +278,29 @@ IPMIBmc *pnv_bmc_create(PnvPnor *pnor) return IPMI_BMC(obj); } -typedef struct ForeachArgs { -const char *name; -Object *obj; -} ForeachArgs; - static bool bmc_find(Object *child, void *opaque, Error **errp) { -ForeachArgs *args = opaque; +Object **obj = opaque; -if (object_dynamic_cast(child, args->name)) { -if (args->obj) { -return false; The purpose of this test was to catch multiple bmc devices and it's removed now. +if (object_dynamic_cast(child, TYPE_IPMI_BMC)) { +if (*obj) { +return true; } -args->obj = child; +*obj = child; } return true; } IPMIBmc *pnv_bmc_find(Error **errp) { -ForeachArgs args = { TYPE_IPMI_BMC, NULL }; -int ret; +Object *obj = NULL; -ret = object_child_foreach_recursive(object_get_root(), bmc_find, - , NULL); -if (ret) { +if (!object_child_foreach_recursive(object_get_root(), bmc_find, , +NULL)) { error_setg(errp, "machine should have only one BMC device. " "Use '-nodefaults'");> return NULL; } We don't test obj against NULL any more. This could break the QOM cast below. Thanks, C. -return args.obj ? IPMI_BMC(args.obj) : NULL; +return IPMI_BMC(obj); }
Re: [PATCH 3/8] hw/misc: add a toy i2c echo device
On 2/15/23 11:55, Philippe Mathieu-Daudé wrote: On 14/2/23 18:18, Cédric Le Goater wrote: From: Klaus Jensen Add an example I2C device to demonstrate how a slave may master the bus and send data asynchronously to another slave. What a rebellion... The device will echo whatever it is sent to the device identified by the first byte received. Signed-off-by: Klaus Jensen [ clg: - Changed to build to use CONFIG_ASPEED_SOC since only supported on such SoCs - folded in these fixes : https://lore.kernel.org/qemu-devel/Y3yMKAhOkYGtnkOp@cormorant.local/ ] Message-Id: <20220601210831.67259-7-...@irrelevant.dk> Signed-off-by: Cédric Le Goater --- hw/misc/i2c-echo.c | 156 hw/misc/meson.build | 2 + 2 files changed, 158 insertions(+) create mode 100644 hw/misc/i2c-echo.c diff --git a/hw/misc/meson.build b/hw/misc/meson.build index 448e14b531..3eb1bda710 100644 --- a/hw/misc/meson.build +++ b/hw/misc/meson.build @@ -129,6 +129,8 @@ softmmu_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_rng.c')) softmmu_ss.add(when: 'CONFIG_GRLIB', if_true: files('grlib_ahb_apb_pnp.c')) +softmmu_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files('i2c-echo.c')) s/CONFIG_ASPEED_SOC/CONFIG_I2C/ since this is a generic device. even if only supported by the Aspeed SoC ? I am OK with both. specific_ss.add(when: 'CONFIG_AVR_POWER', if_true: files('avr_power.c')) specific_ss.add(when: 'CONFIG_MAC_VIA', if_true: files('mac_via.c'))
Re: [PATCH 0/8] aspeed: I2C fixes, -drive removal (first step)
On 2/15/23 07:38, Markus Armbruster wrote: Cédric Le Goater writes: Hello, This series starts with a first set of patches fixing I2C slave mode in the Aspeed I2C controller, a test device and its associated test in avocado. Follow some cleanups which allow the use of block devices instead of drives. So that, instead of specifying : -drive file=./flash-ast2600-evb,format=raw,if=mtd -drive file=./ast2600-evb.pnor,format=raw,if=mtd ... and guessing from the order which bus the device is attached to, we can use : -blockdev node-name=fmc0,driver=file,filename=./bmc.img -device mx66u51235f,bus=ssi.0,drive=fmc0 -blockdev node-name=fmc1,driver=file,filename=./bmc-alt.img -device mx66u51235f,bus=ssi.0,drive=fmc1 -blockdev node-name=pnor,driver=file,filename=./pnor -device mx66l1g45g,bus=ssi.1,drive=pnor ... It is not perfect, the CS index still depends on the order, but it is now possible to run a machine without -drive ...,if=mtd. Lovely! Does this cover all uses of IF_MTD, or only some? All use for default devices in the aspeed machines. I think most machines use IF_MTD in a similar way. Yet, one extra use of IF_MTD is to fill a ROM region with the first drive contents. It avoids fetching instructions from the SPI flash mapping and speeds up boot quite significantly. Once the default flash devices are created and attached to their drive, it is possible to query the block backend without the drive_get() API. I have a couple of patches for it and it shouldn't be a problem. This lacks the final patch enabling the '-nodefaults' option by not creating the default devices if specified on the command line. It needs some more evaluation of the possible undesired effects. Are you thinking of something similar to the default CD-ROM, i.e. use default_list to have -device suppress a certain kind of default devices, and also have -nodefaults suppress them all? I would have -nodefaults suppress all flash devices. There are also I2C devices but they are less problematic for the machine definition. (Well, eeprom contents can be complex to handle) The next step would be to get rid of the drive_get(IF_MTD) internal API, which means finding a way to attach block backend devices from the command line to the default flash devices. This should be done at machine init time and the blockdev should have some 'bus@addr' identifier. I lack the knowledge on how this could be done. Thanks, C.
[PATCH 2/8] hw/i2c: only schedule pending master when bus is idle
From: Klaus Jensen It is not given that the current master will release the bus after a transfer ends. Only schedule a pending master if the bus is idle. Fixes: 37fa5ca42623 ("hw/i2c: support multiple masters") Signed-off-by: Klaus Jensen Acked-by: Corey Minyard Message-Id: <20221116084312.35808-2-...@irrelevant.dk> Signed-off-by: Cédric Le Goater --- include/hw/i2c/i2c.h | 2 ++ hw/i2c/aspeed_i2c.c | 2 ++ hw/i2c/core.c| 37 ++--- 3 files changed, 26 insertions(+), 15 deletions(-) diff --git a/include/hw/i2c/i2c.h b/include/hw/i2c/i2c.h index 9b9581d230..2a3abacd1b 100644 --- a/include/hw/i2c/i2c.h +++ b/include/hw/i2c/i2c.h @@ -141,6 +141,8 @@ int i2c_start_send(I2CBus *bus, uint8_t address); */ int i2c_start_send_async(I2CBus *bus, uint8_t address); +void i2c_schedule_pending_master(I2CBus *bus); + void i2c_end_transfer(I2CBus *bus); void i2c_nack(I2CBus *bus); void i2c_ack(I2CBus *bus); diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c index c166fd20fa..1f071a3811 100644 --- a/hw/i2c/aspeed_i2c.c +++ b/hw/i2c/aspeed_i2c.c @@ -550,6 +550,8 @@ static void aspeed_i2c_bus_handle_cmd(AspeedI2CBus *bus, uint64_t value) } SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, M_STOP_CMD, 0); aspeed_i2c_set_state(bus, I2CD_IDLE); + +i2c_schedule_pending_master(bus->bus); } if (aspeed_i2c_bus_pkt_mode_en(bus)) { diff --git a/hw/i2c/core.c b/hw/i2c/core.c index d4ba8146bf..bed594fe59 100644 --- a/hw/i2c/core.c +++ b/hw/i2c/core.c @@ -185,22 +185,39 @@ int i2c_start_transfer(I2CBus *bus, uint8_t address, bool is_recv) void i2c_bus_master(I2CBus *bus, QEMUBH *bh) { -if (i2c_bus_busy(bus)) { -I2CPendingMaster *node = g_new(struct I2CPendingMaster, 1); -node->bh = bh; +I2CPendingMaster *node = g_new(struct I2CPendingMaster, 1); +node->bh = bh; + +QSIMPLEQ_INSERT_TAIL(>pending_masters, node, entry); +} + +void i2c_schedule_pending_master(I2CBus *bus) +{ +I2CPendingMaster *node; -QSIMPLEQ_INSERT_TAIL(>pending_masters, node, entry); +if (i2c_bus_busy(bus)) { +/* someone is already controlling the bus; wait for it to release it */ +return; +} +if (QSIMPLEQ_EMPTY(>pending_masters)) { return; } -bus->bh = bh; +node = QSIMPLEQ_FIRST(>pending_masters); +bus->bh = node->bh; + +QSIMPLEQ_REMOVE_HEAD(>pending_masters, entry); +g_free(node); + qemu_bh_schedule(bus->bh); } void i2c_bus_release(I2CBus *bus) { bus->bh = NULL; + +i2c_schedule_pending_master(bus); } int i2c_start_recv(I2CBus *bus, uint8_t address) @@ -234,16 +251,6 @@ void i2c_end_transfer(I2CBus *bus) g_free(node); } bus->broadcast = false; - -if (!QSIMPLEQ_EMPTY(>pending_masters)) { -I2CPendingMaster *node = QSIMPLEQ_FIRST(>pending_masters); -bus->bh = node->bh; - -QSIMPLEQ_REMOVE_HEAD(>pending_masters, entry); -g_free(node); - -qemu_bh_schedule(bus->bh); -} } int i2c_send(I2CBus *bus, uint8_t data) -- 2.39.1
[PATCH 7/8] aspeed: Introduce a spi_boot region under the SoC
The default boot of the Aspeed SoCs is address 0x0. For this reason, the FMC flash device contents are remapped by HW on the first 256MB of the address space. In QEMU, this is currently done in the machine init with the setup of a region alias. Move this code to the SoC and introduce an extra container to prepare ground for the boot ROM region which will overlap the FMC flash remapping. Signed-off-by: Cédric Le Goater --- include/hw/arm/aspeed_soc.h | 3 +++ hw/arm/aspeed.c | 13 + hw/arm/aspeed_ast2600.c | 13 + hw/arm/aspeed_soc.c | 14 ++ hw/arm/fby35.c | 8 +--- 5 files changed, 32 insertions(+), 19 deletions(-) diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h index bd1e03e78a..db55505d14 100644 --- a/include/hw/arm/aspeed_soc.h +++ b/include/hw/arm/aspeed_soc.h @@ -58,6 +58,8 @@ struct AspeedSoCState { MemoryRegion *dram_mr; MemoryRegion dram_container; MemoryRegion sram; +MemoryRegion spi_boot_container; +MemoryRegion spi_boot; AspeedVICState vic; AspeedRtcState rtc; AspeedTimerCtrlState timerctrl; @@ -120,6 +122,7 @@ struct AspeedSoCClass { enum { +ASPEED_DEV_SPI_BOOT, ASPEED_DEV_IOMEM, ASPEED_DEV_UART1, ASPEED_DEV_UART2, diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index 21184f3ad4..998dc57969 100644 --- a/hw/arm/aspeed.c +++ b/hw/arm/aspeed.c @@ -384,18 +384,7 @@ static void aspeed_machine_init(MachineState *machine) MemoryRegion *boot_rom = g_new(MemoryRegion, 1); uint64_t size = memory_region_size(>mmio); -/* - * create a ROM region using the default mapping window size of - * the flash module. The window size is 64MB for the AST2400 - * SoC and 128MB for the AST2500 SoC, which is twice as big as - * needed by the flash modules of the Aspeed machines. - */ -if (ASPEED_MACHINE(machine)->mmio_exec) { -memory_region_init_alias(boot_rom, NULL, "aspeed.boot_rom", - >mmio, 0, size); -memory_region_add_subregion(get_system_memory(), FIRMWARE_ADDR, -boot_rom); -} else { +if (!ASPEED_MACHINE(machine)->mmio_exec) { memory_region_init_rom(boot_rom, NULL, "aspeed.boot_rom", size, _abort); memory_region_add_subregion(get_system_memory(), FIRMWARE_ADDR, diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c index bb2769df04..20f0b772d6 100644 --- a/hw/arm/aspeed_ast2600.c +++ b/hw/arm/aspeed_ast2600.c @@ -21,6 +21,7 @@ #define ASPEED_SOC_DPMCU_SIZE 0x0004 static const hwaddr aspeed_soc_ast2600_memmap[] = { +[ASPEED_DEV_SPI_BOOT] = 0x0, [ASPEED_DEV_SRAM] = 0x1000, [ASPEED_DEV_DPMCU] = 0x1800, /* 0x1600 0x17FF : AHB BUS do LPC Bus bridge */ @@ -282,6 +283,12 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp) qemu_irq irq; g_autofree char *sram_name = NULL; +/* Default boot region (SPI memory or ROMs) */ +memory_region_init(>spi_boot_container, OBJECT(s), + "aspeed.spi_boot_container", 0x1000); +memory_region_add_subregion(s->memory, sc->memmap[ASPEED_DEV_SPI_BOOT], +>spi_boot_container); + /* IO space */ aspeed_mmio_map_unimplemented(s, SYS_BUS_DEVICE(>iomem), "aspeed.io", sc->memmap[ASPEED_DEV_IOMEM], @@ -431,6 +438,12 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp) sysbus_connect_irq(SYS_BUS_DEVICE(>fmc), 0, aspeed_soc_get_irq(s, ASPEED_DEV_FMC)); +/* Set up an alias on the FMC CE0 region (boot default) */ +MemoryRegion *fmc0_mmio = >fmc.flashes[0].mmio; +memory_region_init_alias(>spi_boot, OBJECT(s), "aspeed.spi_boot", + fmc0_mmio, 0, memory_region_size(fmc0_mmio)); +memory_region_add_subregion(>spi_boot_container, 0x0, >spi_boot); + /* SPI */ for (i = 0; i < sc->spis_num; i++) { object_property_set_link(OBJECT(>spi[i]), "dram", diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c index e884d6badc..3507ea5818 100644 --- a/hw/arm/aspeed_soc.c +++ b/hw/arm/aspeed_soc.c @@ -25,6 +25,7 @@ #define ASPEED_SOC_IOMEM_SIZE 0x0020 static const hwaddr aspeed_soc_ast2400_memmap[] = { +[ASPEED_DEV_SPI_BOOT] = 0x0, [ASPEED_DEV_IOMEM] = 0x1E60, [ASPEED_DEV_FMC]= 0x1E62, [ASPEED_DEV_SPI1] = 0x1E63, @@ -59,6 +60,7 @@ static const hwaddr aspeed_soc_ast2400_memmap[] = { }; static const hwaddr aspeed_soc_ast2500_memmap[] = { +[ASPEED_DEV_SPI_BOOT] = 0x0, [ASPEED_DEV_IOMEM] = 0x1E6000
[PATCH 8/8] aspeed: Add a boot_rom overlap region in the SoC spi_boot container
To avoid the SPI transactions fetching instructions from the FMC CE0 flash device and speed up boot, a ROM can be created if a drive is available. Reverse a bit the logic to allow a machine to boot without a drive, using a block device instead : -blockdev node-name=fmc0,driver=file,filename=/path/to/flash.img \ -device mx66u51235f,bus=ssi.0,drive=fmc0 Signed-off-by: Cédric Le Goater --- hw/arm/aspeed.c | 45 +++-- 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index 998dc57969..13e719bae7 100644 --- a/hw/arm/aspeed.c +++ b/hw/arm/aspeed.c @@ -243,10 +243,9 @@ static void aspeed_reset_secondary(ARMCPU *cpu, #define FIRMWARE_ADDR 0x0 -static void write_boot_rom(DriveInfo *dinfo, hwaddr addr, size_t rom_size, +static void write_boot_rom(BlockBackend *blk, hwaddr addr, size_t rom_size, Error **errp) { -BlockBackend *blk = blk_by_legacy_dinfo(dinfo); g_autofree void *storage = NULL; int64_t size; @@ -272,6 +271,22 @@ static void write_boot_rom(DriveInfo *dinfo, hwaddr addr, size_t rom_size, rom_add_blob_fixed("aspeed.boot_rom", storage, rom_size, addr); } +/* + * Create a ROM and copy the flash contents at the expected address + * (0x0). Boots faster than execute-in-place. + */ +static void aspeed_install_boot_rom(AspeedSoCState *soc, BlockBackend *blk, +uint64_t rom_size) +{ +MemoryRegion *boot_rom = g_new(MemoryRegion, 1); + +memory_region_init_rom(boot_rom, NULL, "aspeed.boot_rom", rom_size, + _abort); +memory_region_add_subregion_overlap(>spi_boot_container, 0, +boot_rom, 1); +write_boot_rom(blk, FIRMWARE_ADDR, rom_size, _abort); +} + void aspeed_board_init_flashes(AspeedSMCState *s, const char *flashtype, unsigned int count, int unit0) { @@ -328,7 +343,6 @@ static void aspeed_machine_init(MachineState *machine) AspeedMachineState *bmc = ASPEED_MACHINE(machine); AspeedMachineClass *amc = ASPEED_MACHINE_GET_CLASS(machine); AspeedSoCClass *sc; -DriveInfo *drive0 = drive_get(IF_MTD, 0, 0); int i; NICInfo *nd = _table[0]; @@ -378,21 +392,6 @@ static void aspeed_machine_init(MachineState *machine) bmc->spi_model ? bmc->spi_model : amc->spi_model, 1, amc->num_cs); -/* Install first FMC flash content as a boot rom. */ -if (drive0) { -AspeedSMCFlash *fl = >soc.fmc.flashes[0]; -MemoryRegion *boot_rom = g_new(MemoryRegion, 1); -uint64_t size = memory_region_size(>mmio); - -if (!ASPEED_MACHINE(machine)->mmio_exec) { -memory_region_init_rom(boot_rom, NULL, "aspeed.boot_rom", - size, _abort); -memory_region_add_subregion(get_system_memory(), FIRMWARE_ADDR, -boot_rom); -write_boot_rom(drive0, FIRMWARE_ADDR, size, _abort); -} -} - if (machine->kernel_filename && sc->num_cpus > 1) { /* With no u-boot we must set up a boot stub for the secondary CPU */ MemoryRegion *smpboot = g_new(MemoryRegion, 1); @@ -423,6 +422,16 @@ static void aspeed_machine_init(MachineState *machine) drive_get(IF_SD, 0, bmc->soc.sdhci.num_slots)); } +if (!bmc->mmio_exec) { +DriveInfo *mtd0 = drive_get(IF_MTD, 0, 0); + +if (mtd0) { +uint64_t rom_size = memory_region_size(>soc.spi_boot); +aspeed_install_boot_rom(>soc, blk_by_legacy_dinfo(mtd0), +rom_size); +} +} + arm_load_kernel(ARM_CPU(first_cpu), machine, _board_binfo); } -- 2.39.1
[PATCH 3/8] hw/misc: add a toy i2c echo device
From: Klaus Jensen Add an example I2C device to demonstrate how a slave may master the bus and send data asynchronously to another slave. The device will echo whatever it is sent to the device identified by the first byte received. Signed-off-by: Klaus Jensen [ clg: - Changed to build to use CONFIG_ASPEED_SOC since only supported on such SoCs - folded in these fixes : https://lore.kernel.org/qemu-devel/Y3yMKAhOkYGtnkOp@cormorant.local/ ] Message-Id: <20220601210831.67259-7-...@irrelevant.dk> Signed-off-by: Cédric Le Goater --- hw/misc/i2c-echo.c | 156 hw/misc/meson.build | 2 + 2 files changed, 158 insertions(+) create mode 100644 hw/misc/i2c-echo.c diff --git a/hw/misc/i2c-echo.c b/hw/misc/i2c-echo.c new file mode 100644 index 00..5705ab5d73 --- /dev/null +++ b/hw/misc/i2c-echo.c @@ -0,0 +1,156 @@ +#include "qemu/osdep.h" +#include "qemu/timer.h" +#include "qemu/main-loop.h" +#include "block/aio.h" +#include "hw/i2c/i2c.h" + +#define TYPE_I2C_ECHO "i2c-echo" +OBJECT_DECLARE_SIMPLE_TYPE(I2CEchoState, I2C_ECHO) + +enum i2c_echo_state { +I2C_ECHO_STATE_IDLE, +I2C_ECHO_STATE_START_SEND, +I2C_ECHO_STATE_ACK, +}; + +typedef struct I2CEchoState { +I2CSlave parent_obj; + +I2CBus *bus; + +enum i2c_echo_state state; +QEMUBH *bh; + +unsigned int pos; +uint8_t data[3]; +} I2CEchoState; + +static void i2c_echo_bh(void *opaque) +{ +I2CEchoState *state = opaque; + +switch (state->state) { +case I2C_ECHO_STATE_IDLE: +return; + +case I2C_ECHO_STATE_START_SEND: +if (i2c_start_send_async(state->bus, state->data[0])) { +goto release_bus; +} + +state->pos++; +state->state = I2C_ECHO_STATE_ACK; +return; + +case I2C_ECHO_STATE_ACK: +if (state->pos > 2) { +break; +} + +if (i2c_send_async(state->bus, state->data[state->pos++])) { +break; +} + +return; +} + + +i2c_end_transfer(state->bus); +release_bus: +i2c_bus_release(state->bus); + +state->state = I2C_ECHO_STATE_IDLE; +} + +static int i2c_echo_event(I2CSlave *s, enum i2c_event event) +{ +I2CEchoState *state = I2C_ECHO(s); + +switch (event) { +case I2C_START_RECV: +state->pos = 0; + +break; + +case I2C_START_SEND: +state->pos = 0; + +break; + +case I2C_FINISH: +state->pos = 0; +state->state = I2C_ECHO_STATE_START_SEND; +i2c_bus_master(state->bus, state->bh); + +break; + +case I2C_NACK: +break; + +default: +return -1; +} + +return 0; +} + +static uint8_t i2c_echo_recv(I2CSlave *s) +{ +I2CEchoState *state = I2C_ECHO(s); + +if (state->pos > 2) { +return 0xff; +} + +return state->data[state->pos++]; +} + +static int i2c_echo_send(I2CSlave *s, uint8_t data) +{ +I2CEchoState *state = I2C_ECHO(s); + +if (state->pos > 2) { +return -1; +} + +state->data[state->pos++] = data; + +return 0; +} + +static void i2c_echo_realize(DeviceState *dev, Error **errp) +{ +I2CEchoState *state = I2C_ECHO(dev); +BusState *bus = qdev_get_parent_bus(dev); + +state->bus = I2C_BUS(bus); +state->bh = qemu_bh_new(i2c_echo_bh, state); + +return; +} + +static void i2c_echo_class_init(ObjectClass *oc, void *data) +{ +I2CSlaveClass *sc = I2C_SLAVE_CLASS(oc); +DeviceClass *dc = DEVICE_CLASS(oc); + +dc->realize = i2c_echo_realize; + +sc->event = i2c_echo_event; +sc->recv = i2c_echo_recv; +sc->send = i2c_echo_send; +} + +static const TypeInfo i2c_echo = { +.name = TYPE_I2C_ECHO, +.parent = TYPE_I2C_SLAVE, +.instance_size = sizeof(I2CEchoState), +.class_init = i2c_echo_class_init, +}; + +static void register_types(void) +{ +type_register_static(_echo); +} + +type_init(register_types); diff --git a/hw/misc/meson.build b/hw/misc/meson.build index 448e14b531..3eb1bda710 100644 --- a/hw/misc/meson.build +++ b/hw/misc/meson.build @@ -129,6 +129,8 @@ softmmu_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_rng.c')) softmmu_ss.add(when: 'CONFIG_GRLIB', if_true: files('grlib_ahb_apb_pnp.c')) +softmmu_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files('i2c-echo.c')) + specific_ss.add(when: 'CONFIG_AVR_POWER', if_true: files('avr_power.c')) specific_ss.add(when: 'CONFIG_MAC_VIA', if_true: files('mac_via.c')) -- 2.39.1
[PATCH 1/8] m25p80: Improve error when the backend file size does not match the device
Currently, when a block backend is attached to a m25p80 device and the associated file size does not match the flash model, QEMU complains with the error message "failed to read the initial flash content". This is confusing for the user. Use blk_check_size_and_read_all() instead of blk_pread() to improve the reported error. Reviewed-by: Peter Maydell Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Peter Delevoryas Reviewed-by: Alistair Francis Message-Id: <20221115151000.2080833-1-...@kaod.org> Signed-off-by: Cédric Le Goater --- breakage with commit a4b15a8b9e ("pflash: Only read non-zero parts of backend image") when using -snaphot. hw/block/m25p80.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c index 802d2eb021..dc5ffbc4ff 100644 --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -24,6 +24,7 @@ #include "qemu/osdep.h" #include "qemu/units.h" #include "sysemu/block-backend.h" +#include "hw/block/block.h" #include "hw/qdev-properties.h" #include "hw/qdev-properties-system.h" #include "hw/ssi/ssi.h" @@ -1615,8 +1616,7 @@ static void m25p80_realize(SSIPeripheral *ss, Error **errp) trace_m25p80_binding(s); s->storage = blk_blockalign(s->blk, s->size); -if (blk_pread(s->blk, 0, s->size, s->storage, 0) < 0) { -error_setg(errp, "failed to read the initial flash content"); +if (!blk_check_size_and_read_all(s->blk, s->storage, s->size, errp)) { return; } } else { -- 2.39.1
[PATCH 6/8] aspeed/smc: Wire CS lines at reset
It has become difficult to define on the command line the flash devices of the Aspeed machines and their file backend. Currently, a set of default flash devices is created at machine init and drives are associated to the FMC and SPI controller devices in sequence : -drive file,format=raw,if=mtd -drive file,format=raw,if=mtd ... The CS lines are wired in the same creation loop. On real systems, these flash devices are sometime soldered to the board but the models can be different or a socket is provided to replace the flash device. So, it is legitimate to not consider them as always available by default. Some machine options were provided to specify different models, but this has its limits and the best approach would be to allow the use of block devices, such as : -blockdev node-name=fmc0,driver=file,filename=./flash.img \ -device mx66u51235f,bus=ssi.0,drive=fmc0 \ The first step in that direction is to wire the CS lines of all available devices on a bus at reset time. Let's do that and check the maximum number of devices supported by the bus while at it. The bus parent can now be explicitly defined but the device order still depends on the command line definitions. Signed-off-by: Cédric Le Goater --- hw/arm/aspeed.c | 4 hw/ssi/aspeed_smc.c | 24 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index 7c28546d7f..21184f3ad4 100644 --- a/hw/arm/aspeed.c +++ b/hw/arm/aspeed.c @@ -283,7 +283,6 @@ void aspeed_board_init_flashes(AspeedSMCState *s, const char *flashtype, for (i = 0; i < count; ++i) { DriveInfo *dinfo = drive_get(IF_MTD, 0, unit0 + i); -qemu_irq cs_line; DeviceState *dev; dev = qdev_new(flashtype); @@ -291,9 +290,6 @@ void aspeed_board_init_flashes(AspeedSMCState *s, const char *flashtype, qdev_prop_set_drive(dev, "drive", blk_by_legacy_dinfo(dinfo)); } qdev_realize_and_unref(dev, BUS(s->spi), _fatal); - -cs_line = qdev_get_gpio_in_named(dev, SSI_GPIO_CS, 0); -qdev_connect_gpio_out_named(DEVICE(s), "cs", i, cs_line); } } diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c index 7281169322..412cf125d9 100644 --- a/hw/ssi/aspeed_smc.c +++ b/hw/ssi/aspeed_smc.c @@ -680,6 +680,28 @@ static void aspeed_smc_flash_update_ctrl(AspeedSMCFlash *fl, uint32_t value) aspeed_smc_flash_do_select(fl, unselect); } +/* + * TODO: assumption is made on the order of creation of devices, the + * ones on the command line or the default devices created at machine + * init. + */ +static void aspeed_smc_wire_cs_lines(AspeedSMCState *s, int cs_max) +{ +BusState *b = BUS(s->spi); +BusChild *kid; + +QTAILQ_FOREACH(kid, >children, sibling) { +qemu_irq cs_line = qdev_get_gpio_in_named(kid->child, SSI_GPIO_CS, 0); +if (kid->index < cs_max) { +qdev_connect_gpio_out_named(DEVICE(s), "cs", kid->index, cs_line); +} else { +warn_report("Too many devices for SSI bus %s", +object_class_get_name(object_get_class(OBJECT(s; +return; +} +} +} + static void aspeed_smc_reset(DeviceState *d) { AspeedSMCState *s = ASPEED_SMC(d); @@ -692,6 +714,8 @@ static void aspeed_smc_reset(DeviceState *d) memset(s->regs, 0, sizeof s->regs); } +aspeed_smc_wire_cs_lines(s, asc->cs_num_max); + /* Unselect all peripherals */ for (i = 0; i < asc->cs_num_max; ++i) { s->regs[s->r_ctrl0 + i] |= CTRL_CE_STOP_ACTIVE; -- 2.39.1
[PATCH 5/8] aspeed/smc: Replace SysBus IRQs with GPIO lines
It's cleaner and removes the curious '+ 1' required to skip the DMA IRQ line of the controller. Signed-off-by: Cédric Le Goater --- hw/arm/aspeed.c | 2 +- hw/ssi/aspeed_smc.c | 5 + 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index 27dda58338..7c28546d7f 100644 --- a/hw/arm/aspeed.c +++ b/hw/arm/aspeed.c @@ -293,7 +293,7 @@ void aspeed_board_init_flashes(AspeedSMCState *s, const char *flashtype, qdev_realize_and_unref(dev, BUS(s->spi), _fatal); cs_line = qdev_get_gpio_in_named(dev, SSI_GPIO_CS, 0); -sysbus_connect_irq(SYS_BUS_DEVICE(s), i + 1, cs_line); +qdev_connect_gpio_out_named(DEVICE(s), "cs", i, cs_line); } } diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c index 22df4be528..7281169322 100644 --- a/hw/ssi/aspeed_smc.c +++ b/hw/ssi/aspeed_smc.c @@ -1134,10 +1134,7 @@ static void aspeed_smc_realize(DeviceState *dev, Error **errp) /* Setup cs_lines for peripherals */ s->cs_lines = g_new0(qemu_irq, asc->cs_num_max); - -for (i = 0; i < asc->cs_num_max; ++i) { -sysbus_init_irq(sbd, >cs_lines[i]); -} +qdev_init_gpio_out_named(DEVICE(s), s->cs_lines, "cs", asc->cs_num_max); /* The memory region for the controller registers */ memory_region_init_io(>mmio, OBJECT(s), _smc_ops, s, -- 2.39.1
[PATCH 4/8] tests/avocado/machine_aspeed.py: Add I2C slave tests
Test extracted from : https://lists.nongnu.org/archive/html/qemu-devel/2022-06/msg00183.html Signed-off-by: Cédric Le Goater --- tests/avocado/machine_aspeed.py | 10 ++ 1 file changed, 10 insertions(+) diff --git a/tests/avocado/machine_aspeed.py b/tests/avocado/machine_aspeed.py index ddf05b3617..d2c57ccb7e 100644 --- a/tests/avocado/machine_aspeed.py +++ b/tests/avocado/machine_aspeed.py @@ -199,6 +199,8 @@ def test_arm_ast2600_evb_buildroot(self): 'tmp105,bus=aspeed.i2c.bus.3,address=0x4d,id=tmp-test'); self.vm.add_args('-device', 'ds1338,bus=aspeed.i2c.bus.3,address=0x32'); +self.vm.add_args('-device', + 'i2c-echo,bus=aspeed.i2c.bus.3,address=0x42'); self.do_test_arm_aspeed_buildroot_start(image_path, '0xf00') exec_command_and_wait_for_pattern(self, @@ -217,6 +219,14 @@ def test_arm_ast2600_evb_buildroot(self): year = time.strftime("%Y") exec_command_and_wait_for_pattern(self, 'hwclock -f /dev/rtc1', year); +exec_command_and_wait_for_pattern(self, + 'echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device', + 'i2c i2c-3: new_device: Instantiated device slave-24c02 at 0x64'); +exec_command(self, 'i2cset -y 3 0x42 0x64 0x00 0xaa i'); +time.sleep(0.1) +exec_command_and_wait_for_pattern(self, + 'hexdump /sys/bus/i2c/devices/3-1064/slave-eeprom', + '000 ffaa '); self.do_test_arm_aspeed_buildroot_poweroff() -- 2.39.1
[PATCH 0/8] aspeed: I2C fixes, -drive removal (first step)
Hello, This series starts with a first set of patches fixing I2C slave mode in the Aspeed I2C controller, a test device and its associated test in avocado. Follow some cleanups which allow the use of block devices instead of drives. So that, instead of specifying : -drive file=./flash-ast2600-evb,format=raw,if=mtd -drive file=./ast2600-evb.pnor,format=raw,if=mtd ... and guessing from the order which bus the device is attached to, we can use : -blockdev node-name=fmc0,driver=file,filename=./bmc.img -device mx66u51235f,bus=ssi.0,drive=fmc0 -blockdev node-name=fmc1,driver=file,filename=./bmc-alt.img -device mx66u51235f,bus=ssi.0,drive=fmc1 -blockdev node-name=pnor,driver=file,filename=./pnor -device mx66l1g45g,bus=ssi.1,drive=pnor ... It is not perfect, the CS index still depends on the order, but it is now possible to run a machine without -drive ...,if=mtd. This lacks the final patch enabling the '-nodefaults' option by not creating the default devices if specified on the command line. It needs some more evaluation of the possible undesired effects. Thanks, C. Cédric Le Goater (6): m25p80: Improve error when the backend file size does not match the device tests/avocado/machine_aspeed.py: Add I2C slave tests aspeed/smc: Replace SysBus IRQs with GPIO lines aspeed/smc: Wire CS lines at reset aspeed: Introduce a spi_boot region under the SoC aspeed: Add a boot_rom overlap region in the SoC spi_boot container Klaus Jensen (2): hw/i2c: only schedule pending master when bus is idle hw/misc: add a toy i2c echo device include/hw/arm/aspeed_soc.h | 3 + include/hw/i2c/i2c.h| 2 + hw/arm/aspeed.c | 60 ++-- hw/arm/aspeed_ast2600.c | 13 +++ hw/arm/aspeed_soc.c | 14 +++ hw/arm/fby35.c | 8 +- hw/block/m25p80.c | 4 +- hw/i2c/aspeed_i2c.c | 2 + hw/i2c/core.c | 37 +--- hw/misc/i2c-echo.c | 156 hw/ssi/aspeed_smc.c | 29 +- hw/misc/meson.build | 2 + tests/avocado/machine_aspeed.py | 10 ++ 13 files changed, 279 insertions(+), 61 deletions(-) create mode 100644 hw/misc/i2c-echo.c -- 2.39.1
Re: [PATCH v2 6/9] hw/ppc: Replace dev->parent_bus by qdev_get_parent_bus(dev)
On 2/13/23 08:04, Philippe Mathieu-Daudé wrote: DeviceState::parent_bus is an internal field and should be accessed by the qdev_get_parent_bus() helper. Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Cédric Le Goater Thanks, C. --- hw/ppc/spapr_vio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c index 9d4fec2c04..dfc5c436bd 100644 --- a/hw/ppc/spapr_vio.c +++ b/hw/ppc/spapr_vio.c @@ -382,7 +382,7 @@ static void rtas_quiesce(PowerPCCPU *cpu, SpaprMachineState *spapr, static SpaprVioDevice *reg_conflict(SpaprVioDevice *dev) { -SpaprVioBus *bus = SPAPR_VIO_BUS(dev->qdev.parent_bus); +SpaprVioBus *bus = SPAPR_VIO_BUS(qdev_get_parent_bus(DEVICE(dev))); BusChild *kid; SpaprVioDevice *other; @@ -492,7 +492,7 @@ static void spapr_vio_busdev_realize(DeviceState *qdev, Error **errp) } } else { /* Need to assign an address */ -SpaprVioBus *bus = SPAPR_VIO_BUS(dev->qdev.parent_bus); +SpaprVioBus *bus = SPAPR_VIO_BUS(qdev_get_parent_bus(DEVICE(dev))); do { dev->reg = bus->next_reg++;
Re: [PATCH v9 14/14] docs/devel: Align VFIO migration docs to v2 protocol
On 2/6/23 13:31, Avihai Horon wrote: Now that VFIO migration protocol v2 has been implemented and v1 protocol has been removed, update the documentation according to v2 protocol. Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater You will need a rebase for next version. The migration PR was merged since. Thanks, C. --- docs/devel/vfio-migration.rst | 68 --- 1 file changed, 30 insertions(+), 38 deletions(-) diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst index 9ff6163c88..1d50c2fe5f 100644 --- a/docs/devel/vfio-migration.rst +++ b/docs/devel/vfio-migration.rst @@ -7,46 +7,39 @@ the guest is running on source host and restoring this saved state on the destination host. This document details how saving and restoring of VFIO devices is done in QEMU. -Migration of VFIO devices consists of two phases: the optional pre-copy phase, -and the stop-and-copy phase. The pre-copy phase is iterative and allows to -accommodate VFIO devices that have a large amount of data that needs to be -transferred. The iterative pre-copy phase of migration allows for the guest to -continue whilst the VFIO device state is transferred to the destination, this -helps to reduce the total downtime of the VM. VFIO devices can choose to skip -the pre-copy phase of migration by returning pending_bytes as zero during the -pre-copy phase. +Migration of VFIO devices currently consists of a single stop-and-copy phase. +During the stop-and-copy phase the guest is stopped and the entire VFIO device +data is transferred to the destination. + +The pre-copy phase of migration is currently not supported for VFIO devices. +Support for VFIO pre-copy will be added later on. A detailed description of the UAPI for VFIO device migration can be found in -the comment for the ``vfio_device_migration_info`` structure in the header -file linux-headers/linux/vfio.h. +the comment for the ``vfio_device_mig_state`` structure in the header file +linux-headers/linux/vfio.h. VFIO implements the device hooks for the iterative approach as follows: -* A ``save_setup`` function that sets up the migration region and sets _SAVING - flag in the VFIO device state. +* A ``save_setup`` function that sets up migration on the source. -* A ``load_setup`` function that sets up the migration region on the - destination and sets _RESUMING flag in the VFIO device state. +* A ``load_setup`` function that sets the VFIO device on the destination in + _RESUMING state. * A ``save_live_pending`` function that reads pending_bytes from the vendor driver, which indicates the amount of data that the vendor driver has yet to save for the VFIO device. -* A ``save_live_iterate`` function that reads the VFIO device's data from the - vendor driver through the migration region during iterative phase. - * A ``save_state`` function to save the device config space if it is present. -* A ``save_live_complete_precopy`` function that resets _RUNNING flag from the - VFIO device state and iteratively copies the remaining data for the VFIO - device until the vendor driver indicates that no data remains (pending bytes - is zero). +* A ``save_live_complete_precopy`` function that sets the VFIO device in + _STOP_COPY state and iteratively copies the data for the VFIO device until + the vendor driver indicates that no data remains. * A ``load_state`` function that loads the config section and the data - sections that are generated by the save functions above + sections that are generated by the save functions above. * ``cleanup`` functions for both save and load that perform any migration - related cleanup, including unmapping the migration region + related cleanup. The VFIO migration code uses a VM state change handler to change the VFIO @@ -71,13 +64,13 @@ tracking can identify dirtied pages, but any page pinned by the vendor driver can also be written by the device. There is currently no device or IOMMU support for dirty page tracking in hardware. -By default, dirty pages are tracked when the device is in pre-copy as well as -stop-and-copy phase. So, a page pinned by the vendor driver will be copied to -the destination in both phases. Copying dirty pages in pre-copy phase helps -QEMU to predict if it can achieve its downtime tolerances. If QEMU during -pre-copy phase keeps finding dirty pages continuously, then it understands -that even in stop-and-copy phase, it is likely to find dirty pages and can -predict the downtime accordingly. +By default, dirty pages are tracked during pre-copy as well as stop-and-copy +phase. So, a page pinned by the vendor driver will be copied to the destination +in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if +it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps +finding dirty pages continuously, then it understands that even in stop-and-copy +phase
Re: [PATCH v9 07/14] vfio/migration: Block multiple devices migration
On 2/8/23 14:08, Avihai Horon wrote: On 08/02/2023 0:34, Alex Williamson wrote: External email: Use caution opening links or attachments On Mon, 6 Feb 2023 14:31:30 +0200 Avihai Horon wrote: Currently VFIO migration doesn't implement some kind of intermediate quiescent state in which P2P DMAs are quiesced before stopping or running the device. This can cause problems in multi-device migration where the devices are doing P2P DMAs, since the devices are not stopped together at the same time. Until such support is added, block migration of multiple devices. Signed-off-by: Avihai Horon --- include/hw/vfio/vfio-common.h | 2 ++ hw/vfio/common.c | 51 +++ hw/vfio/migration.c | 6 + 3 files changed, 59 insertions(+) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index e573f5a9f1..56b1683824 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -218,6 +218,8 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList; extern VFIOGroupList vfio_group_list; bool vfio_mig_active(void); +int vfio_block_multiple_devices_migration(Error **errp); +void vfio_unblock_multiple_devices_migration(void); int64_t vfio_mig_bytes_transferred(void); #ifdef CONFIG_LINUX diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 3a35f4afad..01db41b735 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -41,6 +41,7 @@ #include "qapi/error.h" #include "migration/migration.h" #include "migration/misc.h" +#include "migration/blocker.h" #include "sysemu/tpm.h" VFIOGroupList vfio_group_list = @@ -337,6 +338,56 @@ bool vfio_mig_active(void) return true; } +Error *multiple_devices_migration_blocker; static ? So we have two migration blockers, one per device and one global. I guess it is easier that way than trying to update the per device Error*. C. + +static unsigned int vfio_migratable_device_num(void) +{ + VFIOGroup *group; + VFIODevice *vbasedev; + unsigned int device_num = 0; + + QLIST_FOREACH(group, _group_list, next) { + QLIST_FOREACH(vbasedev, >device_list, next) { + if (vbasedev->migration) { + device_num++; + } + } + } + + return device_num; +} + +int vfio_block_multiple_devices_migration(Error **errp) +{ + int ret; + + if (vfio_migratable_device_num() != 2) { + return 0; + } + + error_setg(_devices_migration_blocker, + "Migration is currently not supported with multiple " + "VFIO devices"); + ret = migrate_add_blocker(multiple_devices_migration_blocker, errp); + if (ret < 0) { + error_free(multiple_devices_migration_blocker); + multiple_devices_migration_blocker = NULL; + } + + return ret; +} + +void vfio_unblock_multiple_devices_migration(void) +{ + if (vfio_migratable_device_num() != 2) { + return; + } + + migrate_del_blocker(multiple_devices_migration_blocker); + error_free(multiple_devices_migration_blocker); + multiple_devices_migration_blocker = NULL; +} A couple awkward things here. First I wish we could do something cleaner or more intuitive than the != 2 test. I get that we're trying to do this on the addition of the 2nd device supporting migration, or the removal of the next to last device independent of all other devices, but I wonder if it wouldn't be better to remove the multiple-device blocker after migration is torn down for the device so we can test device >1 or ==1 in combination with whether multiple_devices_migration_blocker is NULL. Which comes to the second awkwardness, if we fail to add the blocker we free and clear the blocker, but when we tear down the device due to that failure we'll remove the blocker that doesn't exist, free NULL, and clear it again. Thanks to the glib slist the migration blocker is using, I think that all works, but I'd rather not be dependent on that implementation to avoid a segfault here. Incorporating a test of multiple_devices_migration_blocker as above would avoid this too. You mean something like this? diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 3a35f4afad..f3e08eff58 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c [...] +int vfio_block_multiple_devices_migration(Error **errp) +{ + int ret; + + if (vfio_migratable_device_num() <= 1 || + multiple_devices_migration_blocker) { + return 0; + } + + error_setg(_devices_migration_blocker, + "Migration is currently not supported with multiple " + "VFIO devices"); + ret = migrate_add_blocker(multiple_devices_migration_blocker, errp); + if (ret < 0) { + error_free(multiple_devices_migration_blocker); + multiple_devices_migration_blocker = NULL; + } + + return ret; +} + +void vfio_unblock_multiple_devices_migration(void) +{ + if (vfio_migratable_device_num() > 1 || +
Re: [PULL 03/38] pflash: Only read non-zero parts of backend image
On 2/7/23 13:48, Kevin Wolf wrote: Am 07.02.2023 um 10:19 hat Cédric Le Goater geschrieben: On 2/7/23 09:38, Kevin Wolf wrote: Am 06.02.2023 um 16:54 hat Cédric Le Goater geschrieben: On 1/20/23 13:25, Kevin Wolf wrote: From: Xiang Zheng Currently we fill the VIRT_FLASH memory space with two 64MB NOR images when using persistent UEFI variables on virt board. Actually we only use a very small(non-zero) part of the memory while the rest significant large(zero) part of memory is wasted. So this patch checks the block status and only writes the non-zero part into memory. This requires pflash devices to use sparse files for backends. Signed-off-by: Xiang Zheng [ kraxel: rebased to latest master ] Signed-off-by: Gerd Hoffmann Message-Id: <20221220084246.1984871-1-kra...@redhat.com> Reviewed-by: Daniel P. Berrangé Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf This newly merged patch introduces a "regression" when booting an Aspeed machine. The following extra m25p80 patch (not yet merged) is required for the issue to show: https://lore.kernel.org/qemu-devel/20221115151000.2080833-1-...@kaod.org/ U-Boot fails to find the filesystem in that case. It can be easily reproduced with the witherspoon-bmc machine and seems to be related to the use of a UBI filesystem. Other Aspeed machines not using UBI are not impacted. Here is a tentative fix. I don't know enough the block layer to explain what is happening :/ I was puzzled for a moment, but... @@ -39,7 +39,7 @@ static int blk_pread_nonzeroes(BlockBack return ret; } if (!(ret & BDRV_BLOCK_ZERO)) { -ret = bdrv_pread(bs->file, offset, bytes, 'bs->file' rather than 'bs' really looks wrong. I think replacing that would already fix the bug you're seeing. Just to be sure, how did you configure the block backend? bs->file would happen to work more or less with raw over file-posix (which is probably what Gerd tested), but I think it breaks with anything else. The command is : $ qemu-system-arm -M witherspoon-bmc -net user \ -drive file=/path/to/file.mtd,format=raw,if=mtd \ -nographic -serial mon:stdio -snapshot If I remove '-snapshot', all works fine. Ok, that makes sense then. -snapshot creates a temporary qcow2 overlay, and then what your guest sees with bs->file is not the virtual disk content of the qcow2 image, but the qcow2 file itself. yes. Same symptom with pflash devices, TCG and KVM. The guest hangs with -snapshot. C. qemu-system-aarch64 -M virt -smp 2 -cpu max -accel tcg,thread=multi -nographic -m 2G -drive if=pflash,format=raw,file=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,readonly=on -drive if=pflash,format=raw,file=rhel9-varstore.img -device virtio-net,netdev=net0,bus=pcie.0,addr=0x3 -netdev user,id=net0 -drive file=rhel9-arm64.qcow2,if=none,id=disk,format=qcow2,cache=none -device virtio-blk-device,drive=disk -serial mon:stdio -snapshot
Re: [PULL 03/38] pflash: Only read non-zero parts of backend image
On 2/7/23 09:38, Kevin Wolf wrote: Am 06.02.2023 um 16:54 hat Cédric Le Goater geschrieben: On 1/20/23 13:25, Kevin Wolf wrote: From: Xiang Zheng Currently we fill the VIRT_FLASH memory space with two 64MB NOR images when using persistent UEFI variables on virt board. Actually we only use a very small(non-zero) part of the memory while the rest significant large(zero) part of memory is wasted. So this patch checks the block status and only writes the non-zero part into memory. This requires pflash devices to use sparse files for backends. Signed-off-by: Xiang Zheng [ kraxel: rebased to latest master ] Signed-off-by: Gerd Hoffmann Message-Id: <20221220084246.1984871-1-kra...@redhat.com> Reviewed-by: Daniel P. Berrangé Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf This newly merged patch introduces a "regression" when booting an Aspeed machine. The following extra m25p80 patch (not yet merged) is required for the issue to show: https://lore.kernel.org/qemu-devel/20221115151000.2080833-1-...@kaod.org/ U-Boot fails to find the filesystem in that case. It can be easily reproduced with the witherspoon-bmc machine and seems to be related to the use of a UBI filesystem. Other Aspeed machines not using UBI are not impacted. Here is a tentative fix. I don't know enough the block layer to explain what is happening :/ I was puzzled for a moment, but... @@ -39,7 +39,7 @@ static int blk_pread_nonzeroes(BlockBack return ret; } if (!(ret & BDRV_BLOCK_ZERO)) { -ret = bdrv_pread(bs->file, offset, bytes, 'bs->file' rather than 'bs' really looks wrong. I think replacing that would already fix the bug you're seeing. Just to be sure, how did you configure the block backend? bs->file would happen to work more or less with raw over file-posix (which is probably what Gerd tested), but I think it breaks with anything else. The command is : $ qemu-system-arm -M witherspoon-bmc -net user \ -drive file=/path/to/file.mtd,format=raw,if=mtd \ -nographic -serial mon:stdio -snapshot If I remove '-snapshot', all works fine. Here is a file : https://jenkins.openbmc.org/job/ci-openbmc/distro=ubuntu,label=docker-builder,target=witherspoon/lastSuccessfulBuild/artifact/openbmc/build/tmp/deploy/images/witherspoon/obmc-phosphor-image-witherspoon-20230207051412.ubi.mtd Thanks, C. +ret = blk_pread(blk, offset, bytes, (uint8_t *) buf + offset, 0); blk_*() makes even more sense conceptually, but it should behave the same. Kevin
Re: [PULL 03/38] pflash: Only read non-zero parts of backend image
Hello, On 1/20/23 13:25, Kevin Wolf wrote: From: Xiang Zheng Currently we fill the VIRT_FLASH memory space with two 64MB NOR images when using persistent UEFI variables on virt board. Actually we only use a very small(non-zero) part of the memory while the rest significant large(zero) part of memory is wasted. So this patch checks the block status and only writes the non-zero part into memory. This requires pflash devices to use sparse files for backends. Signed-off-by: Xiang Zheng [ kraxel: rebased to latest master ] Signed-off-by: Gerd Hoffmann Message-Id: <20221220084246.1984871-1-kra...@redhat.com> Reviewed-by: Daniel P. Berrangé Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf This newly merged patch introduces a "regression" when booting an Aspeed machine. The following extra m25p80 patch (not yet merged) is required for the issue to show: https://lore.kernel.org/qemu-devel/20221115151000.2080833-1-...@kaod.org/ U-Boot fails to find the filesystem in that case. It can be easily reproduced with the witherspoon-bmc machine and seems to be related to the use of a UBI filesystem. Other Aspeed machines not using UBI are not impacted. Here is a tentative fix. I don't know enough the block layer to explain what is happening :/ Thanks, C. @@ -39,7 +39,7 @@ static int blk_pread_nonzeroes(BlockBack return ret; } if (!(ret & BDRV_BLOCK_ZERO)) { -ret = bdrv_pread(bs->file, offset, bytes, +ret = blk_pread(blk, offset, bytes, (uint8_t *) buf + offset, 0); if (ret < 0) { return ret;
Re: [PATCH v8 01/13] linux-headers: Update to v6.2-rc1
On 1/16/23 15:11, Avihai Horon wrote: Update to commit 1b929c02afd3 ("Linux 6.2-rc1"). Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Thanks, C. --- include/standard-headers/drm/drm_fourcc.h | 63 +++- include/standard-headers/linux/ethtool.h | 81 - include/standard-headers/linux/fuse.h | 20 +- .../linux/input-event-codes.h | 4 + include/standard-headers/linux/pci_regs.h | 2 + include/standard-headers/linux/virtio_blk.h | 19 ++ include/standard-headers/linux/virtio_bt.h| 8 + include/standard-headers/linux/virtio_net.h | 4 + linux-headers/asm-arm64/kvm.h | 1 + linux-headers/asm-generic/hugetlb_encode.h| 26 +- linux-headers/asm-generic/mman-common.h | 2 + linux-headers/asm-mips/mman.h | 2 + linux-headers/asm-riscv/kvm.h | 7 + linux-headers/asm-x86/kvm.h | 11 +- linux-headers/linux/kvm.h | 32 +- linux-headers/linux/psci.h| 14 + linux-headers/linux/userfaultfd.h | 4 + linux-headers/linux/vfio.h| 278 +- 18 files changed, 522 insertions(+), 56 deletions(-) diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h index 48b620cbef..69cab17b38 100644 --- a/include/standard-headers/drm/drm_fourcc.h +++ b/include/standard-headers/drm/drm_fourcc.h @@ -98,18 +98,42 @@ extern "C" { #define DRM_FORMAT_INVALID0 /* color index */ +#define DRM_FORMAT_C1 fourcc_code('C', '1', ' ', ' ') /* [7:0] C0:C1:C2:C3:C4:C5:C6:C7 1:1:1:1:1:1:1:1 eight pixels/byte */ +#define DRM_FORMAT_C2 fourcc_code('C', '2', ' ', ' ') /* [7:0] C0:C1:C2:C3 2:2:2:2 four pixels/byte */ +#define DRM_FORMAT_C4 fourcc_code('C', '4', ' ', ' ') /* [7:0] C0:C1 4:4 two pixels/byte */ #define DRM_FORMAT_C8 fourcc_code('C', '8', ' ', ' ') /* [7:0] C */ -/* 8 bpp Red */ +/* 1 bpp Darkness (inverse relationship between channel value and brightness) */ +#define DRM_FORMAT_D1 fourcc_code('D', '1', ' ', ' ') /* [7:0] D0:D1:D2:D3:D4:D5:D6:D7 1:1:1:1:1:1:1:1 eight pixels/byte */ + +/* 2 bpp Darkness (inverse relationship between channel value and brightness) */ +#define DRM_FORMAT_D2 fourcc_code('D', '2', ' ', ' ') /* [7:0] D0:D1:D2:D3 2:2:2:2 four pixels/byte */ + +/* 4 bpp Darkness (inverse relationship between channel value and brightness) */ +#define DRM_FORMAT_D4 fourcc_code('D', '4', ' ', ' ') /* [7:0] D0:D1 4:4 two pixels/byte */ + +/* 8 bpp Darkness (inverse relationship between channel value and brightness) */ +#define DRM_FORMAT_D8 fourcc_code('D', '8', ' ', ' ') /* [7:0] D */ + +/* 1 bpp Red (direct relationship between channel value and brightness) */ +#define DRM_FORMAT_R1 fourcc_code('R', '1', ' ', ' ') /* [7:0] R0:R1:R2:R3:R4:R5:R6:R7 1:1:1:1:1:1:1:1 eight pixels/byte */ + +/* 2 bpp Red (direct relationship between channel value and brightness) */ +#define DRM_FORMAT_R2 fourcc_code('R', '2', ' ', ' ') /* [7:0] R0:R1:R2:R3 2:2:2:2 four pixels/byte */ + +/* 4 bpp Red (direct relationship between channel value and brightness) */ +#define DRM_FORMAT_R4 fourcc_code('R', '4', ' ', ' ') /* [7:0] R0:R1 4:4 two pixels/byte */ + +/* 8 bpp Red (direct relationship between channel value and brightness) */ #define DRM_FORMAT_R8 fourcc_code('R', '8', ' ', ' ') /* [7:0] R */ -/* 10 bpp Red */ +/* 10 bpp Red (direct relationship between channel value and brightness) */ #define DRM_FORMAT_R10fourcc_code('R', '1', '0', ' ') /* [15:0] x:R 6:10 little endian */ -/* 12 bpp Red */ +/* 12 bpp Red (direct relationship between channel value and brightness) */ #define DRM_FORMAT_R12fourcc_code('R', '1', '2', ' ') /* [15:0] x:R 4:12 little endian */ -/* 16 bpp Red */ +/* 16 bpp Red (direct relationship between channel value and brightness) */ #define DRM_FORMAT_R16fourcc_code('R', '1', '6', ' ') /* [15:0] R little endian */ /* 16 bpp RG */ @@ -204,7 +228,9 @@ extern "C" { #define DRM_FORMAT_VYUY fourcc_code('V', 'Y', 'U', 'Y') /* [31:0] Y1:Cb0:Y0:Cr0 8:8:8:8 little endian */ #define DRM_FORMAT_AYUV fourcc_code('A', 'Y', 'U', 'V') /* [31:0] A:Y:Cb:Cr 8:8:8:8 little endian */ +#define DRM_FORMAT_AVUYfourcc_code('A', 'V', 'U', 'Y') /* [31:0] A:Cr:Cb:Y 8:8:8:8 little endian */ #define DRM_FORMAT_XYUV fourcc_code('X', 'Y', 'U', 'V') /* [31:0] X:Y:Cb:Cr 8:8:8:8 little endian */ +#define DRM_FORMAT_XVUYfourcc_code('X', 'V', 'U', 'Y') /* [31:0] X:Cr:Cb:Y 8:8:8:8 little endian */ #define DRM_FORMAT_VUY888 fourcc_code('V', 'U', '2', '4') /* [23:0] Cr:Cb:Y 8:8:8 little endian */ #define DRM_FORMAT_VUY101010 fourcc_code('V', 'U', '3', '0') /* Y follow
Re: [PATCH v8 04/13] vfio/migration: Allow migration without VFIO IOMMU dirty tracking support
On 1/16/23 15:11, Avihai Horon wrote: Currently, if IOMMU of a VFIO container doesn't support dirty page tracking, migration is blocked. This is because a DMA-able VFIO device can dirty RAM pages without updating QEMU about it, thus breaking the migration. However, this doesn't mean that migration can't be done at all. In such case, allow migration and let QEMU VFIO code mark all pages dirty. This guarantees that all pages that might have gotten dirty are reported back, and thus guarantees a valid migration even without VFIO IOMMU dirty tracking support. The motivation for this patch is the introduction of iommufd [1]. iommufd can directly implement the /dev/vfio/vfio container IOCTLs by mapping them into its internal ops, allowing the usage of these IOCTLs over iommufd. However, VFIO IOMMU dirty tracking is not supported by this VFIO compatibility API. This patch will allow migration by hosts that use the VFIO compatibility API and prevent migration regressions caused by the lack of VFIO IOMMU dirty tracking support. [1] https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_...@nvidia.com/ Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Thanks, C. --- hw/vfio/common.c| 20 ++-- hw/vfio/migration.c | 3 +-- 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 130e5d1dc7..f6dd571549 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -488,6 +488,12 @@ static int vfio_dma_unmap(VFIOContainer *container, return -errno; } +if (iotlb && vfio_devices_all_running_and_saving(container)) { +cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size, +tcg_enabled() ? DIRTY_CLIENTS_ALL : +DIRTY_CLIENTS_NOCODE); +} + return 0; } @@ -1201,6 +1207,10 @@ static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start) .argsz = sizeof(dirty), }; +if (!container->dirty_pages_supported) { +return; +} + if (start) { dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START; } else { @@ -1236,6 +1246,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova, uint64_t pages; int ret; +if (!container->dirty_pages_supported) { +cpu_physical_memory_set_dirty_range(ram_addr, size, +tcg_enabled() ? DIRTY_CLIENTS_ALL : +DIRTY_CLIENTS_NOCODE); +return 0; +} + dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range)); dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range); @@ -1409,8 +1426,7 @@ static void vfio_listener_log_sync(MemoryListener *listener, { VFIOContainer *container = container_of(listener, VFIOContainer, listener); -if (vfio_listener_skipped_section(section) || -!container->dirty_pages_supported) { +if (vfio_listener_skipped_section(section)) { return; } diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 09fe7c1de2..552c2313b2 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -860,11 +860,10 @@ int64_t vfio_mig_bytes_transferred(void) int vfio_migration_probe(VFIODevice *vbasedev, Error **errp) { -VFIOContainer *container = vbasedev->group->container; struct vfio_region_info *info = NULL; int ret = -ENOTSUP; -if (!vbasedev->enable_migration || !container->dirty_pages_supported) { +if (!vbasedev->enable_migration) { goto add_blocker; }
Re: [PATCH v7 09/13] vfio/migration: Implement VFIO migration protocol v2
Hello Avihai, On 1/15/23 19:35, Avihai Horon wrote: Implement the basic mandatory part of VFIO migration protocol v2. This includes all functionality that is necessary to support VFIO_MIGRATION_STOP_COPY part of the v2 protocol. The two protocols, v1 and v2, will co-exist and in the following patches v1 protocol code will be removed. There are several main differences between v1 and v2 protocols: - VFIO device state is now represented as a finite state machine instead of a bitmap. - Migration interface with kernel is now done using VFIO_DEVICE_FEATURE ioctl and normal read() and write() instead of the migration region. - Pre-copy is made optional in v2 protocol. Support for pre-copy will be added later on. Detailed information about VFIO migration protocol v2 and its difference compared to v1 protocol can be found here [1]. [1] https://lore.kernel.org/all/20220224142024.147653-10-yish...@nvidia.com/ Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater --- include/hw/vfio/vfio-common.h | 5 + hw/vfio/common.c | 19 +- hw/vfio/migration.c | 455 +++--- hw/vfio/trace-events | 7 + 4 files changed, 447 insertions(+), 39 deletions(-) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index bbaf72ba00..2ec3346fea 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -66,6 +66,11 @@ typedef struct VFIOMigration { int vm_running; Notifier migration_state; uint64_t pending_bytes; +enum vfio_device_mig_state device_state; This is an unknow type for windows and it breaks the build. I would use a 'uint32_t' just like device_state_v1. We should be fine for compile after this fix. See : https://gitlab.com/legoater/qemu/-/pipelines/748114331 Next step would be to include some tests in QEMU for CI. Is there any device for which we could extend the model and the driver to exercise migration ? I think this would be very useful for tests and maintainance. It could be done in a nested scenario, and virtio devices could be a target. Or we could introduce or reuse a dummy one for the purpose ? This is not a request for 8.0, only a good-to-have feature we should think about for the future releases. Thanks, C. +int data_fd; +void *data_buffer; +size_t data_buffer_size; +bool v2; } VFIOMigration; typedef struct VFIOAddressSpace { diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 550b2d7ded..dcaa77d2a8 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -355,10 +355,18 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container) return false; } -if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) && +if (!migration->v2 && +(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) && (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) { return false; } + +if (migration->v2 && +(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) && +(migration->device_state == VFIO_DEVICE_STATE_RUNNING || + migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) { +return false; +} } } return true; @@ -385,7 +393,14 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container) return false; } -if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) { +if (!migration->v2 && +migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) { +continue; +} + +if (migration->v2 && +(migration->device_state == VFIO_DEVICE_STATE_RUNNING || + migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) { continue; } else { return false; diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 9df859f4d3..f19ada0f4f 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -10,6 +10,7 @@ #include "qemu/osdep.h" #include "qemu/main-loop.h" #include "qemu/cutils.h" +#include "qemu/units.h" #include #include @@ -44,8 +45,103 @@ #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xef13ULL) #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL) +/* + * This is an arbitrary size based on migration of mlx5 devices, where typically + * total device migration size is on the order of 100s of MB. Testing with + * larger values, e.g. 128MB and 1GB, did not show a performance improvement. + */ +#define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB) + static int64_t by
Re: [PATCH v6 13/13] docs/devel: Align VFIO migration docs to v2 protocol
On 1/12/23 09:50, Avihai Horon wrote: Now that VFIO migration protocol v2 has been implemented and v1 protocol has been removed, update the documentation according to v2 protocol. Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Thanks, C. --- docs/devel/vfio-migration.rst | 68 --- 1 file changed, 30 insertions(+), 38 deletions(-) diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst index 9ff6163c88..1d50c2fe5f 100644 --- a/docs/devel/vfio-migration.rst +++ b/docs/devel/vfio-migration.rst @@ -7,46 +7,39 @@ the guest is running on source host and restoring this saved state on the destination host. This document details how saving and restoring of VFIO devices is done in QEMU. -Migration of VFIO devices consists of two phases: the optional pre-copy phase, -and the stop-and-copy phase. The pre-copy phase is iterative and allows to -accommodate VFIO devices that have a large amount of data that needs to be -transferred. The iterative pre-copy phase of migration allows for the guest to -continue whilst the VFIO device state is transferred to the destination, this -helps to reduce the total downtime of the VM. VFIO devices can choose to skip -the pre-copy phase of migration by returning pending_bytes as zero during the -pre-copy phase. +Migration of VFIO devices currently consists of a single stop-and-copy phase. +During the stop-and-copy phase the guest is stopped and the entire VFIO device +data is transferred to the destination. + +The pre-copy phase of migration is currently not supported for VFIO devices. +Support for VFIO pre-copy will be added later on. A detailed description of the UAPI for VFIO device migration can be found in -the comment for the ``vfio_device_migration_info`` structure in the header -file linux-headers/linux/vfio.h. +the comment for the ``vfio_device_mig_state`` structure in the header file +linux-headers/linux/vfio.h. VFIO implements the device hooks for the iterative approach as follows: -* A ``save_setup`` function that sets up the migration region and sets _SAVING - flag in the VFIO device state. +* A ``save_setup`` function that sets up migration on the source. -* A ``load_setup`` function that sets up the migration region on the - destination and sets _RESUMING flag in the VFIO device state. +* A ``load_setup`` function that sets the VFIO device on the destination in + _RESUMING state. * A ``save_live_pending`` function that reads pending_bytes from the vendor driver, which indicates the amount of data that the vendor driver has yet to save for the VFIO device. -* A ``save_live_iterate`` function that reads the VFIO device's data from the - vendor driver through the migration region during iterative phase. - * A ``save_state`` function to save the device config space if it is present. -* A ``save_live_complete_precopy`` function that resets _RUNNING flag from the - VFIO device state and iteratively copies the remaining data for the VFIO - device until the vendor driver indicates that no data remains (pending bytes - is zero). +* A ``save_live_complete_precopy`` function that sets the VFIO device in + _STOP_COPY state and iteratively copies the data for the VFIO device until + the vendor driver indicates that no data remains. * A ``load_state`` function that loads the config section and the data - sections that are generated by the save functions above + sections that are generated by the save functions above. * ``cleanup`` functions for both save and load that perform any migration - related cleanup, including unmapping the migration region + related cleanup. The VFIO migration code uses a VM state change handler to change the VFIO @@ -71,13 +64,13 @@ tracking can identify dirtied pages, but any page pinned by the vendor driver can also be written by the device. There is currently no device or IOMMU support for dirty page tracking in hardware. -By default, dirty pages are tracked when the device is in pre-copy as well as -stop-and-copy phase. So, a page pinned by the vendor driver will be copied to -the destination in both phases. Copying dirty pages in pre-copy phase helps -QEMU to predict if it can achieve its downtime tolerances. If QEMU during -pre-copy phase keeps finding dirty pages continuously, then it understands -that even in stop-and-copy phase, it is likely to find dirty pages and can -predict the downtime accordingly. +By default, dirty pages are tracked during pre-copy as well as stop-and-copy +phase. So, a page pinned by the vendor driver will be copied to the destination +in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if +it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps +finding dirty pages continuously, then it understands that even in stop-and-copy +phase, it is likely to find dirty pages and can predict the downtime +accordingly
Re: [PATCH v6 09/13] vfio/migration: Implement VFIO migration protocol v2
On 1/12/23 09:50, Avihai Horon wrote: Implement the basic mandatory part of VFIO migration protocol v2. This includes all functionality that is necessary to support VFIO_MIGRATION_STOP_COPY part of the v2 protocol. The two protocols, v1 and v2, will co-exist and in the following patches v1 protocol code will be removed. There are several main differences between v1 and v2 protocols: - VFIO device state is now represented as a finite state machine instead of a bitmap. - Migration interface with kernel is now done using VFIO_DEVICE_FEATURE ioctl and normal read() and write() instead of the migration region. - Pre-copy is made optional in v2 protocol. Support for pre-copy will be added later on. Detailed information about VFIO migration protocol v2 and its difference compared to v1 protocol can be found here [1]. [1] https://lore.kernel.org/all/20220224142024.147653-10-yish...@nvidia.com/ Signed-off-by: Avihai Horon LGTM, Reviewed-by: Cédric Le Goater Still a small issue below, --- include/hw/vfio/vfio-common.h | 5 + hw/vfio/common.c | 19 +- hw/vfio/migration.c | 455 +++--- hw/vfio/trace-events | 7 + 4 files changed, 447 insertions(+), 39 deletions(-) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index bbaf72ba00..2ec3346fea 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -66,6 +66,11 @@ typedef struct VFIOMigration { int vm_running; Notifier migration_state; uint64_t pending_bytes; +enum vfio_device_mig_state device_state; +int data_fd; +void *data_buffer; +size_t data_buffer_size; +bool v2; } VFIOMigration; typedef struct VFIOAddressSpace { diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 550b2d7ded..dcaa77d2a8 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -355,10 +355,18 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container) return false; } -if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) && +if (!migration->v2 && +(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) && (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) { return false; } + +if (migration->v2 && +(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) && +(migration->device_state == VFIO_DEVICE_STATE_RUNNING || + migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) { +return false; +} } } return true; @@ -385,7 +393,14 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container) return false; } -if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) { +if (!migration->v2 && +migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) { +continue; +} + +if (migration->v2 && +(migration->device_state == VFIO_DEVICE_STATE_RUNNING || + migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) { continue; } else { return false; diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 9df859f4d3..08f53189fa 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -10,6 +10,7 @@ #include "qemu/osdep.h" #include "qemu/main-loop.h" #include "qemu/cutils.h" +#include "qemu/units.h" #include #include @@ -44,8 +45,103 @@ #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xef13ULL) #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL) +/* + * This is an arbitrary size based on migration of mlx5 devices, where typically + * total device migration size is on the order of 100s of MB. Testing with + * larger values, e.g. 128MB and 1GB, did not show a performance improvement. + */ +#define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB) + static int64_t bytes_transferred; +static const char *mig_state_to_str(enum vfio_device_mig_state state) +{ +switch (state) { +case VFIO_DEVICE_STATE_ERROR: +return "ERROR"; +case VFIO_DEVICE_STATE_STOP: +return "STOP"; +case VFIO_DEVICE_STATE_RUNNING: +return "RUNNING"; +case VFIO_DEVICE_STATE_STOP_COPY: +return "STOP_COPY"; +case VFIO_DEVICE_STATE_RESUMING: +return "RESUMING"; +case VFIO_DEVICE_STATE_RUNNING_P2P: +return "RUNNING_P2P"; +default: +return "UNKNOWN STATE"; +} +} + +static int vfio_migration_set_state(VFIODevice
Re: [PATCH v6 10/13] vfio/migration: Optimize vfio_save_pending()
On 1/12/23 09:50, Avihai Horon wrote: During pre-copy phase of migration vfio_save_pending() is called repeatedly and queries the VFIO device for its pending data size. As long as pending RAM size is over the threshold, migration can't converge and be completed. Therefore, during this time there is no point in querying the VFIO device pending data size. Avoid these unnecessary queries by issuing them in a RAM pre-copy notifier instead of vfio_save_pending(). This way the VFIO device is queried only when RAM pending data is below the threshold, when there is an actual chance for migration to converge. Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Thanks, C. --- include/hw/vfio/vfio-common.h | 2 ++ hw/vfio/migration.c | 56 +++ hw/vfio/trace-events | 1 + 3 files changed, 46 insertions(+), 13 deletions(-) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 2ec3346fea..113f8d9208 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -65,11 +65,13 @@ typedef struct VFIOMigration { uint32_t device_state_v1; int vm_running; Notifier migration_state; +NotifierWithReturn migration_data; uint64_t pending_bytes; enum vfio_device_mig_state device_state; int data_fd; void *data_buffer; size_t data_buffer_size; +uint64_t stop_copy_size; bool v2; } VFIOMigration; diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 08f53189fa..04f4397212 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -655,29 +655,19 @@ static void vfio_v1_save_cleanup(void *opaque) trace_vfio_save_cleanup(vbasedev->name); } -/* - * Migration size of VFIO devices can be as little as a few KBs or as big as - * many GBs. This value should be big enough to cover the worst case. - */ -#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB) static void vfio_save_pending(void *opaque, uint64_t threshold_size, uint64_t *res_precopy_only, uint64_t *res_compatible, uint64_t *res_postcopy_only) { VFIODevice *vbasedev = opaque; -uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE; +VFIOMigration *migration = vbasedev->migration; -/* - * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is - * reported so downtime limit won't be violated. - */ -vfio_query_stop_copy_size(vbasedev, _copy_size); -*res_precopy_only += stop_copy_size; +*res_precopy_only += migration->stop_copy_size; trace_vfio_save_pending(vbasedev->name, *res_precopy_only, *res_postcopy_only, *res_compatible, -stop_copy_size); +migration->stop_copy_size); } static void vfio_v1_save_pending(void *opaque, uint64_t threshold_size, @@ -1104,6 +1094,40 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data) } } +/* + * Migration size of VFIO devices can be as little as a few KBs or as big as + * many GBs. This value should be big enough to cover the worst case. + */ +#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB) +static int vfio_migration_data_notifier(NotifierWithReturn *n, void *data) +{ +VFIOMigration *migration = container_of(n, VFIOMigration, migration_data); +VFIODevice *vbasedev = migration->vbasedev; +PrecopyNotifyData *pnd = data; + +if (pnd->reason != PRECOPY_NOTIFY_AFTER_BITMAP_SYNC) { +return 0; +} + +/* No need to get pending size when finishing migration */ +if (runstate_check(RUN_STATE_FINISH_MIGRATE)) { +return 0; +} + +if (vfio_query_stop_copy_size(vbasedev, >stop_copy_size)) { +/* + * Failed to get pending migration size. Report big pending size so + * downtime limit won't be violated. + */ +migration->stop_copy_size = VFIO_MIG_STOP_COPY_SIZE; +} + +trace_vfio_migration_data_notifier(vbasedev->name, + migration->stop_copy_size); + +return 0; +} + static void vfio_migration_exit(VFIODevice *vbasedev) { VFIOMigration *migration = vbasedev->migration; @@ -1225,6 +1249,9 @@ static int vfio_migration_init(VFIODevice *vbasedev) migration->vm_state = qdev_add_vm_change_state_handler( vbasedev->dev, vfio_vmstate_change, vbasedev); + +migration->migration_data.notify = vfio_migration_data_notifier; +precopy_add_notifier(>migration_data); } else { register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, _vfio_v1_handlers, vbasedev); @@ -1283,6 +1310,9 @@ void vfio_migration_finalize(VFIODevice *vbasedev) if (vbasedev->migration) { VFIOMigration *migration =
Re: [PATCH v6 12/13] vfio: Alphabetize migration section of VFIO trace-events file
On 1/12/23 09:50, Avihai Horon wrote: Sort the migration section of VFIO trace events file alphabetically and move two misplaced traces to common.c section. Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Thanks, C. --- hw/vfio/trace-events | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index 60c49b2ecf..db9cb94952 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -119,6 +119,8 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]" vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8" vfio_dma_unmap_overflow_workaround(void) "" +vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64 +vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64 # platform.c vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d" @@ -148,20 +150,18 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) "%ux%u" vfio_display_edid_write_error(void) "" # migration.c +vfio_load_cleanup(const char *name) " (%s)" +vfio_load_device_config_state(const char *name) " (%s)" +vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64 +vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d" +vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) stopcopy size 0x%"PRIx64 vfio_migration_probe(const char *name) " (%s)" vfio_migration_set_state(const char *name, const char *state) " (%s) state %s" -vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s" vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s" -vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64 +vfio_save_block(const char *name, int data_size) " (%s) data_size %d" vfio_save_cleanup(const char *name) " (%s)" +vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d" vfio_save_device_config_state(const char *name) " (%s)" vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64" stopcopy size 0x%"PRIx64 -vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d" -vfio_load_device_config_state(const char *name) " (%s)" -vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64 -vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d" -vfio_load_cleanup(const char *name) " (%s)" -vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64 -vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64 -vfio_save_block(const char *name, int data_size) " (%s) data_size %d" -vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) stopcopy size 0x%"PRIx64 +vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64 +vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
Re: [PATCH v6 06/13] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one
On 1/12/23 09:50, Avihai Horon wrote: vfio_devices_all_running_and_saving() is used to check if migration is in pre-copy phase. This is done by checking if migration is in setup or active states and if all VFIO devices are in pre-copy state, i.e. _SAVING | _RUNNING. In VFIO migration protocol v2 pre-copy support is made optional. Hence, a matching v2 protocol pre-copy state can't be used here. As preparation for adding v2 protocol, change vfio_devices_all_running_and_saving() logic such that it doesn't use the VFIO pre-copy state. The new equivalent logic checks if migration is in active state and if all VFIO devices are in running state [1]. No functional changes intended. [1] Note that checking if migration is in setup or active states and if all VFIO devices are in running state doesn't guarantee that we are in pre-copy phase, thus we check if migration is only in active state. Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Thanks, C. --- hw/vfio/common.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index f6dd571549..3a35f4afad 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -40,6 +40,7 @@ #include "trace.h" #include "qapi/error.h" #include "migration/migration.h" +#include "migration/misc.h" #include "sysemu/tpm.h" VFIOGroupList vfio_group_list = @@ -363,13 +364,16 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container) return true; } -static bool vfio_devices_all_running_and_saving(VFIOContainer *container) +/* + * Check if all VFIO devices are running and migration is active, which is + * essentially equivalent to the migration being in pre-copy phase. + */ +static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container) { VFIOGroup *group; VFIODevice *vbasedev; -MigrationState *ms = migrate_get_current(); -if (!migration_is_setup_or_active(ms->state)) { +if (!migration_is_active(migrate_get_current())) { return false; } @@ -381,8 +385,7 @@ static bool vfio_devices_all_running_and_saving(VFIOContainer *container) return false; } -if ((migration->device_state & VFIO_DEVICE_STATE_V1_SAVING) && -(migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) { +if (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING) { continue; } else { return false; @@ -461,7 +464,7 @@ static int vfio_dma_unmap(VFIOContainer *container, }; if (iotlb && container->dirty_pages_supported && -vfio_devices_all_running_and_saving(container)) { +vfio_devices_all_running_and_mig_active(container)) { return vfio_dma_unmap_bitmap(container, iova, size, iotlb); } @@ -488,7 +491,7 @@ static int vfio_dma_unmap(VFIOContainer *container, return -errno; } -if (iotlb && vfio_devices_all_running_and_saving(container)) { +if (iotlb && vfio_devices_all_running_and_mig_active(container)) { cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size, tcg_enabled() ? DIRTY_CLIENTS_ALL : DIRTY_CLIENTS_NOCODE);
Re: [PATCH v6 11/13] vfio/migration: Remove VFIO migration protocol v1
On 1/12/23 09:50, Avihai Horon wrote: Now that v2 protocol implementation has been added, remove the deprecated v1 implementation. Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Thanks, C. --- include/hw/vfio/vfio-common.h | 5 - hw/vfio/common.c | 19 +- hw/vfio/migration.c | 703 +- hw/vfio/trace-events | 9 - 4 files changed, 24 insertions(+), 712 deletions(-) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 113f8d9208..2aba45887c 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -61,18 +61,13 @@ typedef struct VFIORegion { typedef struct VFIOMigration { struct VFIODevice *vbasedev; VMChangeStateEntry *vm_state; -VFIORegion region; -uint32_t device_state_v1; -int vm_running; Notifier migration_state; NotifierWithReturn migration_data; -uint64_t pending_bytes; enum vfio_device_mig_state device_state; int data_fd; void *data_buffer; size_t data_buffer_size; uint64_t stop_copy_size; -bool v2; } VFIOMigration; typedef struct VFIOAddressSpace { diff --git a/hw/vfio/common.c b/hw/vfio/common.c index dcaa77d2a8..9a0dbee6b4 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -355,14 +355,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container) return false; } -if (!migration->v2 && -(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) && -(migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) { -return false; -} - -if (migration->v2 && -(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) && +if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) && (migration->device_state == VFIO_DEVICE_STATE_RUNNING || migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) { return false; @@ -393,14 +386,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container) return false; } -if (!migration->v2 && -migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) { -continue; -} - -if (migration->v2 && -(migration->device_state == VFIO_DEVICE_STATE_RUNNING || - migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) { +if (migration->device_state == VFIO_DEVICE_STATE_RUNNING || +migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P) { continue; } else { return false; diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 04f4397212..7688c83127 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -142,220 +142,6 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, return 0; } -static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count, - off_t off, bool iswrite) -{ -int ret; - -ret = iswrite ? pwrite(vbasedev->fd, val, count, off) : -pread(vbasedev->fd, val, count, off); -if (ret < count) { -error_report("vfio_mig_%s %d byte %s: failed at offset 0x%" - HWADDR_PRIx", err: %s", iswrite ? "write" : "read", count, - vbasedev->name, off, strerror(errno)); -return (ret < 0) ? ret : -EINVAL; -} -return 0; -} - -static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count, - off_t off, bool iswrite) -{ -int ret, done = 0; -__u8 *tbuf = buf; - -while (count) { -int bytes = 0; - -if (count >= 8 && !(off % 8)) { -bytes = 8; -} else if (count >= 4 && !(off % 4)) { -bytes = 4; -} else if (count >= 2 && !(off % 2)) { -bytes = 2; -} else { -bytes = 1; -} - -ret = vfio_mig_access(vbasedev, tbuf, bytes, off, iswrite); -if (ret) { -return ret; -} - -count -= bytes; -done += bytes; -off += bytes; -tbuf += bytes; -} -return done; -} - -#define vfio_mig_read(f, v, c, o) vfio_mig_rw(f, (__u8 *)v, c, o, false) -#define vfio_mig_write(f, v, c, o) vfio_mig_rw(f, (__u8 *)v, c, o, true) - -#define VFIO_MIG_STRUCT_OFFSET(f) \ - offsetof(struct vfio_device_migration_info, f) -/* - * Change the device_state register for device @vbasedev. Bits set in @mask - * are preserved, bits set in @v
Re: [PATCH v6 05/13] migration/qemu-file: Add qemu_file_get_to_fd()
On 1/12/23 09:50, Avihai Horon wrote: Add new function qemu_file_get_to_fd() that allows reading data from QEMUFile and writing it straight into a given fd. This will be used later in VFIO migration code. Signed-off-by: Avihai Horon Reviewed-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Cédric Le Goater Thanks, C. --- migration/qemu-file.h | 1 + migration/qemu-file.c | 34 ++ 2 files changed, 35 insertions(+) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index fa13d04d78..9d0155a2a1 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -148,6 +148,7 @@ int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size); void ram_control_before_iterate(QEMUFile *f, uint64_t flags); void ram_control_after_iterate(QEMUFile *f, uint64_t flags); diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 2d5f74ffc2..102ab3b439 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -940,3 +940,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file) { return file->ioc; } + +/* + * Read size bytes from QEMUFile f and write them to fd. + */ +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size) +{ +while (size) { +size_t pending = f->buf_size - f->buf_index; +ssize_t rc; + +if (!pending) { +rc = qemu_fill_buffer(f); +if (rc < 0) { +return rc; +} +if (rc == 0) { +return -EIO; +} +continue; +} + +rc = write(fd, f->buf + f->buf_index, MIN(pending, size)); +if (rc < 0) { +return -errno; +} +if (rc == 0) { +return -EIO; +} +f->buf_index += rc; +size -= rc; +} + +return 0; +}
Re: [PATCH v5 10/14] vfio/migration: Implement VFIO migration protocol v2
Hello Avihai On 1/10/23 15:08, Avihai Horon wrote: On 09/01/2023 20:36, Jason Gunthorpe wrote: On Mon, Jan 09, 2023 at 06:27:21PM +0100, Cédric Le Goater wrote: also, in vfio_migration_query_flags() : +static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags) +{ + uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) + + sizeof(struct vfio_device_feature_migration), + sizeof(uint64_t))] = {}; + struct vfio_device_feature *feature = (struct vfio_device_feature *)buf; + struct vfio_device_feature_migration *mig = + (struct vfio_device_feature_migration *)feature->data; + + feature->argsz = sizeof(buf); + feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION; + if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { + return -EOPNOTSUPP; + } + + *mig_flags = mig->flags; + + return 0; +} The code is using any possible error returned by the VFIO_DEVICE_FEATURE ioctl to distinguish protocol v1 from v2. I'm comfortable with that from a kernel perspective. There is no such thing as this API failing in the kernel but userspace should continue on, no matter what the error code. So always failing here is correct. About the only thing you might want to do is convert anything other than ENOTTY into a hard qemu failure similar to failing to open /dev/vfio or something - it means something has gone really wrong.. But that is pretty obscure stuff Hi Cedric, With Jason's input, is it ok by you to leave the code as is? The patchset removes v1 later on, I think we are fine. I was reading it sequentially and it felt like a weak spot. All errors are translated in EOPNOTSUPP. That's always true for pre-v6.0 kernels, returning -ENOTTY, and v6.0+ kernels will do the same unless a mlx5vf device is passthru. I still wonder if we should report some errors for the ! -ENOTTY case. So the code below could be a good addition. Thanks, C. if not, would this be fine? +static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags) +{ + uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) + + sizeof(struct vfio_device_feature_migration), + sizeof(uint64_t))] = {}; + struct vfio_device_feature *feature = (struct vfio_device_feature *)buf; + struct vfio_device_feature_migration *mig = + (struct vfio_device_feature_migration *)feature->data; + + feature->argsz = sizeof(buf); + feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION; + if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { + if (errno == ENOTTY) { + error_report("%s: VFIO migration is not supported in kernel", + vbasedev->name); + } else { + error_report("%s: Failed to query VFIO migration support, err: %s", + vbasedev->name, strerror(errno)); + } + + return -errno; + } + + *mig_flags = mig->flags; + + return 0; +} + and then in vfio_migration_init() prior v1 removal: + ret = vfio_migration_query_flags(vbasedev, _flags); + if (!ret) { + /* Migration v2 */ + } else if (ret == -ENOTTY) { + /* Migration v1 */ + } else { + return ret; + } and after v1 removal vfio_migration_init() will be: ret = vfio_migration_query_flags(vbasedev, _flags); if (ret) { return ret; } Thanks.
Re: [PATCH v5 10/14] vfio/migration: Implement VFIO migration protocol v2
On 1/9/23 16:12, Avihai Horon wrote: On 09/01/2023 12:20, Cédric Le Goater wrote: External email: Use caution opening links or attachments Hello Avihai, On 12/29/22 12:03, Avihai Horon wrote: +static int vfio_save_setup(QEMUFile *f, void *opaque) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + uint64_t stop_copy_size; + + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); + + if (vfio_query_stop_copy_size(vbasedev, _copy_size)) { + stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE; + } + migration->data_buffer_size = MIN(VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE, + stop_copy_size); + migration->data_buffer = g_try_malloc0(migration->data_buffer_size); + if (!migration->data_buffer) { + error_report("%s: Failed to allocate migration data buffer", + vbasedev->name); + return -ENOMEM; + } + + trace_vfio_save_setup(vbasedev->name, migration->data_buffer_size); + + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + + return qemu_file_get_error(f); +} + This fails to compile with : gcc version 12.2.1 20221121 (Red Hat 12.2.1-4) (GCC) complains with : ../include/qemu/osdep.h:315:22: error: ‘stop_copy_size’ may be used uninitialized [-Werror=maybe-uninitialized] 315 | _a < _b ? _a : _b; \ | ^ ../hw/vfio/migration.c:262:14: note: ‘stop_copy_size’ was declared here 262 | uint64_t stop_copy_size; | ^~ cc1: all warnings being treated as errors May be rework the code slightly to avoid the breakage : +++ qemu.git/hw/vfio/migration.c @@ -259,13 +259,11 @@ static int vfio_save_setup(QEMUFile *f, { VFIODevice *vbasedev = opaque; VFIOMigration *migration = vbasedev->migration; - uint64_t stop_copy_size; + uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE; qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); - if (vfio_query_stop_copy_size(vbasedev, _copy_size)) { - stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE; - } + vfio_query_stop_copy_size(vbasedev, _copy_size); migration->data_buffer_size = MIN(VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE, stop_copy_size); migration->data_buffer = g_try_malloc0(migration->data_buffer_size); and report the error in vfio_query_stop_copy_size() Thanks, Cedric. There is another similar case in vfio_save_pending(). I will fix both of them. also, in vfio_migration_query_flags() : +static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags) +{ +uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) + + sizeof(struct vfio_device_feature_migration), + sizeof(uint64_t))] = {}; +struct vfio_device_feature *feature = (struct vfio_device_feature *)buf; +struct vfio_device_feature_migration *mig = +(struct vfio_device_feature_migration *)feature->data; + +feature->argsz = sizeof(buf); +feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION; +if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { +return -EOPNOTSUPP; +} + +*mig_flags = mig->flags; + +return 0; +} The code is using any possible error returned by the VFIO_DEVICE_FEATURE ioctl to distinguish protocol v1 from v2. Couldn't we use ENOTTY ? I think a pre-v6.0 kernel would return this errno. Some error report would be good to have. Thanks, C.
Re: [PATCH v5 09/14] vfio/migration: Rename functions/structs related to v1 protocol
On 12/29/22 12:03, Avihai Horon wrote: To avoid name collisions, rename functions and structs related to VFIO migration protocol v1. This will allow the two protocols to co-exist when v2 protocol is added, until v1 is removed. No functional changes intended. Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Thanks, C. --- include/hw/vfio/vfio-common.h | 2 +- hw/vfio/common.c | 6 +- hw/vfio/migration.c | 106 +- hw/vfio/trace-events | 12 ++-- 4 files changed, 63 insertions(+), 63 deletions(-) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index e573f5a9f1..bbaf72ba00 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -62,7 +62,7 @@ typedef struct VFIOMigration { struct VFIODevice *vbasedev; VMChangeStateEntry *vm_state; VFIORegion region; -uint32_t device_state; +uint32_t device_state_v1; int vm_running; Notifier migration_state; uint64_t pending_bytes; diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 7a35edb0e9..6f0157c848 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -355,8 +355,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container) return false; } -if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) -&& (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) { +if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) && +(migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) { return false; } } @@ -385,7 +385,7 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container) return false; } -if (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING) { +if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) { continue; } else { return false; diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 977da64411..9df859f4d3 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -107,8 +107,8 @@ static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count, * an error is returned. */ -static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask, -uint32_t value) +static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask, + uint32_t value) { VFIOMigration *migration = vbasedev->migration; VFIORegion *region = >region; @@ -145,8 +145,8 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask, return ret; } -migration->device_state = device_state; -trace_vfio_migration_set_state(vbasedev->name, device_state); +migration->device_state_v1 = device_state; +trace_vfio_migration_v1_set_state(vbasedev->name, device_state); return 0; } @@ -260,8 +260,8 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size) return ret; } -static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev, -uint64_t data_size) +static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev, + uint64_t data_size) { VFIORegion *region = >migration->region; uint64_t data_offset = 0, size, report_size; @@ -288,7 +288,7 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev, data_size = 0; } -trace_vfio_load_state_device_data(vbasedev->name, data_offset, size); +trace_vfio_v1_load_state_device_data(vbasedev->name, data_offset, size); while (size) { void *buf; @@ -394,7 +394,7 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque) return qemu_file_get_error(f); } -static void vfio_migration_cleanup(VFIODevice *vbasedev) +static void vfio_migration_v1_cleanup(VFIODevice *vbasedev) { VFIOMigration *migration = vbasedev->migration; @@ -405,13 +405,13 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev) /* -- */ -static int vfio_save_setup(QEMUFile *f, void *opaque) +static int vfio_v1_save_setup(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; VFIOMigration *migration = vbasedev->migration; int ret; -trace_vfio_save_setup(vbasedev->name); +trace_vfio_v1_save_setup(vbasedev->name); qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); @@ -431,8 +431,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
Re: [PATCH v5 08/14] vfio/migration: Move migration v1 logic to vfio_migration_init()
On 12/29/22 12:03, Avihai Horon wrote: Move vfio_dev_get_region_info() logic from vfio_migration_probe() to vfio_migration_init(). This logic is specific to v1 protocol and moving it will make it easier to add the v2 protocol implementation later. No functional changes intended. Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Thanks, C. --- hw/vfio/migration.c | 30 +++--- hw/vfio/trace-events | 2 +- 2 files changed, 16 insertions(+), 16 deletions(-) diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 552c2313b2..977da64411 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -788,14 +788,14 @@ static void vfio_migration_exit(VFIODevice *vbasedev) vbasedev->migration = NULL; } -static int vfio_migration_init(VFIODevice *vbasedev, - struct vfio_region_info *info) +static int vfio_migration_init(VFIODevice *vbasedev) { int ret; Object *obj; VFIOMigration *migration; char id[256] = ""; g_autofree char *path = NULL, *oid = NULL; +struct vfio_region_info *info; if (!vbasedev->ops->vfio_get_object) { return -EINVAL; @@ -806,6 +806,14 @@ static int vfio_migration_init(VFIODevice *vbasedev, return -EINVAL; } +ret = vfio_get_dev_region_info(vbasedev, + VFIO_REGION_TYPE_MIGRATION_DEPRECATED, + VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED, + ); +if (ret) { +return ret; +} + vbasedev->migration = g_new0(VFIOMigration, 1); vbasedev->migration->device_state = VFIO_DEVICE_STATE_V1_RUNNING; vbasedev->migration->vm_running = runstate_is_running(); @@ -825,6 +833,8 @@ static int vfio_migration_init(VFIODevice *vbasedev, goto err; } +g_free(info); + migration = vbasedev->migration; migration->vbasedev = vbasedev; @@ -847,6 +857,7 @@ static int vfio_migration_init(VFIODevice *vbasedev, return 0; err: +g_free(info); vfio_migration_exit(vbasedev); return ret; } @@ -860,34 +871,23 @@ int64_t vfio_mig_bytes_transferred(void) int vfio_migration_probe(VFIODevice *vbasedev, Error **errp) { -struct vfio_region_info *info = NULL; int ret = -ENOTSUP; if (!vbasedev->enable_migration) { goto add_blocker; } -ret = vfio_get_dev_region_info(vbasedev, - VFIO_REGION_TYPE_MIGRATION_DEPRECATED, - VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED, - ); -if (ret) { -goto add_blocker; -} - -ret = vfio_migration_init(vbasedev, info); +ret = vfio_migration_init(vbasedev); if (ret) { goto add_blocker; } -trace_vfio_migration_probe(vbasedev->name, info->index); -g_free(info); +trace_vfio_migration_probe(vbasedev->name); return 0; add_blocker: error_setg(>migration_blocker, "VFIO device doesn't support migration"); -g_free(info); ret = migrate_add_blocker(vbasedev->migration_blocker, errp); if (ret < 0) { diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index 73dffe9e00..b259dcc644 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -148,7 +148,7 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) "%ux%u" vfio_display_edid_write_error(void) "" # migration.c -vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d" +vfio_migration_probe(const char *name) " (%s)" vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d" vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d" vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
Re: [PATCH v5 06/14] migration/qemu-file: Add qemu_file_get_to_fd()
On 12/29/22 12:03, Avihai Horon wrote: Add new function qemu_file_get_to_fd() that allows reading data from QEMUFile and writing it straight into a given fd. This will be used later in VFIO migration code. Signed-off-by: Avihai Horon Reviewed-by: Vladimir Sementsov-Ogievskiy --- migration/qemu-file.h | 1 + migration/qemu-file.c | 34 ++ 2 files changed, 35 insertions(+) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index fa13d04d78..9d0155a2a1 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -148,6 +148,7 @@ int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size); void ram_control_before_iterate(QEMUFile *f, uint64_t flags); void ram_control_after_iterate(QEMUFile *f, uint64_t flags); diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 2d5f74ffc2..79303c9d34 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -940,3 +940,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file) { return file->ioc; } + +/* + * Read size bytes from QEMUFile f and write them to fd. + */ +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size) +{ +while (size) { +size_t pending = f->buf_size - f->buf_index; +ssize_t rc; + +if (!pending) { +rc = qemu_fill_buffer(f); +if (rc < 0) { +return rc; +} +if (rc == 0) { +return -1; Given the call stack, -1 would be interpreted as EPERM. May be EIO instead ? C. +} +continue; +} + +rc = write(fd, f->buf + f->buf_index, MIN(pending, size)); +if (rc < 0) { +return rc; +} +if (rc == 0) { +return -1; +} +f->buf_index += rc; +size -= rc; +} + +return 0; +}
Re: [PATCH v5 03/14] migration: Simplify migration_iteration_run()
On 12/29/22 12:03, Avihai Horon wrote: From: Juan Quintela Signed-off-by: Juan Quintela Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Thanks, C. --- migration/migration.c | 25 + 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 9795d0ec5c..61b9ce0fe8 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3758,23 +3758,24 @@ static MigIterateState migration_iteration_run(MigrationState *s) trace_migrate_pending(pending_size, s->threshold_size, pend_pre, pend_compat, pend_post); -if (pending_size && pending_size >= s->threshold_size) { -/* Still a significant amount to transfer */ -if (!in_postcopy && pend_pre <= s->threshold_size && -qatomic_read(>start_postcopy)) { -if (postcopy_start(s)) { -error_report("%s: postcopy failed to start", __func__); -} -return MIG_ITERATE_SKIP; -} -/* Just another iteration step */ -qemu_savevm_state_iterate(s->to_dst_file, in_postcopy); -} else { + +if (!pending_size || pending_size < s->threshold_size) { trace_migration_thread_low_pending(pending_size); migration_completion(s); return MIG_ITERATE_BREAK; } +/* Still a significant amount to transfer */ +if (!in_postcopy && pend_pre <= s->threshold_size && +qatomic_read(>start_postcopy)) { +if (postcopy_start(s)) { +error_report("%s: postcopy failed to start", __func__); +} +return MIG_ITERATE_SKIP; +} + +/* Just another iteration step */ +qemu_savevm_state_iterate(s->to_dst_file, in_postcopy); return MIG_ITERATE_RESUME; }
Re: [PATCH v5 10/14] vfio/migration: Implement VFIO migration protocol v2
Hello Avihai, On 12/29/22 12:03, Avihai Horon wrote: +static int vfio_save_setup(QEMUFile *f, void *opaque) +{ +VFIODevice *vbasedev = opaque; +VFIOMigration *migration = vbasedev->migration; +uint64_t stop_copy_size; + +qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); + +if (vfio_query_stop_copy_size(vbasedev, _copy_size)) { +stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE; +} +migration->data_buffer_size = MIN(VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE, + stop_copy_size); +migration->data_buffer = g_try_malloc0(migration->data_buffer_size); +if (!migration->data_buffer) { +error_report("%s: Failed to allocate migration data buffer", + vbasedev->name); +return -ENOMEM; +} + +trace_vfio_save_setup(vbasedev->name, migration->data_buffer_size); + +qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + +return qemu_file_get_error(f); +} + This fails to compile with : gcc version 12.2.1 20221121 (Red Hat 12.2.1-4) (GCC) complains with : ../include/qemu/osdep.h:315:22: error: ‘stop_copy_size’ may be used uninitialized [-Werror=maybe-uninitialized] 315 | _a < _b ? _a : _b; \ | ^ ../hw/vfio/migration.c:262:14: note: ‘stop_copy_size’ was declared here 262 | uint64_t stop_copy_size; | ^~ cc1: all warnings being treated as errors May be rework the code slightly to avoid the breakage : +++ qemu.git/hw/vfio/migration.c @@ -259,13 +259,11 @@ static int vfio_save_setup(QEMUFile *f, { VFIODevice *vbasedev = opaque; VFIOMigration *migration = vbasedev->migration; -uint64_t stop_copy_size; +uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE; qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); -if (vfio_query_stop_copy_size(vbasedev, _copy_size)) { -stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE; -} +vfio_query_stop_copy_size(vbasedev, _copy_size); migration->data_buffer_size = MIN(VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE, stop_copy_size); migration->data_buffer = g_try_malloc0(migration->data_buffer_size); and report the error in vfio_query_stop_copy_size() Thanks, C.
Re: [PATCH] m25p80: Add the is25wp256 SFPD table
On 12/27/22 07:31, Tudor Ambarus wrote: On 25.12.2022 14:18, Ben Dooks wrote: On Wed, Dec 21, 2022 at 06:36:02PM +0100, Cédric Le Goater wrote: On 12/21/22 13:22, Guenter Roeck wrote: Generated from hardware using the following command and then padding with 0xff to fill out a power-of-2: xxd -p /sys/bus/spi/devices/spi0.0/spi-nor/sfdp Cc: Michael Walle Cc: Tudor Ambarus Signed-off-by: Guenter Roeck Reviewed-by: Cédric Le Goater If SFDP is a standard, couldn't we have an function to generate it from the flash parameters? No, it's not practical as you have to specify all the flash parameters at flash declaration. Indeed and the definition of flash models in QEMU is far to cover all the SFDP features. The known_devices array of m25p80 would be huge ! However, we could generate some of the SFDP tables if no raw data is provided. It could be useful for testing drivers. Thanks, C.
Re: [PATCH] m25p80: Add the is25wp256 SFPD table
On 12/21/22 13:22, Guenter Roeck wrote: Generated from hardware using the following command and then padding with 0xff to fill out a power-of-2: xxd -p /sys/bus/spi/devices/spi0.0/spi-nor/sfdp Cc: Michael Walle Cc: Tudor Ambarus Signed-off-by: Guenter Roeck Reviewed-by: Cédric Le Goater Thanks, C. --- hw/block/m25p80.c | 3 ++- hw/block/m25p80_sfdp.c | 40 hw/block/m25p80_sfdp.h | 2 ++ 3 files changed, 44 insertions(+), 1 deletion(-) diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c index 02adc87527..802d2eb021 100644 --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -221,7 +221,8 @@ static const FlashPartInfo known_devices[] = { { INFO("is25wp032", 0x9d7016, 0, 64 << 10, 64, ER_4K) }, { INFO("is25wp064", 0x9d7017, 0, 64 << 10, 128, ER_4K) }, { INFO("is25wp128", 0x9d7018, 0, 64 << 10, 256, ER_4K) }, -{ INFO("is25wp256", 0x9d7019, 0, 64 << 10, 512, ER_4K) }, +{ INFO("is25wp256", 0x9d7019, 0, 64 << 10, 512, ER_4K), + .sfdp_read = m25p80_sfdp_is25wp256 }, /* Macronix */ { INFO("mx25l2005a", 0xc22012, 0, 64 << 10, 4, ER_4K) }, diff --git a/hw/block/m25p80_sfdp.c b/hw/block/m25p80_sfdp.c index 77615fa29e..b33811a4f5 100644 --- a/hw/block/m25p80_sfdp.c +++ b/hw/block/m25p80_sfdp.c @@ -330,3 +330,43 @@ static const uint8_t sfdp_w25q01jvq[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, }; define_sfdp_read(w25q01jvq); + +/* + * Integrated Silicon Solution (ISSI) + */ + +static const uint8_t sfdp_is25wp256[] = { +0x53, 0x46, 0x44, 0x50, 0x06, 0x01, 0x01, 0xff, +0x00, 0x06, 0x01, 0x10, 0x30, 0x00, 0x00, 0xff, +0x9d, 0x05, 0x01, 0x03, 0x80, 0x00, 0x00, 0x02, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xe5, 0x20, 0xf9, 0xff, 0xff, 0xff, 0xff, 0x0f, +0x44, 0xeb, 0x08, 0x6b, 0x08, 0x3b, 0x80, 0xbb, +0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0xff, +0xff, 0xff, 0x44, 0xeb, 0x0c, 0x20, 0x0f, 0x52, +0x10, 0xd8, 0x00, 0xff, 0x23, 0x4a, 0xc9, 0x00, +0x82, 0xd8, 0x11, 0xce, 0xcc, 0xcd, 0x68, 0x46, +0x7a, 0x75, 0x7a, 0x75, 0xf7, 0xae, 0xd5, 0x5c, +0x4a, 0x42, 0x2c, 0xff, 0xf0, 0x30, 0xfa, 0xa9, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0x50, 0x19, 0x50, 0x16, 0x9f, 0xf9, 0xc0, 0x64, +0x8f, 0xef, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +}; +define_sfdp_read(is25wp256); diff --git a/hw/block/m25p80_sfdp.h b/hw/block/m25p80_sfdp.h index df7adfb5ce..011a880f66 100644 --- a/hw/block/m25p80_sfdp.h +++ b/hw/block/m25p80_sfdp.h @@ -26,4 +26,6 @@ uint8_t m25p80_sfdp_w25q512jv(uint32_t addr); uint8_t m25p80_sfdp_w25q01jvq(uint32_t addr); +uint8_t m25p80_sfdp_is25wp256(uint32_t addr); + #endif
Re: [PATCH maybe-7.2 1/3] hw/i2c: only schedule pending master when bus is idle
On 11/17/22 12:58, Klaus Jensen wrote: On Nov 17 09:01, Cédric Le Goater wrote: On 11/17/22 08:37, Klaus Jensen wrote: On Nov 17 07:56, Cédric Le Goater wrote: On 11/17/22 07:40, Klaus Jensen wrote: On Nov 16 16:58, Cédric Le Goater wrote: On 11/16/22 09:43, Klaus Jensen wrote: From: Klaus Jensen It is not given that the current master will release the bus after a transfer ends. Only schedule a pending master if the bus is idle. Fixes: 37fa5ca42623 ("hw/i2c: support multiple masters") Signed-off-by: Klaus Jensen --- hw/i2c/aspeed_i2c.c | 2 ++ hw/i2c/core.c| 37 ++--- include/hw/i2c/i2c.h | 2 ++ 3 files changed, 26 insertions(+), 15 deletions(-) diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c index c166fd20fa11..1f071a3811f7 100644 --- a/hw/i2c/aspeed_i2c.c +++ b/hw/i2c/aspeed_i2c.c @@ -550,6 +550,8 @@ static void aspeed_i2c_bus_handle_cmd(AspeedI2CBus *bus, uint64_t value) } SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, M_STOP_CMD, 0); aspeed_i2c_set_state(bus, I2CD_IDLE); + +i2c_schedule_pending_master(bus->bus); Shouldn't it be i2c_bus_release() ? The reason for having both i2c_bus_release() and i2c_schedule_pending_master() is that i2c_bus_release() sort of pairs with i2c_bus_master(). They either set or clear the bus->bh member. In the current design, the controller (in this case the Aspeed I2C) is an "implicit" master (it does not have a bottom half driving it), so there is no bus->bh to clear. I should (and will) write some documentation on the asynchronous API. I found the routine names confusing. Thanks for the clarification. Maybe we could do this rename : i2c_bus_release() -> i2c_bus_release_and_clear() i2c_schedule_pending_master() -> i2c_bus_release() and keep i2c_schedule_pending_master() internal the I2C core subsystem. How about renaming i2c_bus_master to i2c_bus_acquire() such that it pairs with i2c_bus_release(). Looks good to me. And then add an i2c_bus_yield() to be used by the controller? I think we should be able to assert in i2c_bus_yield() that bus->bh is NULL. But I'll take a closer look at that. I am using your i2c-echo slave device to track regressions in the Aspeed machines. May be we could merge it for tests ? Oh, cool. Sure, I'd be happy to help "maintain" it ;) And so, I am seeing errors with the little POC you sent. without: console: echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device console: # echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device console: [4.252431] i2c i2c-3: new_device: Instantiated device slave-24c02 at 0x64 console: i2cset -y 3 0x42 0x64 0x00 0xaa i /console: # i2cset -y 3 0x42 0x64 0x00 0xaa i console: # hexdump /sys/bus/i2c/devices/3-1064/slave-eeprom console: 000 ffaa console: poweroff console: 010 console: * console: 100 with: console: echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device console: # echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device console: [4.413210] i2c i2c-3: new_device: Instantiated device slave-24c02 at 0x64 console: i2cset -y 3 0x42 0x64 0x00 0xaa i console: # i2cset -y 3 0x42 0x64 0x00 0xaa i console: # hexdump /sys/bus/i2c/devices/3-1064/slave-eeprom console: 000 console: * console: 100 C.
Re: [PATCH maybe-7.2 1/3] hw/i2c: only schedule pending master when bus is idle
On 11/17/22 08:37, Klaus Jensen wrote: On Nov 17 07:56, Cédric Le Goater wrote: On 11/17/22 07:40, Klaus Jensen wrote: On Nov 16 16:58, Cédric Le Goater wrote: On 11/16/22 09:43, Klaus Jensen wrote: From: Klaus Jensen It is not given that the current master will release the bus after a transfer ends. Only schedule a pending master if the bus is idle. Fixes: 37fa5ca42623 ("hw/i2c: support multiple masters") Signed-off-by: Klaus Jensen --- hw/i2c/aspeed_i2c.c | 2 ++ hw/i2c/core.c| 37 ++--- include/hw/i2c/i2c.h | 2 ++ 3 files changed, 26 insertions(+), 15 deletions(-) diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c index c166fd20fa11..1f071a3811f7 100644 --- a/hw/i2c/aspeed_i2c.c +++ b/hw/i2c/aspeed_i2c.c @@ -550,6 +550,8 @@ static void aspeed_i2c_bus_handle_cmd(AspeedI2CBus *bus, uint64_t value) } SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, M_STOP_CMD, 0); aspeed_i2c_set_state(bus, I2CD_IDLE); + +i2c_schedule_pending_master(bus->bus); Shouldn't it be i2c_bus_release() ? The reason for having both i2c_bus_release() and i2c_schedule_pending_master() is that i2c_bus_release() sort of pairs with i2c_bus_master(). They either set or clear the bus->bh member. In the current design, the controller (in this case the Aspeed I2C) is an "implicit" master (it does not have a bottom half driving it), so there is no bus->bh to clear. I should (and will) write some documentation on the asynchronous API. I found the routine names confusing. Thanks for the clarification. Maybe we could do this rename : i2c_bus_release() -> i2c_bus_release_and_clear() i2c_schedule_pending_master() -> i2c_bus_release() and keep i2c_schedule_pending_master() internal the I2C core subsystem. How about renaming i2c_bus_master to i2c_bus_acquire() such that it pairs with i2c_bus_release(). Looks good to me. And then add an i2c_bus_yield() to be used by the controller? I think we should be able to assert in i2c_bus_yield() that bus->bh is NULL. But I'll take a closer look at that. I am using your i2c-echo slave device to track regressions in the Aspeed machines. May be we could merge it for tests ? Thanks, C.
Re: [PATCH maybe-7.2 1/3] hw/i2c: only schedule pending master when bus is idle
On 11/17/22 07:40, Klaus Jensen wrote: On Nov 16 16:58, Cédric Le Goater wrote: On 11/16/22 09:43, Klaus Jensen wrote: From: Klaus Jensen It is not given that the current master will release the bus after a transfer ends. Only schedule a pending master if the bus is idle. Fixes: 37fa5ca42623 ("hw/i2c: support multiple masters") Signed-off-by: Klaus Jensen --- hw/i2c/aspeed_i2c.c | 2 ++ hw/i2c/core.c| 37 ++--- include/hw/i2c/i2c.h | 2 ++ 3 files changed, 26 insertions(+), 15 deletions(-) diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c index c166fd20fa11..1f071a3811f7 100644 --- a/hw/i2c/aspeed_i2c.c +++ b/hw/i2c/aspeed_i2c.c @@ -550,6 +550,8 @@ static void aspeed_i2c_bus_handle_cmd(AspeedI2CBus *bus, uint64_t value) } SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, M_STOP_CMD, 0); aspeed_i2c_set_state(bus, I2CD_IDLE); + +i2c_schedule_pending_master(bus->bus); Shouldn't it be i2c_bus_release() ? The reason for having both i2c_bus_release() and i2c_schedule_pending_master() is that i2c_bus_release() sort of pairs with i2c_bus_master(). They either set or clear the bus->bh member. In the current design, the controller (in this case the Aspeed I2C) is an "implicit" master (it does not have a bottom half driving it), so there is no bus->bh to clear. I should (and will) write some documentation on the asynchronous API. I found the routine names confusing. Thanks for the clarification. Maybe we could do this rename : i2c_bus_release() -> i2c_bus_release_and_clear() i2c_schedule_pending_master() -> i2c_bus_release() and keep i2c_schedule_pending_master() internal the I2C core subsystem. C.
Re: [PATCH maybe-7.2 1/3] hw/i2c: only schedule pending master when bus is idle
On 11/16/22 09:43, Klaus Jensen wrote: From: Klaus Jensen It is not given that the current master will release the bus after a transfer ends. Only schedule a pending master if the bus is idle. Fixes: 37fa5ca42623 ("hw/i2c: support multiple masters") Signed-off-by: Klaus Jensen --- hw/i2c/aspeed_i2c.c | 2 ++ hw/i2c/core.c| 37 ++--- include/hw/i2c/i2c.h | 2 ++ 3 files changed, 26 insertions(+), 15 deletions(-) diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c index c166fd20fa11..1f071a3811f7 100644 --- a/hw/i2c/aspeed_i2c.c +++ b/hw/i2c/aspeed_i2c.c @@ -550,6 +550,8 @@ static void aspeed_i2c_bus_handle_cmd(AspeedI2CBus *bus, uint64_t value) } SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, M_STOP_CMD, 0); aspeed_i2c_set_state(bus, I2CD_IDLE); + +i2c_schedule_pending_master(bus->bus); Shouldn't it be i2c_bus_release() ? Thanks, C. } if (aspeed_i2c_bus_pkt_mode_en(bus)) { diff --git a/hw/i2c/core.c b/hw/i2c/core.c index d4ba8146bffb..bed594fe599b 100644 --- a/hw/i2c/core.c +++ b/hw/i2c/core.c @@ -185,22 +185,39 @@ int i2c_start_transfer(I2CBus *bus, uint8_t address, bool is_recv) void i2c_bus_master(I2CBus *bus, QEMUBH *bh) { +I2CPendingMaster *node = g_new(struct I2CPendingMaster, 1); +node->bh = bh; + +QSIMPLEQ_INSERT_TAIL(>pending_masters, node, entry); +} + +void i2c_schedule_pending_master(I2CBus *bus) +{ +I2CPendingMaster *node; + if (i2c_bus_busy(bus)) { -I2CPendingMaster *node = g_new(struct I2CPendingMaster, 1); -node->bh = bh; - -QSIMPLEQ_INSERT_TAIL(>pending_masters, node, entry); +/* someone is already controlling the bus; wait for it to release it */ +return; +} +if (QSIMPLEQ_EMPTY(>pending_masters)) { return; } -bus->bh = bh; +node = QSIMPLEQ_FIRST(>pending_masters); +bus->bh = node->bh; + +QSIMPLEQ_REMOVE_HEAD(>pending_masters, entry); +g_free(node); + qemu_bh_schedule(bus->bh); } void i2c_bus_release(I2CBus *bus) { bus->bh = NULL; + +i2c_schedule_pending_master(bus); } int i2c_start_recv(I2CBus *bus, uint8_t address) @@ -234,16 +251,6 @@ void i2c_end_transfer(I2CBus *bus) g_free(node); } bus->broadcast = false; - -if (!QSIMPLEQ_EMPTY(>pending_masters)) { -I2CPendingMaster *node = QSIMPLEQ_FIRST(>pending_masters); -bus->bh = node->bh; - -QSIMPLEQ_REMOVE_HEAD(>pending_masters, entry); -g_free(node); - -qemu_bh_schedule(bus->bh); -} } int i2c_send(I2CBus *bus, uint8_t data) diff --git a/include/hw/i2c/i2c.h b/include/hw/i2c/i2c.h index 9b9581d23097..2a3abacd1ba6 100644 --- a/include/hw/i2c/i2c.h +++ b/include/hw/i2c/i2c.h @@ -141,6 +141,8 @@ int i2c_start_send(I2CBus *bus, uint8_t address); */ int i2c_start_send_async(I2CBus *bus, uint8_t address); +void i2c_schedule_pending_master(I2CBus *bus); + void i2c_end_transfer(I2CBus *bus); void i2c_nack(I2CBus *bus); void i2c_ack(I2CBus *bus);
Re: [PATCH v2] m25p80: Improve error when the backend file size does not match the device
On 11/16/22 09:28, Markus Armbruster wrote: Cédric Le Goater writes: On 11/16/22 07:56, Markus Armbruster wrote: Cédric Le Goater writes: Currently, when a block backend is attached to a m25p80 device and the associated file size does not match the flash model, QEMU complains with the error message "failed to read the initial flash content". This is confusing for the user. Use blk_check_size_and_read_all() instead of blk_pread() to improve the reported error. Signed-off-by: Cédric Le Goater --- hw/block/m25p80.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c index 02adc87527..68a757abf3 100644 --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -24,6 +24,7 @@ #include "qemu/osdep.h" #include "qemu/units.h" #include "sysemu/block-backend.h" +#include "hw/block/block.h" #include "hw/qdev-properties.h" #include "hw/qdev-properties-system.h" #include "hw/ssi/ssi.h" @@ -1614,8 +1615,7 @@ static void m25p80_realize(SSIPeripheral *ss, Error **errp) trace_m25p80_binding(s); s->storage = blk_blockalign(s->blk, s->size); -if (blk_pread(s->blk, 0, s->size, s->storage, 0) < 0) { -error_setg(errp, "failed to read the initial flash content"); +if (!blk_check_size_and_read_all(s->blk, s->storage, s->size, errp)) { return; } } else { Ignorant question: what does blk_pread() on short read? Does it fail? an underlying call to blk_check_byte_request() makes it fail. Thanks! I tried to understand how blk_set_allow_write_beyond_eof() worked but I lack knowledge on the block area. Or does it succeed, returning how much it read? I tried to find an answer in function comments, no luck. Are there more instances of "we fill some fixed-size memory (such as a ROM or flash) from a block backend?" Yes. There are other similar devices : nand, nvram, pnv_pnor, etc. I think they should all be converted to blk_check_size_and_read_all(). Not a prerequisite for getting this patch merged. Volunteers? I can do the ones I introduced for the Aspeed and PPC machines. Or we find a way to allow the underlying backend file to grow when accessed beyond EOF but *not* beyond the flash device size. This might be the complex part. The device model does some checks but the block backend as no idea of the upper layer limitations. Thanks, C.