[PATCH 2/2] ide: Explicitly poll for BHs on cancel
When we still have an AIOCB registered for DMA operations, we try to
settle the respective operation by draining the BlockBackend associated
with the IDE device.

However, this assumes that every DMA operation is associated with some
I/O operation on the BlockBackend, and so settling the latter will
settle the former. That is not the case; for example, the guest is free
to issue a zero-length TRIM operation that will not result in any I/O
operation forwarded to the BlockBackend. In such a case, blk_drain()
will be a no-op if no other operations are in flight.

It is clear that if blk_drain() is a no-op, the value of
s->bus->dma->aiocb will not change between checking it in the `if`
condition and asserting that it is NULL after blk_drain().

To settle the DMA operation, we will thus need to explicitly invoke
aio_poll() ourselves, which will run any outstanding BHs (like
ide_trim_bh_cb()), until s->bus->dma->aiocb is NULL. To stop this from
being an infinite loop, assert that we made progress with every
aio_poll() call (i.e., invoked some BH).

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
Signed-off-by: Hanna Reitz
Signed-off-by: Lukas Straub
---
 hw/ide/core.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index d172e70f1e..a5fd89ebdd 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -736,7 +736,17 @@ void ide_cancel_dma_sync(IDEState *s)
     if (s->bus->dma->aiocb) {
         trace_ide_cancel_dma_sync_remaining();
         blk_drain(s->blk);
-        assert(s->bus->dma->aiocb == NULL);
+
+        /*
+         * Wait for potentially still-scheduled BHs, like ide_trim_bh_cb()
+         * (blk_drain() will only poll if there are in-flight requests on the
+         * BlockBackend, which there may not necessarily be, e.g. when the
+         * guest has issued a zero-length TRIM request)
+         */
+        while (s->bus->dma->aiocb) {
+            bool progress = aio_poll(qemu_get_aio_context(), true);
+            assert(progress);
+        }
     }
 }
 
-- 
2.39.2
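The control flow argued for in this commit message can be tried out in isolation. The following is only a toy model, not QEMU code: ToyLoop, ToyDMA and all toy_* functions are hypothetical stand-ins for the AioContext, the IDE DMA state, aio_poll(), blk_drain() and ide_trim_bh_cb(). It shows that a drain which only polls while requests are in flight leaves the AIOCB set, and that an explicit poll loop with a progress assertion settles it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's BH and AIOCB machinery. */
typedef void ToyBHFunc(void *opaque);

typedef struct ToyLoop {
    ToyBHFunc *pending_bh;   /* at most one scheduled bottom half */
    void *pending_opaque;
    int in_flight;           /* requests counted on the toy backend */
} ToyLoop;

typedef struct ToyDMA {
    void *aiocb;             /* non-NULL while a DMA operation is unsettled */
} ToyDMA;

/* Run the scheduled BH, if any, and report whether anything was done. */
bool toy_poll(ToyLoop *loop)
{
    ToyBHFunc *fn = loop->pending_bh;
    void *opaque = loop->pending_opaque;

    if (!fn) {
        return false;
    }
    loop->pending_bh = NULL;
    loop->pending_opaque = NULL;
    fn(opaque);
    return true;
}

/* Toy drain: polls only while requests are counted as in flight. */
void toy_drain(ToyLoop *loop)
{
    while (loop->in_flight > 0) {
        bool progress = toy_poll(loop);
        assert(progress);
    }
}

/* Completion BH of a zero-length TRIM: settles the DMA operation. */
void toy_trim_bh(void *opaque)
{
    ToyDMA *dma = opaque;
    dma->aiocb = NULL;
}

/* Same shape as the patched cancel path: drain, then poll until settled. */
void toy_cancel_dma_sync(ToyLoop *loop, ToyDMA *dma)
{
    if (dma->aiocb) {
        toy_drain(loop);
        while (dma->aiocb) {
            bool progress = toy_poll(loop);
            assert(progress);   /* no progress would mean an endless loop */
        }
    }
}
```

With a BH scheduled but nothing in flight, toy_drain() returns immediately and the AIOCB stays set; only the second loop in toy_cancel_dma_sync() clears it.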
[PATCH 1/2] ide: Fix a rare hang during block draining
If the guest issues a discard during a block drain section, the
blk_aio_pdiscard() may not be processed, but queued instead. And so the
callback will never be called to issue the bh and decrease the BB
in-flight number again. This causes a hang in the drain code, since it
will wait forever for the BB in-flight counter to decrease.

This reverts commit 7e5cdb34 "ide: Increment BB in-flight counter for
TRIM BH" to fix this hang. The bug fixed by that commit will be fixed
differently in the next commit.

Signed-off-by: Lukas Straub
---
 hw/ide/core.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index de48ff9f86..d172e70f1e 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -436,16 +436,12 @@ static const AIOCBInfo trim_aiocb_info = {
 static void ide_trim_bh_cb(void *opaque)
 {
     TrimAIOCB *iocb = opaque;
-    BlockBackend *blk = iocb->s->blk;
 
     iocb->common.cb(iocb->common.opaque, iocb->ret);
 
     qemu_bh_delete(iocb->bh);
     iocb->bh = NULL;
     qemu_aio_unref(iocb);
-
-    /* Paired with an increment in ide_issue_trim() */
-    blk_dec_in_flight(blk);
 }
 
 static void ide_issue_trim_cb(void *opaque, int ret)
@@ -516,9 +512,6 @@ BlockAIOCB *ide_issue_trim(
     IDEDevice *dev = s->unit ? s->bus->slave : s->bus->master;
     TrimAIOCB *iocb;
 
-    /* Paired with a decrement in ide_trim_bh_cb() */
-    blk_inc_in_flight(s->blk);
-
     iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque);
     iocb->s = s;
     iocb->bh = qemu_bh_new_guarded(ide_trim_bh_cb, iocb,
-- 
2.39.2
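The hang described above can be reduced to a counter that is incremented but whose decrement depends on a request that never starts. The sketch below is a toy model under that reading, not QEMU code: ToyBackend and the toy_* functions are hypothetical, and "queued" here simply means the request is parked and not counted as in flight until the drained section ends.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyBackend {
    bool drained;     /* inside a drained section: new requests are queued */
    int in_flight;    /* counter the drain loop waits on */
    int queued;       /* requests parked until the drained section ends */
} ToyBackend;

/* Submit a discard; returns true if it was started, false if only queued. */
bool toy_submit_discard(ToyBackend *be)
{
    if (be->drained) {
        be->queued++;
        return false;
    }
    be->in_flight++;
    return true;
}

/* Old TRIM path: takes an extra in-flight reference before submitting. */
void toy_issue_trim_old(ToyBackend *be)
{
    be->in_flight++;          /* only released by the completion BH */
    toy_submit_discard(be);   /* if queued, that BH is never scheduled */
}

/* TRIM path after the revert: no extra reference is taken. */
void toy_issue_trim_new(ToyBackend *be)
{
    toy_submit_discard(be);
}

/* A drain can finish only once nothing is counted as in flight. */
bool toy_drain_can_finish(const ToyBackend *be)
{
    return be->in_flight == 0;
}
```

In the drained case the old path leaves in_flight at 1 with nothing left that could decrement it, which is the wait-forever condition; the reverted path leaves it at 0 with one request queued.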
Re: [PATCH 17/17] qapi: Reformat doc comments to conform to current conventions
On Fri, 28 Apr 2023 12:54:29 +0200 Markus Armbruster wrote: > Change > > # @name: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed > #do eiusmod tempor incididunt ut labore et dolore magna aliqua. > > to > > # @name: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed > # do eiusmod tempor incididunt ut labore et dolore magna aliqua. > > See recent commit "qapi: Relax doc string @name: description > indentation rules" for rationale. > > Reflow paragraphs to 70 columns width, and consistently use two spaces > to separate sentences. > > To check the generated documentation does not change, I compared the > generated HTML before and after this commit with "wdiff -3". Finds no > differences. Comparing with diff is not useful, as the reflown > paragraphs are visible there. > > Signed-off-by: Markus Armbruster Acked-by: Lukas Straub > --- > qapi/acpi.json | 50 +- > qapi/audio.json | 85 +- > qapi/authz.json | 29 +- > qapi/block-core.json | 2801 -- > qapi/block-export.json | 242 ++-- > qapi/block.json | 214 +-- > qapi/char.json | 134 +- > qapi/common.json | 19 +- > qapi/compat.json | 13 +- > qapi/control.json| 59 +- > qapi/crypto.json | 261 ++-- > qapi/cryptodev.json |3 + > qapi/cxl.json| 74 +- > qapi/dump.json | 78 +- > qapi/error.json |6 +- > qapi/introspect.json | 89 +- > qapi/job.json| 139 +- > qapi/machine-target.json | 303 +++-- > qapi/machine.json| 389 +++--- > qapi/migration.json | 1117 --- > qapi/misc-target.json| 67 +- > qapi/misc.json | 180 ++- > qapi/net.json| 260 ++-- > qapi/pci.json| 35 +- > qapi/qapi-schema.json| 25 +- > qapi/qdev.json | 63 +- > qapi/qom.json| 404 +++--- > qapi/rdma.json |1 - > qapi/replay.json | 48 +- > qapi/rocker.json | 20 +- > qapi/run-state.json | 215 +-- > qapi/sockets.json| 50 +- > qapi/stats.json | 83 +- > qapi/tpm.json| 20 +- > qapi/trace.json | 34 +- > qapi/transaction.json| 87 +- > qapi/ui.json | 435 +++--- > qapi/virtio.json | 84 +- > qapi/yank.json | 42 +- > 39 files changed, 4322 insertions(+), 3936 
deletions(-) > > [...] > > diff --git a/qapi/yank.json b/qapi/yank.json > index 1639744ada..87ec7cab96 100644 > --- a/qapi/yank.json > +++ b/qapi/yank.json > @@ -9,7 +9,7 @@ > ## > # @YankInstanceType: > # > -# An enumeration of yank instance types. See @YankInstance for more > +# An enumeration of yank instance types. See @YankInstance for more > # information. > # > # Since: 6.0 > @@ -20,8 +20,8 @@ > ## > # @YankInstanceBlockNode: > # > -# Specifies which block graph node to yank. See @YankInstance for more > -# information. > +# Specifies which block graph node to yank. See @YankInstance for > +# more information. > # > # @node-name: the name of the block graph node > # > @@ -33,8 +33,8 @@ > ## > # @YankInstanceChardev: > # > -# Specifies which character device to yank. See @YankInstance for more > -# information. > +# Specifies which character device to yank. See @YankInstance for > +# more information. > # > # @id: the chardev's ID > # > @@ -46,21 +46,18 @@ > ## > # @YankInstance: > # > -# A yank instance can be yanked with the @yank qmp command to recover from a > -# hanging QEMU. > +# A yank instance can be yanked with the @yank qmp command to recover > +# from a hanging QEMU. > # > # Currently implemented yank instances: > # > -# - nbd block device: > -# Yanking it will shut down the connection to the nbd server without > -# attempting to reconnect. > -# - socket chardev: > -# Yanking it will shut down the connected socket. > -# - migration: > -# Yanking it will shut down all migration connections. Unlike > -# @migrate_cancel, it will not notify the migration process, so migration > -# will go into @failed state, instead of @cancelled state. @yank should be > -# used to recover from hangs. > +# - nbd block device: Yanking it will shut down the connection to the > +# nbd server without attempting to reconnect. > +# - socket chardev: Yanking it will shut down the connected socket. > +# - migration: Yanking it will shut down all migration connections. 
> +# Unlike @migrate_cancel, it will n
Re: [PATCH v3 4/4] configure: add --disable-colo-proxy option
On Thu, 27 Apr 2023 23:29:46 +0300
Vladimir Sementsov-Ogievskiy wrote:

> Add option to not build filter-mirror, filter-rewriter and
> colo-compare when they are not needed.
>
> There could be more agile configuration, for example add separate
> options for each filter, but that may be done in future on demand. The
> aim of this patch is to make possible to disable the whole COLO Proxy
> subsystem.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> ---
>  meson_options.txt             |  2 ++
>  net/meson.build               | 14 ++
>  scripts/meson-buildoptions.sh |  3 +++
>  stubs/colo-compare.c          |  7 +++
>  stubs/meson.build             |  1 +
>  5 files changed, 23 insertions(+), 4 deletions(-)
>  create mode 100644 stubs/colo-compare.c
>
> diff --git a/meson_options.txt b/meson_options.txt
> index 2471dd02da..b59e7ae342 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -289,6 +289,8 @@ option('live_block_migration', type: 'feature', value:
> 'auto',
>         description: 'block migration in the main migration stream')
>  option('replication', type: 'feature', value: 'auto',
>         description: 'replication support')
> +option('colo_proxy', type: 'feature', value: 'auto',
> +       description: 'colo-proxy support')
>  option('bochs', type: 'feature', value: 'auto',
>         description: 'bochs image format support')
>  option('cloop', type: 'feature', value: 'auto',
> diff --git a/net/meson.build b/net/meson.build
> index 87afca3e93..4cfc850c69 100644
> --- a/net/meson.build
> +++ b/net/meson.build
> @@ -1,13 +1,9 @@
>  softmmu_ss.add(files(
>    'announce.c',
>    'checksum.c',
> -  'colo-compare.c',
> -  'colo.c',
>    'dump.c',
>    'eth.c',
>    'filter-buffer.c',
> -  'filter-mirror.c',
> -  'filter-rewriter.c',
>    'filter.c',
>    'hub.c',
>    'net-hmp-cmds.c',
> @@ -19,6 +15,16 @@ softmmu_ss.add(files(
>    'util.c',
>  ))
>
> +if get_option('replication').allowed() or \
> +    get_option('colo_proxy').allowed()
> +  softmmu_ss.add(files('colo-compare.c'))
> +  softmmu_ss.add(files('colo.c'))
> +endif
> +
> +if get_option('colo_proxy').allowed()
> +  softmmu_ss.add(files('filter-mirror.c', 'filter-rewriter.c'))
> +endif
> +

The last discussion didn't really come to a conclusion, but I still
think that 'filter-mirror.c' (which also contains filter-redirect)
should be left unchanged.

>  softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c'))
>
>  if have_l2tpv3
> diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
> index d4369a3ad8..036047ce6f 100644
> --- a/scripts/meson-buildoptions.sh
> +++ b/scripts/meson-buildoptions.sh
> @@ -83,6 +83,7 @@ meson_options_help() {
>    printf "%s\n" '  capstone        Whether and how to find the capstone
> library'
>    printf "%s\n" '  cloop           cloop image format support'
>    printf "%s\n" '  cocoa           Cocoa user interface (macOS only)'
> +  printf "%s\n" '  colo-proxy      colo-proxy support'
>    printf "%s\n" '  coreaudio       CoreAudio sound support'
>    printf "%s\n" '  crypto-afalg    Linux AF_ALG crypto backend driver'
>    printf "%s\n" '  curl            CURL block device driver'
> @@ -236,6 +237,8 @@ _meson_option_parse() {
>      --disable-cloop) printf "%s" -Dcloop=disabled ;;
>      --enable-cocoa) printf "%s" -Dcocoa=enabled ;;
>      --disable-cocoa) printf "%s" -Dcocoa=disabled ;;
> +    --enable-colo-proxy) printf "%s" -Dcolo_proxy=enabled ;;
> +    --disable-colo-proxy) printf "%s" -Dcolo_proxy=disabled ;;
>      --enable-coreaudio) printf "%s" -Dcoreaudio=enabled ;;
>      --disable-coreaudio) printf "%s" -Dcoreaudio=disabled ;;
>      --enable-coroutine-pool) printf "%s" -Dcoroutine_pool=true ;;
> diff --git a/stubs/colo-compare.c b/stubs/colo-compare.c
> new file mode 100644
> index 0000000000..ec726665be
> --- /dev/null
> +++ b/stubs/colo-compare.c
> @@ -0,0 +1,7 @@
> +#include "qemu/osdep.h"
> +#include "qemu/notify.h"
> +#include "net/colo-compare.h"
> +
> +void colo_compare_cleanup(void)
> +{
> +}
> diff --git a/stubs/meson.build b/stubs/meson.build
> index 8412cad15f..a56645e2f7 100644
> --- a/stubs/meson.build
> +++ b/stubs/meson.build
> @@ -46,6 +46,7 @@ stub_ss.add(files('target-monitor-defs.c'))
>  stub_ss.add(files('trace-control.c'))
>  stub_ss.add(files('uuid.c'))
>  stub_ss.add(files('colo.c'))
> +stub_ss.add(files('colo-compare.c'))
>  stub_ss.add(files('vmstate.c'))
>  stub_ss.add(files('vm-stop.c'))
>  stub_ss.add(files('win32-kbd-hook.c'))
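To make the effect of the two conditions in the quoted net/meson.build hunk easier to see, here is a small truth-table sketch. It is an illustration only: ToyNetSources and toy_select_net_sources() are hypothetical names, and the two booleans stand for get_option('replication').allowed() and get_option('colo_proxy').allowed() as used in the patch as posted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of which net/ sources end up in the build. */
typedef struct ToyNetSources {
    bool colo_compare;     /* net/colo-compare.c */
    bool colo;             /* net/colo.c */
    bool filter_mirror;    /* net/filter-mirror.c */
    bool filter_rewriter;  /* net/filter-rewriter.c */
} ToyNetSources;

/* Mirrors the two 'if' blocks added to net/meson.build by the patch. */
ToyNetSources toy_select_net_sources(bool replication_allowed,
                                     bool colo_proxy_allowed)
{
    ToyNetSources out;

    out.colo_compare = replication_allowed || colo_proxy_allowed;
    out.colo = replication_allowed || colo_proxy_allowed;
    out.filter_mirror = colo_proxy_allowed;
    out.filter_rewriter = colo_proxy_allowed;
    return out;
}
```

With colo_proxy disabled, filter-mirror.c is dropped from the build regardless of the replication setting, which is the case the inline comment in this reply is about.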
Re: [PATCH v3 3/4] build: move COLO under CONFIG_REPLICATION
On Thu, 27 Apr 2023 23:29:45 +0300 Vladimir Sementsov-Ogievskiy wrote: > We don't allow to use x-colo capability when replication is not > configured. So, no reason to build COLO when replication is disabled, > it's unusable in this case. > > Note also that the check in migrate_caps_check() is not the only > restriction: some functions in migration/colo.c will just abort if > called with not defined CONFIG_REPLICATION, for example: > > migration_iteration_finish() >case MIGRATION_STATUS_COLO: >migrate_start_colo_process() >colo_process_checkpoint() >abort() > > It could probably make sense to have possibility to enable COLO without > REPLICATION, but this requires deeper audit of colo & replication code, > which may be done later if needed. > > Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Lukas Straub > --- > hmp-commands.hx| 2 ++ > migration/colo.c | 28 - > migration/meson.build | 6 -- > migration/migration-hmp-cmds.c | 2 ++ > migration/options.c| 17 > qapi/migration.json| 12 +++ > stubs/colo.c | 37 ++ > stubs/meson.build | 1 + > 8 files changed, 62 insertions(+), 43 deletions(-) > create mode 100644 stubs/colo.c > > diff --git a/hmp-commands.hx b/hmp-commands.hx > index bb85ee1d26..fbd0932232 100644 > --- a/hmp-commands.hx > +++ b/hmp-commands.hx > @@ -1035,6 +1035,7 @@ SRST >migration (or once already in postcopy). 
> ERST > > +#ifdef CONFIG_REPLICATION > { > .name = "x_colo_lost_heartbeat", > .args_type = "", > @@ -1043,6 +1044,7 @@ ERST >"a failover or takeover is needed.", > .cmd = hmp_x_colo_lost_heartbeat, > }, > +#endif > > SRST > ``x_colo_lost_heartbeat`` > diff --git a/migration/colo.c b/migration/colo.c > index 07bfa21fea..e4af47eeeb 100644 > --- a/migration/colo.c > +++ b/migration/colo.c > @@ -26,9 +26,7 @@ > #include "qemu/rcu.h" > #include "migration/failover.h" > #include "migration/ram.h" > -#ifdef CONFIG_REPLICATION > #include "block/replication.h" > -#endif > #include "net/colo-compare.h" > #include "net/colo.h" > #include "block/block.h" > @@ -68,7 +66,6 @@ static bool colo_runstate_is_stopped(void) > static void secondary_vm_do_failover(void) > { > /* COLO needs enable block-replication */ > -#ifdef CONFIG_REPLICATION > int old_state; > MigrationIncomingState *mis = migration_incoming_get_current(); > Error *local_err = NULL; > @@ -133,14 +130,10 @@ static void secondary_vm_do_failover(void) > if (mis->migration_incoming_co) { > qemu_coroutine_enter(mis->migration_incoming_co); > } > -#else > -abort(); > -#endif > } > > static void primary_vm_do_failover(void) > { > -#ifdef CONFIG_REPLICATION > MigrationState *s = migrate_get_current(); > int old_state; > Error *local_err = NULL; > @@ -181,9 +174,6 @@ static void primary_vm_do_failover(void) > > /* Notify COLO thread that failover work is finished */ > qemu_sem_post(>colo_exit_sem); > -#else > -abort(); > -#endif > } > > COLOMode get_colo_mode(void) > @@ -217,7 +207,6 @@ void colo_do_failover(void) > } > } > > -#ifdef CONFIG_REPLICATION > void qmp_xen_set_replication(bool enable, bool primary, > bool has_failover, bool failover, > Error **errp) > @@ -271,7 +260,6 @@ void qmp_xen_colo_do_checkpoint(Error **errp) > /* Notify all filters of all NIC to do checkpoint */ > colo_notify_filters_event(COLO_EVENT_CHECKPOINT, errp); > } > -#endif > > COLOStatus *qmp_query_colo_status(Error **errp) > { > @@ -435,15 
+423,11 @@ static int > colo_do_checkpoint_transaction(MigrationState *s, > } > qemu_mutex_lock_iothread(); > > -#ifdef CONFIG_REPLICATION > replication_do_checkpoint_all(_err); > if (local_err) { > qemu_mutex_unlock_iothread(); > goto out; > } > -#else > -abort(); > -#endif > > colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, _err); > if (local_err) { > @@ -561,15 +545,11 @@ static void colo_process_checkpoint(MigrationState *s) > object_unref(OBJECT(bioc)); > > qemu_mutex_lock_iothread(); > -#ifdef CONFIG
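The quoted diff removes the "#ifdef CONFIG_REPLICATION ... #else abort(); #endif" pairs from migration/colo.c and adds a stubs/colo.c instead. The general pattern can be sketched in one file as below. This is an assumption-laden illustration: TOY_CONFIG_REPLICATION and toy_colo_start() are made-up names, the contents of the real stubs/colo.c are not shown in this mail, and in the real tree the build system picks one of two separate source files rather than using a preprocessor switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef TOY_CONFIG_REPLICATION
/* "Real" implementation, built only when the feature is configured. */
bool toy_colo_start(const char **errmsg)
{
    *errmsg = NULL;
    return true;
}
#else
/* Stub: same signature, reports the missing feature instead of aborting. */
bool toy_colo_start(const char **errmsg)
{
    *errmsg = "feature is not compiled in";
    return false;
}
#endif
```

Callers link against whichever definition was built and never need their own #ifdef, so the feature's main source file can assume the feature is present.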
Re: [PATCH v3 1/4] block/meson.build: prefer positive condition for replication
On Thu, 27 Apr 2023 23:29:43 +0300
Vladimir Sementsov-Ogievskiy wrote:

> Signed-off-by: Vladimir Sementsov-Ogievskiy
> Reviewed-by: Juan Quintela
> Reviewed-by: Philippe Mathieu-Daudé

Reviewed-by: Lukas Straub

> ---
>  block/meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/meson.build b/block/meson.build
> index 382bec0e7d..b9a72e219b 100644
> --- a/block/meson.build
> +++ b/block/meson.build
> @@ -84,7 +84,7 @@ block_ss.add(when: 'CONFIG_WIN32', if_true:
> files('file-win32.c', 'win32-aio.c')
>  block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref,
> iokit])
>  block_ss.add(when: libiscsi, if_true: files('iscsi-opts.c'))
>  block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
> -if not get_option('replication').disabled()
> +if get_option('replication').allowed()
>    block_ss.add(files('replication.c'))
>  endif
>  block_ss.add(when: libaio, if_true: files('linux-aio.c'))
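The quoted change swaps a negated check for a positive one. A Meson feature option has three states (enabled, auto, disabled), and allowed() is true for the first two, so the condition should be unchanged in effect. The sketch below models that in C; ToyFeature and the toy_* helpers are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The three states of a Meson 'feature' option, modelled as an enum. */
typedef enum ToyFeature {
    TOY_FEATURE_ENABLED,
    TOY_FEATURE_AUTO,
    TOY_FEATURE_DISABLED
} ToyFeature;

/* Counterpart of .disabled() */
bool toy_feature_disabled(ToyFeature f)
{
    return f == TOY_FEATURE_DISABLED;
}

/* Counterpart of .allowed(): enabled or auto */
bool toy_feature_allowed(ToyFeature f)
{
    return f == TOY_FEATURE_ENABLED || f == TOY_FEATURE_AUTO;
}
```

For every state, toy_feature_allowed(f) equals !toy_feature_disabled(f), which is why the patch only changes how the condition reads.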
Re: [PATCH v2 4/4] configure: add --disable-colo-filters option
On Thu, 20 Apr 2023 09:09:48 +0000
"Zhang, Chen" wrote:

> > -Original Message-
> > From: Vladimir Sementsov-Ogievskiy
> > Sent: Thursday, April 20, 2023 6:53 AM
> > To: qemu-de...@nongnu.org
> > Cc: qemu-block@nongnu.org; michael.r...@amd.com; arm...@redhat.com;
> > ebl...@redhat.com; jasow...@redhat.com; quint...@redhat.com; Zhang,
> > Hailiang ; phi...@linaro.org;
> > th...@redhat.com; berra...@redhat.com; marcandre.lur...@redhat.com;
> > pbonz...@redhat.com; d...@treblig.org; hre...@redhat.com;
> > kw...@redhat.com; Zhang, Chen ;
> > lizhij...@fujitsu.com; Vladimir Sementsov-Ogievskiy
> > Subject: [PATCH v2 4/4] configure: add --disable-colo-filters option
> >
> > Add option to not build COLO Proxy subsystem if it is not needed.
>
> I think no need to add the --disable-colo-filter option.
> Net-filters just general infrastructure. Another example is COLO also
> use the -chardev socket to connect each filters. No need to add the
> --disable-colo-chardev
> Please drop this patch.
> But for COLO network part, still something need to do:
> You can add --disable-colo-proxy not to build the net/colo-compare.c if it
> is not needed.
> This file just for COLO and not belong to network filters.

And net/filter-rewriter.c is just for COLO too.
So in summary just drop net/filter-mirror.c from this patch?

>
> Thanks
> Chen
>
> >
> > Signed-off-by: Vladimir Sementsov-Ogievskiy
> > ---
> >  meson.build                   |  1 +
> >  meson_options.txt             |  2 ++
> >  net/meson.build               | 11 ---
> >  scripts/meson-buildoptions.sh |  3 +++
> >  4 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/meson.build b/meson.build
> > index c44d05a13f..5b2fdfbd3a 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -1962,6 +1962,7 @@ config_host_data.set('CONFIG_GPROF',
> > get_option('gprof'))
> >  config_host_data.set('CONFIG_LIVE_BLOCK_MIGRATION',
> > get_option('live_block_migration').allowed())
> >  config_host_data.set('CONFIG_QOM_CAST_DEBUG',
> > get_option('qom_cast_debug'))
> >  config_host_data.set('CONFIG_REPLICATION',
> > get_option('replication').allowed())
> > +config_host_data.set('CONFIG_COLO_FILTERS',
> > +get_option('colo_filters').allowed())
> >
> >  # has_header
> >  config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h')) diff
> > --git
> > a/meson_options.txt b/meson_options.txt index fc9447d267..ffe81317cb
> > 100644
> > --- a/meson_options.txt
> > +++ b/meson_options.txt
> > @@ -289,6 +289,8 @@ option('live_block_migration', type: 'feature', value:
> > 'auto',
> >         description: 'block migration in the main migration stream')
> >  option('replication', type: 'feature', value: 'auto',
> >         description: 'replication support')
> > +option('colo_filters', type: 'feature', value: 'auto',
> > +       description: 'colo_filters support')
> >  option('bochs', type: 'feature', value: 'auto',
> >         description: 'bochs image format support') option('cloop', type:
> > 'feature',
> > value: 'auto', diff --git a/net/meson.build b/net/meson.build index
> > 38d50b8c96..7e54744aea 100644
> > --- a/net/meson.build
> > +++ b/net/meson.build
> > @@ -1,12 +1,9 @@
> >  softmmu_ss.add(files(
> >    'announce.c',
> >    'checksum.c',
> > -  'colo.c',
> >    'dump.c',
> >    'eth.c',
> >    'filter-buffer.c',
> > -  'filter-mirror.c',
> > -  'filter-rewriter.c',
> >    'filter.c',
> >    'hub.c',
> >    'net-hmp-cmds.c',
> > @@ -22,6 +19,14 @@ if get_option('replication').allowed()
> >    softmmu_ss.add(files('colo-compare.c'))
> >  endif
> >
> > +if get_option('replication').allowed() or
> > +get_option('colo_filters').allowed()
> > +  softmmu_ss.add(files('colo.c'))
> > +endif
> > +
> > +if get_option('colo_filters').allowed()
> > +  softmmu_ss.add(files('filter-mirror.c', 'filter-rewriter.c')) endif
> > +
> >  softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c'))
> >
> >  if have_l2tpv3
> > diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
> > index 009fab1515..cf9d23369f 100644
> > --- a/scripts/meson-buildoptions.sh
> > +++ b/scripts/meson-buildoptions.sh
> > @@ -83,6 +83,7 @@ meson_options_help() {
> >    printf "%s\n" '  capstone        Whether and how to find the capstone
> > library'
> >    printf "%s\n" '  cloop           cloop image format support'
> >    printf "%s\n" '  cocoa           Cocoa user interface (macOS only)'
> > +  printf "%s\n" '  colo-filters    colo_filters support'
> >    printf "%s\n" '  coreaudio       CoreAudio sound support'
> >    printf "%s\n" '  crypto-afalg    Linux AF_ALG crypto backend driver'
> >    printf "%s\n" '  curl            CURL block device driver'
> > @@ -236,6 +237,8 @@ _meson_option_parse() {
> >      --disable-cloop) printf "%s" -Dcloop=disabled ;;
> >      --enable-cocoa) printf "%s" -Dcocoa=enabled ;;
> >      --disable-cocoa) printf "%s" -Dcocoa=disabled ;;
> > +    --enable-colo-filters) printf "%s"
Re: [PATCH v2 0/4] COLO: improve build options
On Thu, 20 Apr 2023 01:52:28 +0300
Vladimir Sementsov-Ogievskiy wrote:

> Hi all!
>
> COLO substem seems to be useless when CONFIG_REPLICATION is unset, as we
> simply don't allow to set x-colo capability in this case. So, let's not
> compile in unreachable code and interface we cannot use when
> CONFIG_REPLICATION is unset.
>
> Also, provide personal configure option for COLO Proxy subsystem.
>
> v1 was
> [PATCH] replication: compile out some staff when replication is not configured
> Supersedes: <20230411145112.497785-1-vsement...@yandex-team.ru>

Hey,
This series is a good idea, and looks fine to me. Maybe you can remove
the #ifdef CONFIG_REPLICATION/#ifndef CONFIG_REPLICATION from
migration/colo.c too while you are at it.

Regards,
Lukas Straub

> Vladimir Sementsov-Ogievskiy (4):
>   block/meson.build: prefer positive condition for replication
>   scripts/qapi: allow optional experimental enum values
>   build: move COLO under CONFIG_REPLICATION
>   configure: add --disable-colo-filters option
>
>  block/meson.build              |  2 +-
>  hmp-commands.hx                |  2 ++
>  meson.build                    |  1 +
>  meson_options.txt              |  2 ++
>  migration/colo.c               |  6 +
>  migration/meson.build          |  6 +++--
>  migration/migration-hmp-cmds.c |  2 ++
>  migration/migration.c          | 19 +++---
>  net/meson.build                | 16 +---
>  qapi/migration.json            | 12 ++---
>  scripts/meson-buildoptions.sh  |  3 +++
>  scripts/qapi/types.py          |  2 ++
>  stubs/colo.c                   | 47 ++
>  stubs/meson.build              |  1 +
>  14 files changed, 95 insertions(+), 26 deletions(-)
>  create mode 100644 stubs/colo.c
>
[PATCH v6 1/4] replication: Remove s->active_disk
s->active_disk is bs->file. Remove it and use local variables instead. Signed-off-by: Lukas Straub Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/replication.c | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/block/replication.c b/block/replication.c index 774e15df16..9ad2dfdc69 100644 --- a/block/replication.c +++ b/block/replication.c @@ -35,7 +35,6 @@ typedef enum { typedef struct BDRVReplicationState { ReplicationMode mode; ReplicationStage stage; -BdrvChild *active_disk; BlockJob *commit_job; BdrvChild *hidden_disk; BdrvChild *secondary_disk; @@ -307,8 +306,10 @@ out: return ret; } -static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) +static void secondary_do_checkpoint(BlockDriverState *bs, Error **errp) { +BDRVReplicationState *s = bs->opaque; +BdrvChild *active_disk = bs->file; Error *local_err = NULL; int ret; @@ -323,13 +324,13 @@ static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) return; } -if (!s->active_disk->bs->drv) { +if (!active_disk->bs->drv) { error_setg(errp, "Active disk %s is ejected", - s->active_disk->bs->node_name); + active_disk->bs->node_name); return; } -ret = bdrv_make_empty(s->active_disk, errp); +ret = bdrv_make_empty(active_disk, errp); if (ret < 0) { return; } @@ -458,6 +459,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, BlockDriverState *bs = rs->opaque; BDRVReplicationState *s; BlockDriverState *top_bs; +BdrvChild *active_disk; int64_t active_length, hidden_length, disk_length; AioContext *aio_context; Error *local_err = NULL; @@ -495,15 +497,14 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, case REPLICATION_MODE_PRIMARY: break; case REPLICATION_MODE_SECONDARY: -s->active_disk = bs->file; -if (!s->active_disk || !s->active_disk->bs || -!s->active_disk->bs->backing) { +active_disk = bs->file; +if (!active_disk || !active_disk->bs || !active_disk->bs->backing) { error_setg(errp, "Active disk doesn't 
have backing file"); aio_context_release(aio_context); return; } -s->hidden_disk = s->active_disk->bs->backing; +s->hidden_disk = active_disk->bs->backing; if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) { error_setg(errp, "Hidden disk doesn't have backing file"); aio_context_release(aio_context); @@ -518,7 +519,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, } /* verify the length */ -active_length = bdrv_getlength(s->active_disk->bs); +active_length = bdrv_getlength(active_disk->bs); hidden_length = bdrv_getlength(s->hidden_disk->bs); disk_length = bdrv_getlength(s->secondary_disk->bs); if (active_length < 0 || hidden_length < 0 || disk_length < 0 || @@ -530,9 +531,9 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, } /* Must be true, or the bdrv_getlength() calls would have failed */ -assert(s->active_disk->bs->drv && s->hidden_disk->bs->drv); +assert(active_disk->bs->drv && s->hidden_disk->bs->drv); -if (!s->active_disk->bs->drv->bdrv_make_empty || +if (!active_disk->bs->drv->bdrv_make_empty || !s->hidden_disk->bs->drv->bdrv_make_empty) { error_setg(errp, "Active disk or hidden disk doesn't support make_empty"); @@ -586,7 +587,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, s->stage = BLOCK_REPLICATION_RUNNING; if (s->mode == REPLICATION_MODE_SECONDARY) { -secondary_do_checkpoint(s, errp); +secondary_do_checkpoint(bs, errp); } s->error = 0; @@ -615,7 +616,7 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp) } if (s->mode == REPLICATION_MODE_SECONDARY) { -secondary_do_checkpoint(s, errp); +secondary_do_checkpoint(bs, errp); } aio_context_release(aio_context); } @@ -652,7 +653,6 @@ static void replication_done(void *opaque, int ret) if (ret == 0) { s->stage = BLOCK_REPLICATION_DONE; -s->active_disk = NULL; s->secondary_disk = NULL; s->hidden_disk = NULL; s->error = 0; @@ -705,7 +705,7 @@ static void replication_st
[PATCH v6 2/4] replication: Reduce usage of s->hidden_disk and s->secondary_disk
In preparation for the next patch, initialize s->hidden_disk and s->secondary_disk later and replace access to them with local variables in the places where they aren't initialized yet. Signed-off-by: Lukas Straub Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/replication.c | 45 - 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/block/replication.c b/block/replication.c index 9ad2dfdc69..25bbdf5d4b 100644 --- a/block/replication.c +++ b/block/replication.c @@ -366,27 +366,35 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable, Error **errp) { BDRVReplicationState *s = bs->opaque; +BdrvChild *hidden_disk, *secondary_disk; BlockReopenQueue *reopen_queue = NULL; +/* + * s->hidden_disk and s->secondary_disk may not be set yet, as they will + * only be set after the children are writable. + */ +hidden_disk = bs->file->bs->backing; +secondary_disk = hidden_disk->bs->backing; + if (writable) { -s->orig_hidden_read_only = bdrv_is_read_only(s->hidden_disk->bs); -s->orig_secondary_read_only = bdrv_is_read_only(s->secondary_disk->bs); +s->orig_hidden_read_only = bdrv_is_read_only(hidden_disk->bs); +s->orig_secondary_read_only = bdrv_is_read_only(secondary_disk->bs); } -bdrv_subtree_drained_begin(s->hidden_disk->bs); -bdrv_subtree_drained_begin(s->secondary_disk->bs); +bdrv_subtree_drained_begin(hidden_disk->bs); +bdrv_subtree_drained_begin(secondary_disk->bs); if (s->orig_hidden_read_only) { QDict *opts = qdict_new(); qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable); -reopen_queue = bdrv_reopen_queue(reopen_queue, s->hidden_disk->bs, +reopen_queue = bdrv_reopen_queue(reopen_queue, hidden_disk->bs, opts, true); } if (s->orig_secondary_read_only) { QDict *opts = qdict_new(); qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable); -reopen_queue = bdrv_reopen_queue(reopen_queue, s->secondary_disk->bs, +reopen_queue = bdrv_reopen_queue(reopen_queue, secondary_disk->bs, opts, true); } @@ -401,8 +409,8 @@ static void 
reopen_backing_file(BlockDriverState *bs, bool writable, } } -bdrv_subtree_drained_end(s->hidden_disk->bs); -bdrv_subtree_drained_end(s->secondary_disk->bs); +bdrv_subtree_drained_end(hidden_disk->bs); +bdrv_subtree_drained_end(secondary_disk->bs); } static void backup_job_cleanup(BlockDriverState *bs) @@ -459,7 +467,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, BlockDriverState *bs = rs->opaque; BDRVReplicationState *s; BlockDriverState *top_bs; -BdrvChild *active_disk; +BdrvChild *active_disk, *hidden_disk, *secondary_disk; int64_t active_length, hidden_length, disk_length; AioContext *aio_context; Error *local_err = NULL; @@ -504,15 +512,15 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, return; } -s->hidden_disk = active_disk->bs->backing; -if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) { +hidden_disk = active_disk->bs->backing; +if (!hidden_disk->bs || !hidden_disk->bs->backing) { error_setg(errp, "Hidden disk doesn't have backing file"); aio_context_release(aio_context); return; } -s->secondary_disk = s->hidden_disk->bs->backing; -if (!s->secondary_disk->bs || !bdrv_has_blk(s->secondary_disk->bs)) { +secondary_disk = hidden_disk->bs->backing; +if (!secondary_disk->bs || !bdrv_has_blk(secondary_disk->bs)) { error_setg(errp, "The secondary disk doesn't have block backend"); aio_context_release(aio_context); return; @@ -520,8 +528,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, /* verify the length */ active_length = bdrv_getlength(active_disk->bs); -hidden_length = bdrv_getlength(s->hidden_disk->bs); -disk_length = bdrv_getlength(s->secondary_disk->bs); +hidden_length = bdrv_getlength(hidden_disk->bs); +disk_length = bdrv_getlength(secondary_disk->bs); if (active_length < 0 || hidden_length < 0 || disk_length < 0 || active_length != hidden_length || hidden_length != disk_length) { error_setg(errp, "Active disk, hidden disk, secondary disk's length" @@ -531,10 
+539,10 @@ static void replication_start(ReplicationState *rs, Replicat
[PATCH v6 4/4] replication: Remove workaround
Remove the workaround introduced in commit
6ecbc6c52672db5c13805735ca02784879ce8285
"replication: Avoid blk_make_empty() on read-only child".

It is not needed anymore since s->hidden_disk is guaranteed to be
writable when secondary_do_checkpoint() runs: replication_start(),
_do_checkpoint() and _stop() are only called by COLO migration code,
and COLO migration activates all disks via bdrv_invalidate_cache_all()
before it calls these functions.

Signed-off-by: Lukas Straub
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 block/replication.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index b74192f795..32444b9a8f 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -346,17 +346,7 @@ static void secondary_do_checkpoint(BlockDriverState *bs, Error **errp)
         return;
     }
 
-    BlockBackend *blk = blk_new(qemu_get_current_aio_context(),
-                                BLK_PERM_WRITE, BLK_PERM_ALL);
-    blk_insert_bs(blk, s->hidden_disk->bs, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        blk_unref(blk);
-        return;
-    }
-
-    ret = blk_make_empty(blk, errp);
-    blk_unref(blk);
+    ret = bdrv_make_empty(s->hidden_disk, errp);
     if (ret < 0) {
         return;
     }
-- 
2.20.1
[PATCH v6 3/4] replication: Properly attach children
The replication driver needs access to the children block-nodes of
its child so it can issue bdrv_make_empty() and bdrv_co_pwritev()
to manage the replication. However, it does this by directly copying
the BdrvChilds, which is wrong.

Fix this by properly attaching the block-nodes with
bdrv_attach_child() and requesting the required permissions.

This ultimately fixes a potential crash in replication_co_writev(),
because it may write to s->secondary_disk if it is in state
BLOCK_REPLICATION_FAILOVER_FAILED, without requesting write
permissions first. And now the workaround in
secondary_do_checkpoint() can be removed.

Signed-off-by: Lukas Straub
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 block/replication.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 25bbdf5d4b..b74192f795 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -165,7 +165,12 @@ static void replication_child_perm(BlockDriverState *bs, BdrvChild *c,
                                    uint64_t perm, uint64_t shared,
                                    uint64_t *nperm, uint64_t *nshared)
 {
-    *nperm = BLK_PERM_CONSISTENT_READ;
+    if (role & BDRV_CHILD_PRIMARY) {
+        *nperm = BLK_PERM_CONSISTENT_READ;
+    } else {
+        *nperm = 0;
+    }
+
     if ((bs->open_flags & (BDRV_O_INACTIVE | BDRV_O_RDWR)) == BDRV_O_RDWR) {
         *nperm |= BLK_PERM_WRITE;
     }
@@ -557,8 +562,25 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
             return;
         }

-        s->hidden_disk = hidden_disk;
-        s->secondary_disk = secondary_disk;
+        bdrv_ref(hidden_disk->bs);
+        s->hidden_disk = bdrv_attach_child(bs, hidden_disk->bs, "hidden disk",
+                                           &child_of_bds, BDRV_CHILD_DATA,
+                                           &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            aio_context_release(aio_context);
+            return;
+        }
+
+        bdrv_ref(secondary_disk->bs);
+        s->secondary_disk = bdrv_attach_child(bs, secondary_disk->bs,
+                                              "secondary disk", &child_of_bds,
+                                              BDRV_CHILD_DATA, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            aio_context_release(aio_context);
+            return;
+        }

         /* start backup job now */
         error_setg(&s->blocker,
@@ -664,7 +686,9 @@ static void replication_done(void *opaque, int ret)

     if (ret == 0) {
         s->stage = BLOCK_REPLICATION_DONE;
+        bdrv_unref_child(bs, s->secondary_disk);
         s->secondary_disk = NULL;
+        bdrv_unref_child(bs, s->hidden_disk);
         s->hidden_disk = NULL;
         s->error = 0;
     } else {
--
2.20.1
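As a rough illustration of the permission rule in the replication_child_perm() hunk above, here is a minimal standalone sketch. It does not use QEMU headers: every SK_* constant and the function sketch_child_perm() are hypothetical stand-ins with made-up values, and only the *nperm computation shown in the diff is modelled (the shared-permission side is left out).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's flags; the values are illustrative only. */
enum {
    SK_PERM_CONSISTENT_READ = 1u << 0,
    SK_PERM_WRITE           = 1u << 1,
};

enum {
    SK_O_RDWR     = 1u << 0,
    SK_O_INACTIVE = 1u << 1,
};

enum {
    SK_CHILD_DATA    = 1u << 0,
    SK_CHILD_PRIMARY = 1u << 1,
};

/*
 * Mirrors the shape of the patched logic: only the primary child gets
 * CONSISTENT_READ, and WRITE is requested only when the node is opened
 * read-write and is not inactive.
 */
uint64_t sketch_child_perm(unsigned role, unsigned open_flags)
{
    uint64_t nperm = 0;

    if (role & SK_CHILD_PRIMARY) {
        nperm = SK_PERM_CONSISTENT_READ;
    }

    if ((open_flags & (SK_O_INACTIVE | SK_O_RDWR)) == SK_O_RDWR) {
        nperm |= SK_PERM_WRITE;
    }

    return nperm;
}
```

Under these assumptions, an additionally attached (non-primary) child of an active read-write node ends up with write permission only, while an inactive node requests no write permission at all.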
[PATCH v6 0/4] replication: Bugfix and properly attach children
Hello Everyone,
A while ago Kevin noticed that the replication driver doesn't properly
attach the children it wants to use. Instead, it directly copies the
BdrvChilds from its backing file, which is wrong. This patchset fixes
the problem, fixes a potential crash in replication_co_writev due to
missing permissions and removes a workaround that was put in place back
then.

Tested with full COLO-migration setup in my COLO testsuite.

Regards,
Lukas Straub

Changes:
 -v6:
   -Drop "replication: Assert that children are writable"
   -Added Reviewed-by tags

 -v5:
   -Assert that children are writable where it's needed

 -v4:
   -minor style fixes
   -clarify why children are guaranteed to be writable in
    "replication: Remove workaround"
   -Added Reviewed-by tags

 -v3:
   -Split up into multiple patches
   -Remove s->active_disk
   -Clarify child permissions in commit message

 -v2: Test for BDRV_CHILD_PRIMARY in replication_child_perm, since
      bs->file might not be set yet. (Vladimir)

Lukas Straub (4):
  replication: Remove s->active_disk
  replication: Reduce usage of s->hidden_disk and s->secondary_disk
  replication: Properly attach children
  replication: Remove workaround

 block/replication.c | 111 +++-
 1 file changed, 68 insertions(+), 43 deletions(-)

--
2.20.1
[PATCH v5 5/5] replication: Remove workaround
Remove the workaround introduced in commit
6ecbc6c52672db5c13805735ca02784879ce8285
"replication: Avoid blk_make_empty() on read-only child".

It is not needed anymore since s->hidden_disk is guaranteed to be
writable when secondary_do_checkpoint() runs, because
replication_start(), _do_checkpoint() and _stop() are only called by
COLO migration code, and COLO-migration activates all disks via
bdrv_invalidate_cache_all() before it calls these functions.

Signed-off-by: Lukas Straub
---
 block/replication.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 772bb63374..1e9dc4d309 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -356,17 +356,7 @@ static void secondary_do_checkpoint(BlockDriverState *bs, Error **errp)
         return;
     }

-    BlockBackend *blk = blk_new(qemu_get_current_aio_context(),
-                                BLK_PERM_WRITE, BLK_PERM_ALL);
-    blk_insert_bs(blk, s->hidden_disk->bs, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        blk_unref(blk);
-        return;
-    }
-
-    ret = blk_make_empty(blk, errp);
-    blk_unref(blk);
+    ret = bdrv_make_empty(s->hidden_disk, errp);
     if (ret < 0) {
         return;
     }
--
2.20.1
[PATCH v5 0/5] replication: Bugfix and properly attach children
Hello Everyone,
A while ago Kevin noticed that the replication driver doesn't properly
attach the children it wants to use. Instead, it directly copies the
BdrvChilds from its backing file, which is wrong. This patchset fixes
the problem, fixes a potential crash in replication_co_writev due to
missing permissions and removes a workaround that was put in place back
then.

Tested with full COLO-migration setup in my COLO testsuite.

Regards,
Lukas Straub

Changes:
 -v5:
   -Assert that children are writable where it's needed

 -v4:
   -minor style fixes
   -clarify why children are guaranteed to be writable in
    "replication: Remove workaround"
   -Added Reviewed-by tags

 -v3:
   -Split up into multiple patches
   -Remove s->active_disk
   -Clarify child permissions in commit message

 -v2: Test for BDRV_CHILD_PRIMARY in replication_child_perm, since
      bs->file might not be set yet. (Vladimir)

Lukas Straub (5):
  replication: Remove s->active_disk
  replication: Reduce usage of s->hidden_disk and s->secondary_disk
  replication: Properly attach children
  replication: Assert that children are writable
  replication: Remove workaround

 block/replication.c | 121
 1 file changed, 78 insertions(+), 43 deletions(-)

--
2.20.1
[PATCH v5 3/5] replication: Properly attach children
The replication driver needs access to the children block-nodes of
its child so it can issue bdrv_make_empty() and bdrv_co_pwritev()
to manage the replication. However, it does this by directly copying
the BdrvChilds, which is wrong.

Fix this by properly attaching the block-nodes with
bdrv_attach_child() and requesting the required permissions.

This ultimately fixes a potential crash in replication_co_writev(),
because it may write to s->secondary_disk if it is in state
BLOCK_REPLICATION_FAILOVER_FAILED, without requesting write
permissions first. And now the workaround in
secondary_do_checkpoint() can be removed.

Signed-off-by: Lukas Straub
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 block/replication.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 25bbdf5d4b..b74192f795 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -165,7 +165,12 @@ static void replication_child_perm(BlockDriverState *bs, BdrvChild *c,
                                    uint64_t perm, uint64_t shared,
                                    uint64_t *nperm, uint64_t *nshared)
 {
-    *nperm = BLK_PERM_CONSISTENT_READ;
+    if (role & BDRV_CHILD_PRIMARY) {
+        *nperm = BLK_PERM_CONSISTENT_READ;
+    } else {
+        *nperm = 0;
+    }
+
     if ((bs->open_flags & (BDRV_O_INACTIVE | BDRV_O_RDWR)) == BDRV_O_RDWR) {
         *nperm |= BLK_PERM_WRITE;
     }
@@ -557,8 +562,25 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
             return;
         }

-        s->hidden_disk = hidden_disk;
-        s->secondary_disk = secondary_disk;
+        bdrv_ref(hidden_disk->bs);
+        s->hidden_disk = bdrv_attach_child(bs, hidden_disk->bs, "hidden disk",
+                                           &child_of_bds, BDRV_CHILD_DATA,
+                                           &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            aio_context_release(aio_context);
+            return;
+        }
+
+        bdrv_ref(secondary_disk->bs);
+        s->secondary_disk = bdrv_attach_child(bs, secondary_disk->bs,
+                                              "secondary disk", &child_of_bds,
+                                              BDRV_CHILD_DATA, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            aio_context_release(aio_context);
+            return;
+        }

         /* start backup job now */
         error_setg(&s->blocker,
@@ -664,7 +686,9 @@ static void replication_done(void *opaque, int ret)

     if (ret == 0) {
         s->stage = BLOCK_REPLICATION_DONE;
+        bdrv_unref_child(bs, s->secondary_disk);
         s->secondary_disk = NULL;
+        bdrv_unref_child(bs, s->hidden_disk);
         s->hidden_disk = NULL;
         s->error = 0;
     } else {
--
2.20.1
[PATCH v5 1/5] replication: Remove s->active_disk
s->active_disk is bs->file. Remove it and use local variables instead. Signed-off-by: Lukas Straub Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/replication.c | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/block/replication.c b/block/replication.c index 774e15df16..9ad2dfdc69 100644 --- a/block/replication.c +++ b/block/replication.c @@ -35,7 +35,6 @@ typedef enum { typedef struct BDRVReplicationState { ReplicationMode mode; ReplicationStage stage; -BdrvChild *active_disk; BlockJob *commit_job; BdrvChild *hidden_disk; BdrvChild *secondary_disk; @@ -307,8 +306,10 @@ out: return ret; } -static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) +static void secondary_do_checkpoint(BlockDriverState *bs, Error **errp) { +BDRVReplicationState *s = bs->opaque; +BdrvChild *active_disk = bs->file; Error *local_err = NULL; int ret; @@ -323,13 +324,13 @@ static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) return; } -if (!s->active_disk->bs->drv) { +if (!active_disk->bs->drv) { error_setg(errp, "Active disk %s is ejected", - s->active_disk->bs->node_name); + active_disk->bs->node_name); return; } -ret = bdrv_make_empty(s->active_disk, errp); +ret = bdrv_make_empty(active_disk, errp); if (ret < 0) { return; } @@ -458,6 +459,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, BlockDriverState *bs = rs->opaque; BDRVReplicationState *s; BlockDriverState *top_bs; +BdrvChild *active_disk; int64_t active_length, hidden_length, disk_length; AioContext *aio_context; Error *local_err = NULL; @@ -495,15 +497,14 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, case REPLICATION_MODE_PRIMARY: break; case REPLICATION_MODE_SECONDARY: -s->active_disk = bs->file; -if (!s->active_disk || !s->active_disk->bs || -!s->active_disk->bs->backing) { +active_disk = bs->file; +if (!active_disk || !active_disk->bs || !active_disk->bs->backing) { error_setg(errp, "Active disk doesn't 
have backing file"); aio_context_release(aio_context); return; } -s->hidden_disk = s->active_disk->bs->backing; +s->hidden_disk = active_disk->bs->backing; if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) { error_setg(errp, "Hidden disk doesn't have backing file"); aio_context_release(aio_context); @@ -518,7 +519,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, } /* verify the length */ -active_length = bdrv_getlength(s->active_disk->bs); +active_length = bdrv_getlength(active_disk->bs); hidden_length = bdrv_getlength(s->hidden_disk->bs); disk_length = bdrv_getlength(s->secondary_disk->bs); if (active_length < 0 || hidden_length < 0 || disk_length < 0 || @@ -530,9 +531,9 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, } /* Must be true, or the bdrv_getlength() calls would have failed */ -assert(s->active_disk->bs->drv && s->hidden_disk->bs->drv); +assert(active_disk->bs->drv && s->hidden_disk->bs->drv); -if (!s->active_disk->bs->drv->bdrv_make_empty || +if (!active_disk->bs->drv->bdrv_make_empty || !s->hidden_disk->bs->drv->bdrv_make_empty) { error_setg(errp, "Active disk or hidden disk doesn't support make_empty"); @@ -586,7 +587,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, s->stage = BLOCK_REPLICATION_RUNNING; if (s->mode == REPLICATION_MODE_SECONDARY) { -secondary_do_checkpoint(s, errp); +secondary_do_checkpoint(bs, errp); } s->error = 0; @@ -615,7 +616,7 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp) } if (s->mode == REPLICATION_MODE_SECONDARY) { -secondary_do_checkpoint(s, errp); +secondary_do_checkpoint(bs, errp); } aio_context_release(aio_context); } @@ -652,7 +653,6 @@ static void replication_done(void *opaque, int ret) if (ret == 0) { s->stage = BLOCK_REPLICATION_DONE; -s->active_disk = NULL; s->secondary_disk = NULL; s->hidden_disk = NULL; s->error = 0; @@ -705,7 +705,7 @@ static void replication_st
[PATCH v5 4/5] replication: Assert that children are writable
Assert that the children are writable where it's needed.
While there is no test-case for the
BLOCK_REPLICATION_FAILOVER_FAILED state, this at least ensures that
s->secondary_disk is always writable in case replication might go
into that state.

Signed-off-by: Lukas Straub
---
 block/replication.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/replication.c b/block/replication.c
index b74192f795..772bb63374 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -261,6 +261,13 @@ static coroutine_fn int replication_co_writev(BlockDriverState *bs,
     int64_t n;

     assert(!flags);
+    assert(top->perm & BLK_PERM_WRITE);
+    if (s->mode == REPLICATION_MODE_SECONDARY &&
+        s->stage != BLOCK_REPLICATION_NONE &&
+        s->stage != BLOCK_REPLICATION_DONE) {
+        assert(base->perm & BLK_PERM_WRITE);
+    }
+
     ret = replication_get_io_status(s);
     if (ret < 0) {
         goto out;
@@ -318,6 +325,9 @@ static void secondary_do_checkpoint(BlockDriverState *bs, Error **errp)
     Error *local_err = NULL;
     int ret;

+    assert(active_disk->perm & BLK_PERM_WRITE);
+    assert(s->hidden_disk->perm & BLK_PERM_WRITE);
+
     if (!s->backup_job) {
         error_setg(errp, "Backup job was cancelled unexpectedly");
         return;
--
2.20.1
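The condition guarding the second assertion in replication_co_writev() can be read as a small predicate. The sketch below is a standalone approximation with hypothetical names (SketchMode, SketchStage, sketch_base_must_be_writable()); it only models the mode and stage values that appear in this thread and is not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of the replication mode and stage enums. */
typedef enum {
    SK_MODE_PRIMARY,
    SK_MODE_SECONDARY,
} SketchMode;

typedef enum {
    SK_STAGE_NONE,
    SK_STAGE_RUNNING,
    SK_STAGE_FAILOVER_FAILED,
    SK_STAGE_DONE,
} SketchStage;

/*
 * True when the patch would assert that the base (secondary) disk has
 * write permission: secondary mode, and replication neither unstarted
 * nor finished.
 */
bool sketch_base_must_be_writable(SketchMode mode, SketchStage stage)
{
    return mode == SK_MODE_SECONDARY &&
           stage != SK_STAGE_NONE &&
           stage != SK_STAGE_DONE;
}
```

Read this way, the FAILOVER_FAILED stage mentioned in the commit message falls inside the asserted range, while the primary side and the NONE/DONE stages do not.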
[PATCH v5 2/5] replication: Reduce usage of s->hidden_disk and s->secondary_disk
In preparation for the next patch, initialize s->hidden_disk and s->secondary_disk later and replace access to them with local variables in the places where they aren't initialized yet. Signed-off-by: Lukas Straub Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/replication.c | 45 - 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/block/replication.c b/block/replication.c index 9ad2dfdc69..25bbdf5d4b 100644 --- a/block/replication.c +++ b/block/replication.c @@ -366,27 +366,35 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable, Error **errp) { BDRVReplicationState *s = bs->opaque; +BdrvChild *hidden_disk, *secondary_disk; BlockReopenQueue *reopen_queue = NULL; +/* + * s->hidden_disk and s->secondary_disk may not be set yet, as they will + * only be set after the children are writable. + */ +hidden_disk = bs->file->bs->backing; +secondary_disk = hidden_disk->bs->backing; + if (writable) { -s->orig_hidden_read_only = bdrv_is_read_only(s->hidden_disk->bs); -s->orig_secondary_read_only = bdrv_is_read_only(s->secondary_disk->bs); +s->orig_hidden_read_only = bdrv_is_read_only(hidden_disk->bs); +s->orig_secondary_read_only = bdrv_is_read_only(secondary_disk->bs); } -bdrv_subtree_drained_begin(s->hidden_disk->bs); -bdrv_subtree_drained_begin(s->secondary_disk->bs); +bdrv_subtree_drained_begin(hidden_disk->bs); +bdrv_subtree_drained_begin(secondary_disk->bs); if (s->orig_hidden_read_only) { QDict *opts = qdict_new(); qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable); -reopen_queue = bdrv_reopen_queue(reopen_queue, s->hidden_disk->bs, +reopen_queue = bdrv_reopen_queue(reopen_queue, hidden_disk->bs, opts, true); } if (s->orig_secondary_read_only) { QDict *opts = qdict_new(); qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable); -reopen_queue = bdrv_reopen_queue(reopen_queue, s->secondary_disk->bs, +reopen_queue = bdrv_reopen_queue(reopen_queue, secondary_disk->bs, opts, true); } @@ -401,8 +409,8 @@ static void 
reopen_backing_file(BlockDriverState *bs, bool writable, } } -bdrv_subtree_drained_end(s->hidden_disk->bs); -bdrv_subtree_drained_end(s->secondary_disk->bs); +bdrv_subtree_drained_end(hidden_disk->bs); +bdrv_subtree_drained_end(secondary_disk->bs); } static void backup_job_cleanup(BlockDriverState *bs) @@ -459,7 +467,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, BlockDriverState *bs = rs->opaque; BDRVReplicationState *s; BlockDriverState *top_bs; -BdrvChild *active_disk; +BdrvChild *active_disk, *hidden_disk, *secondary_disk; int64_t active_length, hidden_length, disk_length; AioContext *aio_context; Error *local_err = NULL; @@ -504,15 +512,15 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, return; } -s->hidden_disk = active_disk->bs->backing; -if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) { +hidden_disk = active_disk->bs->backing; +if (!hidden_disk->bs || !hidden_disk->bs->backing) { error_setg(errp, "Hidden disk doesn't have backing file"); aio_context_release(aio_context); return; } -s->secondary_disk = s->hidden_disk->bs->backing; -if (!s->secondary_disk->bs || !bdrv_has_blk(s->secondary_disk->bs)) { +secondary_disk = hidden_disk->bs->backing; +if (!secondary_disk->bs || !bdrv_has_blk(secondary_disk->bs)) { error_setg(errp, "The secondary disk doesn't have block backend"); aio_context_release(aio_context); return; @@ -520,8 +528,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, /* verify the length */ active_length = bdrv_getlength(active_disk->bs); -hidden_length = bdrv_getlength(s->hidden_disk->bs); -disk_length = bdrv_getlength(s->secondary_disk->bs); +hidden_length = bdrv_getlength(hidden_disk->bs); +disk_length = bdrv_getlength(secondary_disk->bs); if (active_length < 0 || hidden_length < 0 || disk_length < 0 || active_length != hidden_length || hidden_length != disk_length) { error_setg(errp, "Active disk, hidden disk, secondary disk's length" @@ -531,10 
+539,10 @@ static void replication_start(ReplicationState *rs, Replicat
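For readers following the chain that patches 1 and 2 walk with local variables (active disk = bs->file, hidden disk = its backing child, secondary disk = the hidden disk's backing child), here is a small standalone model. SketchNode, SketchChild, sketch_resolve_chain() and sketch_demo() are hypothetical names invented for this illustration; the checks only loosely follow the error paths visible in replication_start() and leave out the block-backend test.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchNode SketchNode;

/* Hypothetical stand-in for BdrvChild: an edge pointing at a node. */
typedef struct SketchChild {
    SketchNode *bs;
} SketchChild;

/* Hypothetical stand-in for BlockDriverState. */
struct SketchNode {
    const char *name;
    SketchChild *file;
    SketchChild *backing;
};

/* Returns 0 and fills the three outputs, or -1 if the chain is incomplete. */
int sketch_resolve_chain(SketchNode *top, SketchChild **active,
                         SketchChild **hidden, SketchChild **secondary)
{
    SketchChild *a = top->file;
    if (!a || !a->bs || !a->bs->backing) {
        return -1; /* "Active disk doesn't have backing file" */
    }

    SketchChild *h = a->bs->backing;
    if (!h->bs || !h->bs->backing) {
        return -1; /* "Hidden disk doesn't have backing file" */
    }

    SketchChild *s = h->bs->backing;
    if (!s->bs) {
        return -1;
    }

    *active = a;
    *hidden = h;
    *secondary = s;
    return 0;
}

/* Builds a tiny chain on the stack and resolves it; -2 flags a wrong result. */
int sketch_demo(int with_hidden_backing)
{
    SketchNode secondary = { "secondary", NULL, NULL };
    SketchChild to_secondary = { &secondary };
    SketchNode hidden = { "hidden", NULL,
                          with_hidden_backing ? &to_secondary : NULL };
    SketchChild to_hidden = { &hidden };
    SketchNode active = { "active", NULL, &to_hidden };
    SketchChild to_active = { &active };
    SketchNode top = { "replication", &to_active, NULL };

    SketchChild *a = NULL, *h = NULL, *s = NULL;
    int ret = sketch_resolve_chain(&top, &a, &h, &s);
    if (ret == 0 &&
        (a->bs != &active || h->bs != &hidden || s->bs != &secondary)) {
        return -2;
    }
    return ret;
}
```

The point of resolving the chain into locals first is the one made in the commit message: the state fields are only filled in later, once the children are known to be usable.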
Re: [PATCH v3 4/4] replication: Remove workaround
On Mon, 12 Jul 2021 13:06:19 +0300
Vladimir Sementsov-Ogievskiy wrote:

> 11.07.2021 23:33, Lukas Straub wrote:
> > On Fri, 9 Jul 2021 10:49:23 +0300
> > Vladimir Sementsov-Ogievskiy wrote:
> >
> >> 07.07.2021 21:15, Lukas Straub wrote:
> >>> Remove the workaround introduced in commit
> >>> 6ecbc6c52672db5c13805735ca02784879ce8285
> >>> "replication: Avoid blk_make_empty() on read-only child".
> >>>
> >>> It is not needed anymore since s->hidden_disk is guaranteed to be
> >>> writable when secondary_do_checkpoint() runs. Because replication_start(),
> >>> _do_checkpoint() and _stop() are only called by COLO migration code
> >>> and COLO-migration doesn't inactivate disks.
> >>
> >> If look at replication_child_perm() you should also be sure that it always
> >> works only with RW disks..
> >>
> >> Actually, I think that it would be correct just require BLK_PERM_WRITE in
> >> replication_child_perm() unconditionally. Let generic layer care about all
> >> these RD/WR things. In _child_perm() we can require WRITE and don't care.
> >> If something goes wrong and we can't get WRITE permission we should see
> >> clean error-out.
> >>
> >> Opposite, if we don't require WRITE permission in some case and still do
> >> WRITE request, it may crash.
> >>
> >> Still, this may be considered as a preexisting problem of
> >> replication_child_perm() and fixed separately.
> >
> > Hmm, unconditionally requesting write doesn't work, since qemu on the
> > secondary side is started with "-miration incoming", it goes into
> > runstate RUN_STATE_INMIGRATE from the beginning and then blockdev_init()
> > opens every blockdev with BDRV_O_INACTIVE and then it errors out with
> > -drive driver=replication,...: Block node is read-only.
>
> Ah, OK. So we need this check in _child_perm().. Then, maybe, leave check or
> assertion in secondary_do_checkpoint, that hidden_disk is writable?

Good Idea. I will add assertions to secondary_do_checkpoint and
replication_co_writev too.

> >
> >>>
> >>> Signed-off-by: Lukas Straub
> >>
> >> So, for this one commit (with probably updated commit message accordingly
> >> to my comments, or even rebased on fixed replication_child_perm()):
> >>
> >> Reviewed-by: Vladimir Sementsov-Ogievskiy
> >>
> >>
> >>> ---
> >>>  block/replication.c | 12 +---
> >>>  1 file changed, 1 insertion(+), 11 deletions(-)
> >>>
> >>> diff --git a/block/replication.c b/block/replication.c
> >>> index c0d4a6c264..68b46d65a8 100644
> >>> --- a/block/replication.c
> >>> +++ b/block/replication.c
> >>> @@ -348,17 +348,7 @@ static void secondary_do_checkpoint(BlockDriverState
> >>> *bs, Error **errp)
> >>>          return;
> >>>      }
> >>>
> >>> -    BlockBackend *blk = blk_new(qemu_get_current_aio_context(),
> >>> -                                BLK_PERM_WRITE, BLK_PERM_ALL);
> >>> -    blk_insert_bs(blk, s->hidden_disk->bs, &local_err);
> >>> -    if (local_err) {
> >>> -        error_propagate(errp, local_err);
> >>> -        blk_unref(blk);
> >>> -        return;
> >>> -    }
> >>> -
> >>> -    ret = blk_make_empty(blk, errp);
> >>> -    blk_unref(blk);
> >>> +    ret = bdrv_make_empty(s->hidden_disk, errp);
> >>>      if (ret < 0) {
> >>>          return;
> >>>      }
> >>> --
> >>> 2.20.1
> >>>
> >>
> >
[PATCH v4 1/4] replication: Remove s->active_disk
s->active_disk is bs->file. Remove it and use local variables instead. Signed-off-by: Lukas Straub Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/replication.c | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/block/replication.c b/block/replication.c index 774e15df16..9ad2dfdc69 100644 --- a/block/replication.c +++ b/block/replication.c @@ -35,7 +35,6 @@ typedef enum { typedef struct BDRVReplicationState { ReplicationMode mode; ReplicationStage stage; -BdrvChild *active_disk; BlockJob *commit_job; BdrvChild *hidden_disk; BdrvChild *secondary_disk; @@ -307,8 +306,10 @@ out: return ret; } -static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) +static void secondary_do_checkpoint(BlockDriverState *bs, Error **errp) { +BDRVReplicationState *s = bs->opaque; +BdrvChild *active_disk = bs->file; Error *local_err = NULL; int ret; @@ -323,13 +324,13 @@ static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) return; } -if (!s->active_disk->bs->drv) { +if (!active_disk->bs->drv) { error_setg(errp, "Active disk %s is ejected", - s->active_disk->bs->node_name); + active_disk->bs->node_name); return; } -ret = bdrv_make_empty(s->active_disk, errp); +ret = bdrv_make_empty(active_disk, errp); if (ret < 0) { return; } @@ -458,6 +459,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, BlockDriverState *bs = rs->opaque; BDRVReplicationState *s; BlockDriverState *top_bs; +BdrvChild *active_disk; int64_t active_length, hidden_length, disk_length; AioContext *aio_context; Error *local_err = NULL; @@ -495,15 +497,14 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, case REPLICATION_MODE_PRIMARY: break; case REPLICATION_MODE_SECONDARY: -s->active_disk = bs->file; -if (!s->active_disk || !s->active_disk->bs || -!s->active_disk->bs->backing) { +active_disk = bs->file; +if (!active_disk || !active_disk->bs || !active_disk->bs->backing) { error_setg(errp, "Active disk doesn't 
have backing file"); aio_context_release(aio_context); return; } -s->hidden_disk = s->active_disk->bs->backing; +s->hidden_disk = active_disk->bs->backing; if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) { error_setg(errp, "Hidden disk doesn't have backing file"); aio_context_release(aio_context); @@ -518,7 +519,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, } /* verify the length */ -active_length = bdrv_getlength(s->active_disk->bs); +active_length = bdrv_getlength(active_disk->bs); hidden_length = bdrv_getlength(s->hidden_disk->bs); disk_length = bdrv_getlength(s->secondary_disk->bs); if (active_length < 0 || hidden_length < 0 || disk_length < 0 || @@ -530,9 +531,9 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, } /* Must be true, or the bdrv_getlength() calls would have failed */ -assert(s->active_disk->bs->drv && s->hidden_disk->bs->drv); +assert(active_disk->bs->drv && s->hidden_disk->bs->drv); -if (!s->active_disk->bs->drv->bdrv_make_empty || +if (!active_disk->bs->drv->bdrv_make_empty || !s->hidden_disk->bs->drv->bdrv_make_empty) { error_setg(errp, "Active disk or hidden disk doesn't support make_empty"); @@ -586,7 +587,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, s->stage = BLOCK_REPLICATION_RUNNING; if (s->mode == REPLICATION_MODE_SECONDARY) { -secondary_do_checkpoint(s, errp); +secondary_do_checkpoint(bs, errp); } s->error = 0; @@ -615,7 +616,7 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp) } if (s->mode == REPLICATION_MODE_SECONDARY) { -secondary_do_checkpoint(s, errp); +secondary_do_checkpoint(bs, errp); } aio_context_release(aio_context); } @@ -652,7 +653,6 @@ static void replication_done(void *opaque, int ret) if (ret == 0) { s->stage = BLOCK_REPLICATION_DONE; -s->active_disk = NULL; s->secondary_disk = NULL; s->hidden_disk = NULL; s->error = 0; @@ -705,7 +705,7 @@ static void replication_st
[PATCH v4 0/4] replication: Bugfix and properly attach children
Hello Everyone,
A while ago Kevin noticed that the replication driver doesn't properly
attach the children it wants to use. Instead, it directly copies the
BdrvChilds from its backing file, which is wrong. This patchset fixes
the problem, fixes a potential crash in replication_co_writev due to
missing permissions and removes a workaround that was put in place back
then.

Regards,
Lukas Straub

Changes:
 -v4:
   -minor style fixes
   -clarify why children are guaranteed to be writable in
    "replication: Remove workaround"
   -Added Reviewed-by tags

 -v3:
   -Split up into multiple patches
   -Remove s->active_disk
   -Clarify child permissions in commit message

 -v2: Test for BDRV_CHILD_PRIMARY in replication_child_perm, since
      bs->file might not be set yet. (Vladimir)

Lukas Straub (4):
  replication: Remove s->active_disk
  replication: Reduce usage of s->hidden_disk and s->secondary_disk
  replication: Properly attach children
  replication: Remove workaround

 block/replication.c | 111 +++-
 1 file changed, 68 insertions(+), 43 deletions(-)

--
2.20.1
[PATCH v4 4/4] replication: Remove workaround
Remove the workaround introduced in commit
6ecbc6c52672db5c13805735ca02784879ce8285
"replication: Avoid blk_make_empty() on read-only child".

It is not needed anymore since s->hidden_disk is guaranteed to be
writable when secondary_do_checkpoint() runs, because
replication_start(), _do_checkpoint() and _stop() are only called by
COLO migration code, and COLO-migration activates all disks via
bdrv_invalidate_cache_all() before it calls these functions.

Signed-off-by: Lukas Straub
---
 block/replication.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index b74192f795..32444b9a8f 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -346,17 +346,7 @@ static void secondary_do_checkpoint(BlockDriverState *bs, Error **errp)
         return;
     }

-    BlockBackend *blk = blk_new(qemu_get_current_aio_context(),
-                                BLK_PERM_WRITE, BLK_PERM_ALL);
-    blk_insert_bs(blk, s->hidden_disk->bs, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        blk_unref(blk);
-        return;
-    }
-
-    ret = blk_make_empty(blk, errp);
-    blk_unref(blk);
+    ret = bdrv_make_empty(s->hidden_disk, errp);
     if (ret < 0) {
         return;
     }
--
2.20.1
[PATCH v4 3/4] replication: Properly attach children
The replication driver needs access to the children block-nodes of
its child so it can issue bdrv_make_empty() and bdrv_co_pwritev()
to manage the replication. However, it does this by directly copying
the BdrvChilds, which is wrong.

Fix this by properly attaching the block-nodes with
bdrv_attach_child() and requesting the required permissions.

This ultimately fixes a potential crash in replication_co_writev(),
because it may write to s->secondary_disk if it is in state
BLOCK_REPLICATION_FAILOVER_FAILED, without requesting write
permissions first. And now the workaround in
secondary_do_checkpoint() can be removed.

Signed-off-by: Lukas Straub
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 block/replication.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 25bbdf5d4b..b74192f795 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -165,7 +165,12 @@ static void replication_child_perm(BlockDriverState *bs, BdrvChild *c,
                                    uint64_t perm, uint64_t shared,
                                    uint64_t *nperm, uint64_t *nshared)
 {
-    *nperm = BLK_PERM_CONSISTENT_READ;
+    if (role & BDRV_CHILD_PRIMARY) {
+        *nperm = BLK_PERM_CONSISTENT_READ;
+    } else {
+        *nperm = 0;
+    }
+
     if ((bs->open_flags & (BDRV_O_INACTIVE | BDRV_O_RDWR)) == BDRV_O_RDWR) {
         *nperm |= BLK_PERM_WRITE;
     }
@@ -557,8 +562,25 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
             return;
         }

-        s->hidden_disk = hidden_disk;
-        s->secondary_disk = secondary_disk;
+        bdrv_ref(hidden_disk->bs);
+        s->hidden_disk = bdrv_attach_child(bs, hidden_disk->bs, "hidden disk",
+                                           &child_of_bds, BDRV_CHILD_DATA,
+                                           &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            aio_context_release(aio_context);
+            return;
+        }
+
+        bdrv_ref(secondary_disk->bs);
+        s->secondary_disk = bdrv_attach_child(bs, secondary_disk->bs,
+                                              "secondary disk", &child_of_bds,
+                                              BDRV_CHILD_DATA, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            aio_context_release(aio_context);
+            return;
+        }

         /* start backup job now */
         error_setg(&s->blocker,
@@ -664,7 +686,9 @@ static void replication_done(void *opaque, int ret)

     if (ret == 0) {
         s->stage = BLOCK_REPLICATION_DONE;
+        bdrv_unref_child(bs, s->secondary_disk);
         s->secondary_disk = NULL;
+        bdrv_unref_child(bs, s->hidden_disk);
         s->hidden_disk = NULL;
         s->error = 0;
     } else {
--
2.20.1
[PATCH v4 2/4] replication: Reduce usage of s->hidden_disk and s->secondary_disk
In preparation for the next patch, initialize s->hidden_disk and s->secondary_disk later and replace access to them with local variables in the places where they aren't initialized yet. Signed-off-by: Lukas Straub Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/replication.c | 45 - 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/block/replication.c b/block/replication.c index 9ad2dfdc69..25bbdf5d4b 100644 --- a/block/replication.c +++ b/block/replication.c @@ -366,27 +366,35 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable, Error **errp) { BDRVReplicationState *s = bs->opaque; +BdrvChild *hidden_disk, *secondary_disk; BlockReopenQueue *reopen_queue = NULL; +/* + * s->hidden_disk and s->secondary_disk may not be set yet, as they will + * only be set after the children are writable. + */ +hidden_disk = bs->file->bs->backing; +secondary_disk = hidden_disk->bs->backing; + if (writable) { -s->orig_hidden_read_only = bdrv_is_read_only(s->hidden_disk->bs); -s->orig_secondary_read_only = bdrv_is_read_only(s->secondary_disk->bs); +s->orig_hidden_read_only = bdrv_is_read_only(hidden_disk->bs); +s->orig_secondary_read_only = bdrv_is_read_only(secondary_disk->bs); } -bdrv_subtree_drained_begin(s->hidden_disk->bs); -bdrv_subtree_drained_begin(s->secondary_disk->bs); +bdrv_subtree_drained_begin(hidden_disk->bs); +bdrv_subtree_drained_begin(secondary_disk->bs); if (s->orig_hidden_read_only) { QDict *opts = qdict_new(); qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable); -reopen_queue = bdrv_reopen_queue(reopen_queue, s->hidden_disk->bs, +reopen_queue = bdrv_reopen_queue(reopen_queue, hidden_disk->bs, opts, true); } if (s->orig_secondary_read_only) { QDict *opts = qdict_new(); qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable); -reopen_queue = bdrv_reopen_queue(reopen_queue, s->secondary_disk->bs, +reopen_queue = bdrv_reopen_queue(reopen_queue, secondary_disk->bs, opts, true); } @@ -401,8 +409,8 @@ static void 
reopen_backing_file(BlockDriverState *bs, bool writable, } } -bdrv_subtree_drained_end(s->hidden_disk->bs); -bdrv_subtree_drained_end(s->secondary_disk->bs); +bdrv_subtree_drained_end(hidden_disk->bs); +bdrv_subtree_drained_end(secondary_disk->bs); } static void backup_job_cleanup(BlockDriverState *bs) @@ -459,7 +467,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, BlockDriverState *bs = rs->opaque; BDRVReplicationState *s; BlockDriverState *top_bs; -BdrvChild *active_disk; +BdrvChild *active_disk, *hidden_disk, *secondary_disk; int64_t active_length, hidden_length, disk_length; AioContext *aio_context; Error *local_err = NULL; @@ -504,15 +512,15 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, return; } -s->hidden_disk = active_disk->bs->backing; -if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) { +hidden_disk = active_disk->bs->backing; +if (!hidden_disk->bs || !hidden_disk->bs->backing) { error_setg(errp, "Hidden disk doesn't have backing file"); aio_context_release(aio_context); return; } -s->secondary_disk = s->hidden_disk->bs->backing; -if (!s->secondary_disk->bs || !bdrv_has_blk(s->secondary_disk->bs)) { +secondary_disk = hidden_disk->bs->backing; +if (!secondary_disk->bs || !bdrv_has_blk(secondary_disk->bs)) { error_setg(errp, "The secondary disk doesn't have block backend"); aio_context_release(aio_context); return; @@ -520,8 +528,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, /* verify the length */ active_length = bdrv_getlength(active_disk->bs); -hidden_length = bdrv_getlength(s->hidden_disk->bs); -disk_length = bdrv_getlength(s->secondary_disk->bs); +hidden_length = bdrv_getlength(hidden_disk->bs); +disk_length = bdrv_getlength(secondary_disk->bs); if (active_length < 0 || hidden_length < 0 || disk_length < 0 || active_length != hidden_length || hidden_length != disk_length) { error_setg(errp, "Active disk, hidden disk, secondary disk's length" @@ -531,10 
+539,10 @@ static void replication_start(ReplicationState *rs, Replicat
Re: [PATCH v3 4/4] replication: Remove workaround
On Fri, 9 Jul 2021 10:49:23 +0300 Vladimir Sementsov-Ogievskiy wrote:

> 07.07.2021 21:15, Lukas Straub wrote:
> > Remove the workaround introduced in commit
> > 6ecbc6c52672db5c13805735ca02784879ce8285
> > "replication: Avoid blk_make_empty() on read-only child".
> >
> > It is not needed anymore since s->hidden_disk is guaranteed to be
> > writable when secondary_do_checkpoint() runs. Because replication_start(),
> > _do_checkpoint() and _stop() are only called by COLO migration code
> > and COLO-migration doesn't inactivate disks.
>
> If look at replication_child_perm() you should also be sure that it always
> works only with RW disks..
>
> Actually, I think that it would be correct just require BLK_PERM_WRITE in
> replication_child_perm() unconditionally. Let generic layer care about all
> these RD/WR things. In _child_perm() we can require WRITE and don't care. If
> something goes wrong and we can't get WRITE permission we should see clean
> error-out.
>
> Opposite, if we don't require WRITE permission in some case and still do
> WRITE request, it may crash.
>
> Still, this may be considered as a preexisting problem of
> replication_child_perm() and fixed separately.

Hmm, unconditionally requesting write doesn't work. QEMU on the secondary
side is started with "-incoming", so it is in runstate RUN_STATE_INMIGRATE
from the beginning. blockdev_init() then opens every blockdev with
BDRV_O_INACTIVE, and opening the replication node errors out with:

-drive driver=replication,...: Block node is read-only

> >
> > Signed-off-by: Lukas Straub
>
> So, for this one commit (with probably updated commit message accordingly to
> my comments, or even rebased on fixed replication_child_perm()):
>
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> >
> > ---
> >  block/replication.c | 12 +---
> >  1 file changed, 1 insertion(+), 11 deletions(-)
> >
> > diff --git a/block/replication.c b/block/replication.c
> > index c0d4a6c264..68b46d65a8 100644
> > --- a/block/replication.c
> > +++ b/block/replication.c
> > @@ -348,17 +348,7 @@ static void secondary_do_checkpoint(BlockDriverState *bs, Error **errp)
> >          return;
> >      }
> >
> > -    BlockBackend *blk = blk_new(qemu_get_current_aio_context(),
> > -                                BLK_PERM_WRITE, BLK_PERM_ALL);
> > -    blk_insert_bs(blk, s->hidden_disk->bs, &local_err);
> > -    if (local_err) {
> > -        error_propagate(errp, local_err);
> > -        blk_unref(blk);
> > -        return;
> > -    }
> > -
> > -    ret = blk_make_empty(blk, errp);
> > -    blk_unref(blk);
> > +    ret = bdrv_make_empty(s->hidden_disk, errp);
> >      if (ret < 0) {
> >          return;
> >      }
> > --
> > 2.20.1
> >
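As a rough illustration of the BDRV_O_INACTIVE problem described in the reply above, here is a small standalone C model of the permission computation. It is not QEMU code: the MODEL_* flag values and the model_child_perm(), model_node_is_writable() and model_perm_ok() helpers are invented for this sketch, and the rule that an inactive node counts as not writable is an assumption about how the generic layer behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this sketch. */
enum {
    MODEL_O_RDWR     = 1 << 0,
    MODEL_O_INACTIVE = 1 << 1,
};

enum {
    MODEL_PERM_CONSISTENT_READ = 1 << 0,
    MODEL_PERM_WRITE           = 1 << 1,
};

/*
 * Same shape as replication_child_perm() after this series: only the
 * primary child gets CONSISTENT_READ, and WRITE is requested only when
 * the node is opened read-write and is not inactive.
 */
uint64_t model_child_perm(unsigned open_flags, bool primary_child)
{
    uint64_t nperm = primary_child ? MODEL_PERM_CONSISTENT_READ : 0;

    if ((open_flags & (MODEL_O_INACTIVE | MODEL_O_RDWR)) == MODEL_O_RDWR) {
        nperm |= MODEL_PERM_WRITE;
    }
    return nperm;
}

/* Assumption: a node is writable only if it is read-write and active. */
static bool model_node_is_writable(unsigned open_flags)
{
    return (open_flags & (MODEL_O_INACTIVE | MODEL_O_RDWR)) == MODEL_O_RDWR;
}

/* Returns true if a parent asking for 'perm' on this node is accepted. */
bool model_perm_ok(unsigned open_flags, uint64_t perm)
{
    return !(perm & MODEL_PERM_WRITE) || model_node_is_writable(open_flags);
}
```

In this model, asking for MODEL_PERM_WRITE on a node opened with MODEL_O_RDWR | MODEL_O_INACTIVE is rejected, which corresponds to the "Block node is read-only" error, while the conditional computation never asks for WRITE in that state.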
Re: [PATCH v3 1/4] replication: Remove s->active_disk
On Fri, 9 Jul 2021 10:11:15 +0300 Vladimir Sementsov-Ogievskiy wrote: > 07.07.2021 21:15, Lukas Straub wrote: > > s->active_disk is bs->file. Remove it and use local variables instead. > > > > Signed-off-by: Lukas Straub > > --- > > block/replication.c | 38 +- > > 1 file changed, 21 insertions(+), 17 deletions(-) > > > > diff --git a/block/replication.c b/block/replication.c > > index 52163f2d1f..50940fbe33 100644 > > --- a/block/replication.c > > +++ b/block/replication.c > > @@ -35,7 +35,6 @@ typedef enum { > > typedef struct BDRVReplicationState { > > ReplicationMode mode; > > ReplicationStage stage; > > -BdrvChild *active_disk; > > BlockJob *commit_job; > > BdrvChild *hidden_disk; > > BdrvChild *secondary_disk; > > @@ -307,11 +306,15 @@ out: > > return ret; > > } > > > > -static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) > > +static void secondary_do_checkpoint(BlockDriverState *bs, Error **errp) > > { > > +BDRVReplicationState *s = bs->opaque; > > +BdrvChild *active_disk; > > Why not to combine initialization into definition: > > BdrvChild *active_disk = bs->file; Ok, will fix. 
> > Error *local_err = NULL; > > int ret; > > > > +active_disk = bs->file; > > + > > if (!s->backup_job) { > > error_setg(errp, "Backup job was cancelled unexpectedly"); > > return; > > @@ -323,13 +326,13 @@ static void > > secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) > > return; > > } > > > > -if (!s->active_disk->bs->drv) { > > +if (!active_disk->bs->drv) { > > error_setg(errp, "Active disk %s is ejected", > > - s->active_disk->bs->node_name); > > + active_disk->bs->node_name); > > return; > > } > > > > -ret = bdrv_make_empty(s->active_disk, errp); > > +ret = bdrv_make_empty(active_disk, errp); > > if (ret < 0) { > > return; > > } > > @@ -451,6 +454,7 @@ static void replication_start(ReplicationState *rs, > > ReplicationMode mode, > > BlockDriverState *bs = rs->opaque; > > BDRVReplicationState *s; > > BlockDriverState *top_bs; > > +BdrvChild *active_disk; > > int64_t active_length, hidden_length, disk_length; > > AioContext *aio_context; > > Error *local_err = NULL; > > @@ -488,15 +492,14 @@ static void replication_start(ReplicationState *rs, > > ReplicationMode mode, > > case REPLICATION_MODE_PRIMARY: > > break; > > case REPLICATION_MODE_SECONDARY: > > -s->active_disk = bs->file; > > -if (!s->active_disk || !s->active_disk->bs || > > -!s->active_disk->bs->backing) { > > +active_disk = bs->file; > > Here initializing active_disk only here makes sense: we consider "active > disk" only in secondary mode. Right? Yes. 
> > +if (!active_disk || !active_disk->bs || !active_disk->bs->backing) > > { > > error_setg(errp, "Active disk doesn't have backing file"); > > aio_context_release(aio_context); > > return; > > } > > > > -s->hidden_disk = s->active_disk->bs->backing; > > +s->hidden_disk = active_disk->bs->backing; > > if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) { > > error_setg(errp, "Hidden disk doesn't have backing file"); > > aio_context_release(aio_context); > > @@ -511,7 +514,7 @@ static void replication_start(ReplicationState *rs, > > ReplicationMode mode, > > } > > > > /* verify the length */ > > -active_length = bdrv_getlength(s->active_disk->bs); > > +active_length = bdrv_getlength(active_disk->bs); > > hidden_length = bdrv_getlength(s->hidden_disk->bs); > > disk_length = bdrv_getlength(s->secondary_disk->bs); > > if (active_length < 0 || hidden_length < 0 || disk_length < 0 || > > @@ -523,9 +526,9 @@ static void replication_start(ReplicationState *rs, > > ReplicationMode mode, > >
[PATCH v3 4/4] replication: Remove workaround
Remove the workaround introduced in commit
6ecbc6c52672db5c13805735ca02784879ce8285
"replication: Avoid blk_make_empty() on read-only child".

It is no longer needed, since s->hidden_disk is guaranteed to be
writable when secondary_do_checkpoint() runs: replication_start(),
_do_checkpoint() and _stop() are only called by COLO migration code,
and COLO migration doesn't inactivate disks.

Signed-off-by: Lukas Straub
---
 block/replication.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index c0d4a6c264..68b46d65a8 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -348,17 +348,7 @@ static void secondary_do_checkpoint(BlockDriverState *bs, Error **errp)
         return;
     }
 
-    BlockBackend *blk = blk_new(qemu_get_current_aio_context(),
-                                BLK_PERM_WRITE, BLK_PERM_ALL);
-    blk_insert_bs(blk, s->hidden_disk->bs, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        blk_unref(blk);
-        return;
-    }
-
-    ret = blk_make_empty(blk, errp);
-    blk_unref(blk);
+    ret = bdrv_make_empty(s->hidden_disk, errp);
     if (ret < 0) {
         return;
     }
--
2.20.1
[PATCH v3 3/4] replication: Properly attach children
The replication driver needs access to the children block-nodes of its
child so it can issue bdrv_make_empty() and bdrv_co_pwritev() to manage
the replication. However, it does this by directly copying the BdrvChilds,
which is wrong.

Fix this by properly attaching the block-nodes with bdrv_attach_child()
and requesting the required permissions.

Signed-off-by: Lukas Straub
---
 block/replication.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 74adf30f54..c0d4a6c264 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -165,7 +165,12 @@ static void replication_child_perm(BlockDriverState *bs, BdrvChild *c,
                                    uint64_t perm, uint64_t shared,
                                    uint64_t *nperm, uint64_t *nshared)
 {
-    *nperm = BLK_PERM_CONSISTENT_READ;
+    if (role & BDRV_CHILD_PRIMARY) {
+        *nperm = BLK_PERM_CONSISTENT_READ;
+    } else {
+        *nperm = 0;
+    }
+
     if ((bs->open_flags & (BDRV_O_INACTIVE | BDRV_O_RDWR)) == BDRV_O_RDWR) {
         *nperm |= BLK_PERM_WRITE;
     }
@@ -552,8 +557,25 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
             return;
         }
 
-        s->hidden_disk = hidden_disk;
-        s->secondary_disk = secondary_disk;
+        bdrv_ref(hidden_disk->bs);
+        s->hidden_disk = bdrv_attach_child(bs, hidden_disk->bs, "hidden disk",
+                                           &child_of_bds, BDRV_CHILD_DATA,
+                                           &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            aio_context_release(aio_context);
+            return;
+        }
+
+        bdrv_ref(secondary_disk->bs);
+        s->secondary_disk = bdrv_attach_child(bs, secondary_disk->bs,
+                                              "secondary disk", &child_of_bds,
+                                              BDRV_CHILD_DATA, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            aio_context_release(aio_context);
+            return;
+        }
 
         /* start backup job now */
         error_setg(&s->blocker,
@@ -659,7 +681,9 @@ static void replication_done(void *opaque, int ret)
     if (ret == 0) {
         s->stage = BLOCK_REPLICATION_DONE;
 
+        bdrv_unref_child(bs, s->secondary_disk);
         s->secondary_disk = NULL;
+        bdrv_unref_child(bs, s->hidden_disk);
         s->hidden_disk = NULL;
         s->error = 0;
     } else {
--
2.20.1
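The reference handling in the patch above (bdrv_ref() before bdrv_attach_child(), bdrv_unref_child() in replication_done()) can be sketched with a toy reference-counting model. Everything named Toy* or toy_* below is hypothetical and exists only for this illustration. The sketch assumes that attaching a child consumes one reference held by the caller, which is why the caller takes an extra reference first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for a block node and a parent-to-child edge. */
typedef struct ToyNode {
    int refcnt;
} ToyNode;

typedef struct ToyChild {
    ToyNode *node;
} ToyChild;

void toy_ref(ToyNode *n)
{
    n->refcnt++;
}

void toy_unref(ToyNode *n)
{
    assert(n->refcnt > 0);
    n->refcnt--;
}

/*
 * Attach 'n' as a child. Assumption of this sketch: the new edge takes
 * over one reference from the caller, and that reference is dropped
 * again if the attach fails.
 */
ToyChild *toy_attach_child(ToyNode *n)
{
    ToyChild *c = malloc(sizeof(*c));

    if (!c) {
        toy_unref(n);
        return NULL;
    }
    c->node = n;
    return c;
}

/* Detach the edge and drop the reference it was holding. */
void toy_unref_child(ToyChild *c)
{
    if (!c) {
        return;
    }
    toy_unref(c->node);
    free(c);
}
```

With a node that starts at refcnt 1 (held by the backing chain), toy_ref() followed by toy_attach_child() leaves it at 2, and toy_unref_child() brings it back to 1. Merely copying an existing ToyChild pointer would leave the count at 1 even though two parents rely on the node.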
[PATCH v3 2/4] replication: Reduce usage of s->hidden_disk and s->secondary_disk
In preparation for the next patch, initialize s->hidden_disk and s->secondary_disk later and replace access to them with local variables in the places where they aren't initialized yet. Signed-off-by: Lukas Straub --- block/replication.c | 45 - 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/block/replication.c b/block/replication.c index 50940fbe33..74adf30f54 100644 --- a/block/replication.c +++ b/block/replication.c @@ -368,27 +368,35 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable, Error **errp) { BDRVReplicationState *s = bs->opaque; +BdrvChild *hidden_disk, *secondary_disk; BlockReopenQueue *reopen_queue = NULL; +/* + * s->hidden_disk and s->secondary_disk may not be set yet, as they will + * only be set after the children are writable. + */ +hidden_disk = bs->file->bs->backing; +secondary_disk = hidden_disk->bs->backing; + if (writable) { -s->orig_hidden_read_only = bdrv_is_read_only(s->hidden_disk->bs); -s->orig_secondary_read_only = bdrv_is_read_only(s->secondary_disk->bs); +s->orig_hidden_read_only = bdrv_is_read_only(hidden_disk->bs); +s->orig_secondary_read_only = bdrv_is_read_only(secondary_disk->bs); } -bdrv_subtree_drained_begin(s->hidden_disk->bs); -bdrv_subtree_drained_begin(s->secondary_disk->bs); +bdrv_subtree_drained_begin(hidden_disk->bs); +bdrv_subtree_drained_begin(secondary_disk->bs); if (s->orig_hidden_read_only) { QDict *opts = qdict_new(); qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable); -reopen_queue = bdrv_reopen_queue(reopen_queue, s->hidden_disk->bs, +reopen_queue = bdrv_reopen_queue(reopen_queue, hidden_disk->bs, opts, true); } if (s->orig_secondary_read_only) { QDict *opts = qdict_new(); qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable); -reopen_queue = bdrv_reopen_queue(reopen_queue, s->secondary_disk->bs, +reopen_queue = bdrv_reopen_queue(reopen_queue, secondary_disk->bs, opts, true); } @@ -396,8 +404,8 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable, 
bdrv_reopen_multiple(reopen_queue, errp); } -bdrv_subtree_drained_end(s->hidden_disk->bs); -bdrv_subtree_drained_end(s->secondary_disk->bs); +bdrv_subtree_drained_end(hidden_disk->bs); +bdrv_subtree_drained_end(secondary_disk->bs); } static void backup_job_cleanup(BlockDriverState *bs) @@ -454,7 +462,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, BlockDriverState *bs = rs->opaque; BDRVReplicationState *s; BlockDriverState *top_bs; -BdrvChild *active_disk; +BdrvChild *active_disk, *hidden_disk, *secondary_disk; int64_t active_length, hidden_length, disk_length; AioContext *aio_context; Error *local_err = NULL; @@ -499,15 +507,15 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, return; } -s->hidden_disk = active_disk->bs->backing; -if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) { +hidden_disk = active_disk->bs->backing; +if (!hidden_disk->bs || !hidden_disk->bs->backing) { error_setg(errp, "Hidden disk doesn't have backing file"); aio_context_release(aio_context); return; } -s->secondary_disk = s->hidden_disk->bs->backing; -if (!s->secondary_disk->bs || !bdrv_has_blk(s->secondary_disk->bs)) { +secondary_disk = hidden_disk->bs->backing; +if (!secondary_disk->bs || !bdrv_has_blk(secondary_disk->bs)) { error_setg(errp, "The secondary disk doesn't have block backend"); aio_context_release(aio_context); return; @@ -515,8 +523,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, /* verify the length */ active_length = bdrv_getlength(active_disk->bs); -hidden_length = bdrv_getlength(s->hidden_disk->bs); -disk_length = bdrv_getlength(s->secondary_disk->bs); +hidden_length = bdrv_getlength(hidden_disk->bs); +disk_length = bdrv_getlength(secondary_disk->bs); if (active_length < 0 || hidden_length < 0 || disk_length < 0 || active_length != hidden_length || hidden_length != disk_length) { error_setg(errp, "Active disk, hidden disk, secondary disk's length" @@ -526,10 +534,10 @@ static 
void replication_start(ReplicationState *rs, ReplicationM
[PATCH v3 1/4] replication: Remove s->active_disk
s->active_disk is bs->file. Remove it and use local variables instead. Signed-off-by: Lukas Straub --- block/replication.c | 38 +- 1 file changed, 21 insertions(+), 17 deletions(-) diff --git a/block/replication.c b/block/replication.c index 52163f2d1f..50940fbe33 100644 --- a/block/replication.c +++ b/block/replication.c @@ -35,7 +35,6 @@ typedef enum { typedef struct BDRVReplicationState { ReplicationMode mode; ReplicationStage stage; -BdrvChild *active_disk; BlockJob *commit_job; BdrvChild *hidden_disk; BdrvChild *secondary_disk; @@ -307,11 +306,15 @@ out: return ret; } -static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) +static void secondary_do_checkpoint(BlockDriverState *bs, Error **errp) { +BDRVReplicationState *s = bs->opaque; +BdrvChild *active_disk; Error *local_err = NULL; int ret; +active_disk = bs->file; + if (!s->backup_job) { error_setg(errp, "Backup job was cancelled unexpectedly"); return; @@ -323,13 +326,13 @@ static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) return; } -if (!s->active_disk->bs->drv) { +if (!active_disk->bs->drv) { error_setg(errp, "Active disk %s is ejected", - s->active_disk->bs->node_name); + active_disk->bs->node_name); return; } -ret = bdrv_make_empty(s->active_disk, errp); +ret = bdrv_make_empty(active_disk, errp); if (ret < 0) { return; } @@ -451,6 +454,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, BlockDriverState *bs = rs->opaque; BDRVReplicationState *s; BlockDriverState *top_bs; +BdrvChild *active_disk; int64_t active_length, hidden_length, disk_length; AioContext *aio_context; Error *local_err = NULL; @@ -488,15 +492,14 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, case REPLICATION_MODE_PRIMARY: break; case REPLICATION_MODE_SECONDARY: -s->active_disk = bs->file; -if (!s->active_disk || !s->active_disk->bs || -!s->active_disk->bs->backing) { +active_disk = bs->file; +if (!active_disk || !active_disk->bs || 
!active_disk->bs->backing) { error_setg(errp, "Active disk doesn't have backing file"); aio_context_release(aio_context); return; } -s->hidden_disk = s->active_disk->bs->backing; +s->hidden_disk = active_disk->bs->backing; if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) { error_setg(errp, "Hidden disk doesn't have backing file"); aio_context_release(aio_context); @@ -511,7 +514,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, } /* verify the length */ -active_length = bdrv_getlength(s->active_disk->bs); +active_length = bdrv_getlength(active_disk->bs); hidden_length = bdrv_getlength(s->hidden_disk->bs); disk_length = bdrv_getlength(s->secondary_disk->bs); if (active_length < 0 || hidden_length < 0 || disk_length < 0 || @@ -523,9 +526,9 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, } /* Must be true, or the bdrv_getlength() calls would have failed */ -assert(s->active_disk->bs->drv && s->hidden_disk->bs->drv); +assert(active_disk->bs->drv && s->hidden_disk->bs->drv); -if (!s->active_disk->bs->drv->bdrv_make_empty || +if (!active_disk->bs->drv->bdrv_make_empty || !s->hidden_disk->bs->drv->bdrv_make_empty) { error_setg(errp, "Active disk or hidden disk doesn't support make_empty"); @@ -579,7 +582,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, s->stage = BLOCK_REPLICATION_RUNNING; if (s->mode == REPLICATION_MODE_SECONDARY) { -secondary_do_checkpoint(s, errp); +secondary_do_checkpoint(bs, errp); } s->error = 0; @@ -608,7 +611,7 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp) } if (s->mode == REPLICATION_MODE_SECONDARY) { -secondary_do_checkpoint(s, errp); +secondary_do_checkpoint(bs, errp); } aio_context_release(aio_context); } @@ -645,7 +648,6 @@ static void replication_done(void *opaque, int ret) if (ret == 0) { s->stage = BLOCK_REPLICATION_DONE; -s->active_disk = NULL; s->secondary_disk = NULL
[PATCH v3 0/4] replication: Properly attach children
Hello Everyone,

A while ago Kevin noticed that the replication driver doesn't properly
attach the children it wants to use. Instead, it directly copies the
BdrvChilds from its backing file, which is wrong. This patchset fixes the
problem and removes the workaround that was put in place back then.

Regards,
Lukas Straub

Changes:
 -v3:
     -Split up into multiple patches
     -Remove s->active_disk
     -Clarify child permissions in commit message
 -v2: Test for BDRV_CHILD_PRIMARY in replication_child_perm, since
      bs->file might not be set yet. (Vladimir)

Lukas Straub (4):
  replication: Remove s->active_disk
  replication: Reduce usage of s->hidden_disk and s->secondary_disk
  replication: Properly attach children
  replication: Remove workaround

 block/replication.c | 115 +++-
 1 file changed, 72 insertions(+), 43 deletions(-)

--
2.20.1
Re: [PATCH] block/replication.c: Properly attach children
Hi, Thanks for your review. More below. Btw: There is a overview of the replication design in docs/block-replication.txt On Wed, 7 Jul 2021 16:01:31 +0300 Vladimir Sementsov-Ogievskiy wrote: > 06.07.2021 19:11, Lukas Straub wrote: > > The replication driver needs access to the children block-nodes of > > it's child so it can issue bdrv_make_empty to manage the replication. > > However, it does this by directly copying the BdrvChilds, which is > > wrong. > > > > Fix this by properly attaching the block-nodes with > > bdrv_attach_child(). > > > > Also, remove a workaround introduced in commit > > 6ecbc6c52672db5c13805735ca02784879ce8285 > > "replication: Avoid blk_make_empty() on read-only child". > > > > Signed-off-by: Lukas Straub > > --- > > > > -v2: Test for BDRV_CHILD_PRIMARY in replication_child_perm, since > > bs->file might not be set yet. (Vladimir) > > > > block/replication.c | 94 + > > 1 file changed, 61 insertions(+), 33 deletions(-) > > > > diff --git a/block/replication.c b/block/replication.c > > index 52163f2d1f..fd8cb728a3 100644 > > --- a/block/replication.c > > +++ b/block/replication.c > > @@ -166,7 +166,12 @@ static void replication_child_perm(BlockDriverState > > *bs, BdrvChild *c, > > uint64_t perm, uint64_t shared, > > uint64_t *nperm, uint64_t *nshared) > > { > > -*nperm = BLK_PERM_CONSISTENT_READ; > > +if (role & BDRV_CHILD_PRIMARY) { > > +*nperm = BLK_PERM_CONSISTENT_READ; > > +} else { > > +*nperm = 0; > > +} > > Why you drop READ access for other children? You don't mention it in > commit-msg.. > > Upd: ok now I see that we are not going to read from hidden_disk child, and > that's the only "other" child that worth to mention. > Still, we should be sure that hidden_disk child gets WRITE permission in case > we are going to call bdrv_make_empty on it. The code below that in replication_child_perm() should make sure of that or am i misunderstanding it? 
Or do you mean that it should always request WRITE regardless of bs->open_flags & BDRV_O_INACTIVE? > > + > > if ((bs->open_flags & (BDRV_O_INACTIVE | BDRV_O_RDWR)) == > > BDRV_O_RDWR) { > > *nperm |= BLK_PERM_WRITE; > > } > > @@ -340,17 +345,7 @@ static void > > secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) > > return; > > } > > > > -BlockBackend *blk = blk_new(qemu_get_current_aio_context(), > > -BLK_PERM_WRITE, BLK_PERM_ALL); > > -blk_insert_bs(blk, s->hidden_disk->bs, _err); > > -if (local_err) { > > -error_propagate(errp, local_err); > > -blk_unref(blk); > > -return; > > -} > > - > > -ret = blk_make_empty(blk, errp); > > -blk_unref(blk); > > +ret = bdrv_make_empty(s->hidden_disk, errp); > > So, here you rely on BLK_PERM_WRITE being set in replication_child_perm().. > Probably that's OK, however logic is changed. Shouldn't we always require > write permission in replication_child_perm() for hidden_disk ? > > > if (ret < 0) { > > return; > > } > > @@ -365,27 +360,35 @@ static void reopen_backing_file(BlockDriverState *bs, > > bool writable, > > Error **errp) > > { > > BDRVReplicationState *s = bs->opaque; > > +BdrvChild *hidden_disk, *secondary_disk; > > BlockReopenQueue *reopen_queue = NULL; > > > > +/* > > + * s->hidden_disk and s->secondary_disk may not be set yet, as they > > will > > + * only be set after the children are writable. > > + */ > > +hidden_disk = bs->file->bs->backing; > > +secondary_disk = hidden_disk->bs->backing; > > + > > if (writable) { > > -s->orig_hidden_read_only = bdrv_is_read_only(s->hidden_disk->bs); > > -s->orig_secondary_read_only = > > bdrv_is_read_only(s->secondary_disk->bs); > > +s->orig_hidden_read_only = bdrv_is_read_only(hidden_disk->bs); > > +s->orig_secondary_read_only = > > bdrv_is_read_only(secondary_disk->bs); > > } > > > > -bdrv_subtree_drained_begin(s->hidden_disk->bs); > > -bdrv_subtree_drained_begin(s->secondary_disk->bs)
[PATCH] block/replication.c: Properly attach children
The replication driver needs access to the children block-nodes of it's child so it can issue bdrv_make_empty to manage the replication. However, it does this by directly copying the BdrvChilds, which is wrong. Fix this by properly attaching the block-nodes with bdrv_attach_child(). Also, remove a workaround introduced in commit 6ecbc6c52672db5c13805735ca02784879ce8285 "replication: Avoid blk_make_empty() on read-only child". Signed-off-by: Lukas Straub --- -v2: Test for BDRV_CHILD_PRIMARY in replication_child_perm, since bs->file might not be set yet. (Vladimir) block/replication.c | 94 + 1 file changed, 61 insertions(+), 33 deletions(-) diff --git a/block/replication.c b/block/replication.c index 52163f2d1f..fd8cb728a3 100644 --- a/block/replication.c +++ b/block/replication.c @@ -166,7 +166,12 @@ static void replication_child_perm(BlockDriverState *bs, BdrvChild *c, uint64_t perm, uint64_t shared, uint64_t *nperm, uint64_t *nshared) { -*nperm = BLK_PERM_CONSISTENT_READ; +if (role & BDRV_CHILD_PRIMARY) { +*nperm = BLK_PERM_CONSISTENT_READ; +} else { +*nperm = 0; +} + if ((bs->open_flags & (BDRV_O_INACTIVE | BDRV_O_RDWR)) == BDRV_O_RDWR) { *nperm |= BLK_PERM_WRITE; } @@ -340,17 +345,7 @@ static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp) return; } -BlockBackend *blk = blk_new(qemu_get_current_aio_context(), -BLK_PERM_WRITE, BLK_PERM_ALL); -blk_insert_bs(blk, s->hidden_disk->bs, _err); -if (local_err) { -error_propagate(errp, local_err); -blk_unref(blk); -return; -} - -ret = blk_make_empty(blk, errp); -blk_unref(blk); +ret = bdrv_make_empty(s->hidden_disk, errp); if (ret < 0) { return; } @@ -365,27 +360,35 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable, Error **errp) { BDRVReplicationState *s = bs->opaque; +BdrvChild *hidden_disk, *secondary_disk; BlockReopenQueue *reopen_queue = NULL; +/* + * s->hidden_disk and s->secondary_disk may not be set yet, as they will + * only be set after the children are writable. 
+ */ +hidden_disk = bs->file->bs->backing; +secondary_disk = hidden_disk->bs->backing; + if (writable) { -s->orig_hidden_read_only = bdrv_is_read_only(s->hidden_disk->bs); -s->orig_secondary_read_only = bdrv_is_read_only(s->secondary_disk->bs); +s->orig_hidden_read_only = bdrv_is_read_only(hidden_disk->bs); +s->orig_secondary_read_only = bdrv_is_read_only(secondary_disk->bs); } -bdrv_subtree_drained_begin(s->hidden_disk->bs); -bdrv_subtree_drained_begin(s->secondary_disk->bs); +bdrv_subtree_drained_begin(hidden_disk->bs); +bdrv_subtree_drained_begin(secondary_disk->bs); if (s->orig_hidden_read_only) { QDict *opts = qdict_new(); qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable); -reopen_queue = bdrv_reopen_queue(reopen_queue, s->hidden_disk->bs, +reopen_queue = bdrv_reopen_queue(reopen_queue, hidden_disk->bs, opts, true); } if (s->orig_secondary_read_only) { QDict *opts = qdict_new(); qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable); -reopen_queue = bdrv_reopen_queue(reopen_queue, s->secondary_disk->bs, +reopen_queue = bdrv_reopen_queue(reopen_queue, secondary_disk->bs, opts, true); } @@ -393,8 +396,8 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable, bdrv_reopen_multiple(reopen_queue, errp); } -bdrv_subtree_drained_end(s->hidden_disk->bs); -bdrv_subtree_drained_end(s->secondary_disk->bs); +bdrv_subtree_drained_end(hidden_disk->bs); +bdrv_subtree_drained_end(secondary_disk->bs); } static void backup_job_cleanup(BlockDriverState *bs) @@ -451,6 +454,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, BlockDriverState *bs = rs->opaque; BDRVReplicationState *s; BlockDriverState *top_bs; +BdrvChild *active_disk, *hidden_disk, *secondary_disk; int64_t active_length, hidden_length, disk_length; AioContext *aio_context; Error *local_err = NULL; @@ -488,32 +492,32 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode, case REPLICATION_MODE_PRIMARY: break; case REPLICATION_MODE_SECONDARY: 
-s->active_disk = bs->file; -if (!s->active_disk || !s->active_disk->bs || -!s->a
[PATCH resend] block/replication.c: Properly attach children
The replication driver needs access to the children block-nodes of its
child so it can issue bdrv_make_empty to manage the replication.
However, it does this by directly copying the BdrvChilds, which is
wrong.

Fix this by properly attaching the block-nodes with
bdrv_attach_child().

Also, remove a workaround introduced in commit
6ecbc6c52672db5c13805735ca02784879ce8285
"replication: Avoid blk_make_empty() on read-only child".

Signed-off-by: Lukas Straub
---

Fix CC: email address so the mailing list doesn't reject it.

 block/replication.c | 94 +
 1 file changed, 61 insertions(+), 33 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 52163f2d1f..426d2b741a 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -166,7 +166,12 @@ static void replication_child_perm(BlockDriverState *bs, BdrvChild *c,
                                    uint64_t perm, uint64_t shared,
                                    uint64_t *nperm, uint64_t *nshared)
 {
-    *nperm = BLK_PERM_CONSISTENT_READ;
+    if (c == bs->file) {
+        *nperm = BLK_PERM_CONSISTENT_READ;
+    } else {
+        *nperm = 0;
+    }
+
     if ((bs->open_flags & (BDRV_O_INACTIVE | BDRV_O_RDWR)) == BDRV_O_RDWR) {
         *nperm |= BLK_PERM_WRITE;
     }
@@ -340,17 +345,7 @@ static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
         return;
     }

-    BlockBackend *blk = blk_new(qemu_get_current_aio_context(),
-                                BLK_PERM_WRITE, BLK_PERM_ALL);
-    blk_insert_bs(blk, s->hidden_disk->bs, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        blk_unref(blk);
-        return;
-    }
-
-    ret = blk_make_empty(blk, errp);
-    blk_unref(blk);
+    ret = bdrv_make_empty(s->hidden_disk, errp);
     if (ret < 0) {
         return;
     }
@@ -365,27 +360,35 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable,
                                 Error **errp)
 {
     BDRVReplicationState *s = bs->opaque;
+    BdrvChild *hidden_disk, *secondary_disk;
     BlockReopenQueue *reopen_queue = NULL;

+    /*
+     * s->hidden_disk and s->secondary_disk may not be set yet, as they will
+     * only be set after the children are writable.
+     */
+    hidden_disk = bs->file->bs->backing;
+    secondary_disk = hidden_disk->bs->backing;
+
     if (writable) {
-        s->orig_hidden_read_only = bdrv_is_read_only(s->hidden_disk->bs);
-        s->orig_secondary_read_only = bdrv_is_read_only(s->secondary_disk->bs);
+        s->orig_hidden_read_only = bdrv_is_read_only(hidden_disk->bs);
+        s->orig_secondary_read_only = bdrv_is_read_only(secondary_disk->bs);
     }

-    bdrv_subtree_drained_begin(s->hidden_disk->bs);
-    bdrv_subtree_drained_begin(s->secondary_disk->bs);
+    bdrv_subtree_drained_begin(hidden_disk->bs);
+    bdrv_subtree_drained_begin(secondary_disk->bs);

     if (s->orig_hidden_read_only) {
         QDict *opts = qdict_new();
         qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable);
-        reopen_queue = bdrv_reopen_queue(reopen_queue, s->hidden_disk->bs,
+        reopen_queue = bdrv_reopen_queue(reopen_queue, hidden_disk->bs,
                                          opts, true);
     }

     if (s->orig_secondary_read_only) {
         QDict *opts = qdict_new();
         qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable);
-        reopen_queue = bdrv_reopen_queue(reopen_queue, s->secondary_disk->bs,
+        reopen_queue = bdrv_reopen_queue(reopen_queue, secondary_disk->bs,
                                          opts, true);
     }

@@ -393,8 +396,8 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable,
         bdrv_reopen_multiple(reopen_queue, errp);
     }

-    bdrv_subtree_drained_end(s->hidden_disk->bs);
-    bdrv_subtree_drained_end(s->secondary_disk->bs);
+    bdrv_subtree_drained_end(hidden_disk->bs);
+    bdrv_subtree_drained_end(secondary_disk->bs);
 }

 static void backup_job_cleanup(BlockDriverState *bs)
@@ -451,6 +454,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
     BlockDriverState *bs = rs->opaque;
     BDRVReplicationState *s;
     BlockDriverState *top_bs;
+    BdrvChild *active_disk, *hidden_disk, *secondary_disk;
     int64_t active_length, hidden_length, disk_length;
     AioContext *aio_context;
     Error *local_err = NULL;
@@ -488,32 +492,32 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
     case REPLICATION_MODE_PRIMARY:
         break;
     case REPLICATION_MODE_SECONDARY:
-        s->active_disk = bs->file;
-        if (!s->active_disk || !s->active_disk->bs ||
-            !s->active_disk->bs->backing) {
+        active_disk = bs->file;
+
[PATCH resend] nbd: register yank function earlier
Although unlikely, qemu might hang in nbd_send_request().

Allow recovery in this case by registering the yank function before
calling it.

Signed-off-by: Lukas Straub
---

Fix CC: email address so the mailing list doesn't reject it.

 block/nbd.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 601fccc5ba..f6ff1c4fb4 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -369,32 +369,34 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
     s->ioc = nbd_co_establish_connection(s->conn, &s->info, true, errp);
     if (!s->ioc) {
         return -ECONNREFUSED;
     }

+    yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank,
+                           bs);
+
     ret = nbd_handle_updated_info(s->bs, NULL);
     if (ret < 0) {
         /*
          * We have connected, but must fail for other reasons.
          * Send NBD_CMD_DISC as a courtesy to the server.
          */
         NBDRequest request = { .type = NBD_CMD_DISC };
         nbd_send_request(s->ioc, &request);
+        yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
+                                 nbd_yank, bs);
         object_unref(OBJECT(s->ioc));
         s->ioc = NULL;
         return ret;
     }
     qio_channel_set_blocking(s->ioc, false, NULL);
     qio_channel_attach_aio_context(s->ioc, bdrv_get_aio_context(bs));
-    yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank,
-                           bs);
-
     /* successfully connected */
     s->state = NBD_CLIENT_CONNECTED;
     qemu_co_queue_restart_all(&s->free_sema);
     return 0;
--
2.32.0
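
The ordering this patch establishes can be sketched with a stand-in
registry. This is only an illustration and not QEMU code: mock_register(),
mock_unregister(), mock_send_disconnect() and establish() are hypothetical
names, and the single-slot registry is an assumption made to keep the
example short. Only the shape follows the patch: register before the call
that might block, and unregister on the failure path before the channel
would be released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (YankFn)(void *opaque);

/* Hypothetical single-slot registry standing in for the yank registry. */
YankFn *registered_fn;
void *registered_opaque;
bool yank_was_available_during_send;

void mock_register(YankFn *fn, void *opaque)
{
    assert(registered_fn == NULL);
    registered_fn = fn;
    registered_opaque = opaque;
}

void mock_unregister(YankFn *fn, void *opaque)
{
    assert(registered_fn == fn && registered_opaque == opaque);
    registered_fn = NULL;
    registered_opaque = NULL;
}

bool mock_is_registered(void)
{
    return registered_fn != NULL;
}

void mock_yank_cb(void *opaque)
{
    (void)opaque; /* a real callback would shut down the channel */
}

/* Stand-in for a send that could block: records if recovery was possible. */
void mock_send_disconnect(void)
{
    yank_was_available_during_send = mock_is_registered();
}

/*
 * Same control flow as the patched function: register first, and on the
 * failure path unregister again before the channel would be dropped.
 */
int establish(bool info_ok, void *channel)
{
    mock_register(mock_yank_cb, channel);

    if (!info_ok) {
        mock_send_disconnect();
        mock_unregister(mock_yank_cb, channel);
        return -1;
    }
    return 0;
}
```

On the failure path the callback is registered while the disconnect
request is sent and gone afterwards; on the success path it stays
registered for the lifetime of the connection.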
Re: [RFC PATCH] block/io.c: Flush parent for quorum in generic code
On Wed, 12 May 2021 15:49:57 +0800
Zhang Chen wrote:

> Fix the issue from this patch:
> [PATCH] block: Flush all children in generic code
> From 883833e29cb800b4d92b5d4736252f4004885191
>
> Quorum driver do not have the primary child.
> It will caused guest block flush issue when use quorum and NBD.
> The vm guest flushes failed,and then guest filesystem is shutdown.

Hi,
I think the problem is rather that the quorum driver provides
.bdrv_co_flush_to_disk (which predates .bdrv_co_flush) instead of
.bdrv_co_flush.

Can you try with the following patch instead?

diff --git a/block/quorum.c b/block/quorum.c
index cfc1436abb..f2c0805000 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1279,7 +1279,7 @@ static BlockDriver bdrv_quorum = {
     .bdrv_dirname = quorum_dirname,
     .bdrv_co_block_status = quorum_co_block_status,
-    .bdrv_co_flush_to_disk = quorum_co_flush,
+    .bdrv_co_flush = quorum_co_flush,
     .bdrv_getlength = quorum_getlength,

> Signed-off-by: Zhang Chen
> Reported-by: Minghao Yuan
> ---
>  block/io.c | 31 ++-
>  1 file changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index 35b6c56efc..4dc1873cb9 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2849,6 +2849,13 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>      BdrvChild *child;
>      int current_gen;
>      int ret = 0;
> +    bool no_primary_child = false;
> +
> +    /* Quorum drivers do not have the primary child. */
> +    if (!primary_child) {
> +        primary_child = bs->file;
> +        no_primary_child = true;
> +    }
>
>      bdrv_inc_in_flight(bs);
>
> @@ -2886,12 +2893,12 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>
>      /* But don't actually force it to the disk with cache=unsafe */
>      if (bs->open_flags & BDRV_O_NO_FLUSH) {
> -        goto flush_children;
> +        goto flush_data;
>      }
>
>      /* Check if we really need to flush anything */
>      if (bs->flushed_gen == current_gen) {
> -        goto flush_children;
> +        goto flush_data;
>      }
>
>      BLKDBG_EVENT(primary_child, BLKDBG_FLUSH_TO_DISK);
> @@ -2938,13 +2945,19 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>      /* Now flush the underlying protocol. It will also have BDRV_O_NO_FLUSH
>       * in the case of cache=unsafe, so there are no useless flushes.
>       */
> -flush_children:
> -    ret = 0;
> -    QLIST_FOREACH(child, &bs->children, next) {
> -        if (child->perm & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) {
> -            int this_child_ret = bdrv_co_flush(child->bs);
> -            if (!ret) {
> -                ret = this_child_ret;
> +flush_data:
> +    if (no_primary_child) {
> +        /* Flush parent */
> +        ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
> +    } else {
> +        /* Flush childrens */
> +        ret = 0;
> +        QLIST_FOREACH(child, &bs->children, next) {
> +            if (child->perm & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) {
> +                int this_child_ret = bdrv_co_flush(child->bs);
> +                if (!ret) {
> +                    ret = this_child_ret;
> +                }
>              }
>          }
>      }
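
Why the callback slot matters can be shown with a small model. This is a
simplified sketch and not the code in block/io.c: Driver, Node,
generic_flush() and quorum_like_flush() are hypothetical names, and the
model assumes that a full-flush callback, when present, makes the driver
solely responsible for its children, while a flush-to-disk callback is
followed by the generic walk over the children.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical driver table with the two callback slots discussed above. */
typedef struct Driver {
    int (*co_flush)(Node *n);          /* driver flushes all layers itself */
    int (*co_flush_to_disk)(Node *n);  /* driver flushes only its own data */
} Driver;

struct Node {
    const Driver *drv;
    Node *children[4];
    size_t n_children;
    int flush_count;                   /* how often this node was flushed */
};

int generic_flush(Node *n);

/* Flush every child and report the first error, if any. */
int flush_children(Node *n)
{
    int ret = 0;

    for (size_t i = 0; i < n->n_children; i++) {
        int r = generic_flush(n->children[i]);
        if (ret == 0) {
            ret = r;
        }
    }
    return ret;
}

/* Quorum-like callback: it forwards the flush to all children itself. */
int quorum_like_flush(Node *n)
{
    return flush_children(n);
}

int generic_flush(Node *n)
{
    n->flush_count++;

    /* Assumed precedence: a full-flush callback handles everything. */
    if (n->drv && n->drv->co_flush) {
        return n->drv->co_flush(n);
    }
    if (n->drv && n->drv->co_flush_to_disk) {
        int ret = n->drv->co_flush_to_disk(n);
        if (ret < 0) {
            return ret;
        }
    }
    /* Otherwise the generic code walks the children on its own. */
    return flush_children(n);
}

/* Returns how often the first child is flushed for one parent flush. */
int child_flushes(int use_full_flush_slot)
{
    Driver drv = { 0 };
    Node a = { 0 }, b = { 0 }, parent = { 0 };

    if (use_full_flush_slot) {
        drv.co_flush = quorum_like_flush;
    } else {
        drv.co_flush_to_disk = quorum_like_flush;
    }
    parent.drv = &drv;
    parent.children[0] = &a;
    parent.children[1] = &b;
    parent.n_children = 2;

    generic_flush(&parent);
    return a.flush_count;
}
```

In this model the same callback behaves differently depending on the slot
it is registered in: in the full-flush slot each child is flushed once, in
the flush-to-disk slot the generic code flushes the children a second
time. The model only illustrates that the slot decides who is responsible
for the children; it does not reproduce the failure reported in the
thread.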
Re: [PATCH v14 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu
On Tue, 12 Jan 2021 17:20:54 +0100
Markus Armbruster wrote:

> Queued. Thanks for persevering!
>

Great, Thanks!

Regards,
Lukas Straub
[PATCH v14 7/7] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
A connecting chardev object has an additional reference by the connecting
thread, so if the chardev is still connecting by the end of the test,
then the chardev object won't be freed. This in turn means that the yank
instance won't be unregistered and when running the next test-case
yank_register_instance will abort, because the yank instance is
already/still registered.

Signed-off-by: Lukas Straub
Reviewed-by: Daniel P. Berrangé
---
 tests/test-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test-char.c b/tests/test-char.c
index 953e0d1c1f..41a76410d8 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -937,6 +937,7 @@ static void char_socket_client_dupid_test(gconstpointer opaque)
     g_assert_nonnull(opts);
     chr1 = qemu_chr_new_from_opts(opts, NULL, &error_abort);
     g_assert_nonnull(chr1);
+    qemu_chr_wait_connected(chr1, &error_abort);
     chr2 = qemu_chr_new_from_opts(opts, NULL, &local_err);
     g_assert_null(chr2);
--
2.29.2
[PATCH v14 1/7] Introduce yank feature
The yank feature allows recovering from a hanging qemu by "yanking"
at various parts. Other qemu systems can register themselves and
multiple yank functions. Then all yank functions for selected
instances can be called by the 'yank' out-of-band qmp command.
Available instances can be queried by a 'query-yank' oob command.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Markus Armbruster
---
 MAINTAINERS           |   7 ++
 include/qemu/yank.h   |  97
 qapi/meson.build      |   1 +
 qapi/qapi-schema.json |   1 +
 qapi/yank.json        | 119
 util/meson.build      |   1 +
 util/yank.c           | 207 ++
 7 files changed, 433 insertions(+)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 1e7c8f0488..f465a4045a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2716,6 +2716,13 @@ F: util/uuid.c
 F: include/qemu/uuid.h
 F: tests/test-uuid.c

+Yank feature
+M: Lukas Straub
+S: Odd fixes
+F: util/yank.c
+F: include/qemu/yank.h
+F: qapi/yank.json
+
 COLO Framework
 M: zhanghailiang
 S: Maintained
diff --git a/include/qemu/yank.h b/include/qemu/yank.h
new file mode 100644
index 00..5b93c70cbf
--- /dev/null
+++ b/include/qemu/yank.h
@@ -0,0 +1,97 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef YANK_H
+#define YANK_H
+
+#include "qapi/qapi-types-yank.h"
+
+typedef void (YankFn)(void *opaque);
+
+/**
+ * yank_register_instance: Register a new instance.
+ *
+ * This registers a new instance for yanking. Must be called before any yank
+ * function is registered for this instance.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @errp: Error object.
+ *
+ * Returns true on success or false if an error occured.
+ */
+bool yank_register_instance(const YankInstance *instance, Error **errp);
+
+/**
+ * yank_unregister_instance: Unregister a instance.
+ *
+ * This unregisters a instance. Must be called only after every yank function
+ * of the instance has been unregistered.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ */
+void yank_unregister_instance(const YankInstance *instance);
+
+/**
+ * yank_register_function: Register a yank function
+ *
+ * This registers a yank function. All limitations of qmp oob commands apply
+ * to the yank function as well. See docs/devel/qapi-code-gen.txt under
+ * "An OOB-capable command handler must satisfy the following conditions".
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: The yank function.
+ * @opaque: Will be passed to the yank function.
+ */
+void yank_register_function(const YankInstance *instance,
+                            YankFn *func,
+                            void *opaque);
+
+/**
+ * yank_unregister_function: Unregister a yank function
+ *
+ * This unregisters a yank function.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: func that was passed to yank_register_function.
+ * @opaque: opaque that was passed to yank_register_function.
+ */
+void yank_unregister_function(const YankInstance *instance,
+                              YankFn *func,
+                              void *opaque);
+
+/**
+ * yank_generic_iochannel: Generic yank function for iochannel
+ *
+ * This is a generic yank function which will call qio_channel_shutdown on the
+ * provided QIOChannel.
+ *
+ * @opaque: QIOChannel to shutdown
+ */
+void yank_generic_iochannel(void *opaque);
+
+#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_BLOCK_NODE, \
+        .u.block_node.node_name = (the_node_name) })
+
+#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_CHARDEV, \
+        .u.chardev.id = (the_id) })
+
+#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_MIGRATION })
+
+#endif
diff --git a/qapi/meson.build b/qapi/meson.build
index 0e98146f1f..ab68e7900e 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -47,6 +47,7 @@ qapi_all_modules = [
   'trace',
   'transaction',
   'ui',
+  'yank',
 ]

 qapi_storage_daemon_modules = [
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 0b444b76d2..3441c9a9ae 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -86,6 +86,7 @@
 { 'include': 'machine.json' }
 { 'include': 'machine-target.json' }
 { 'include': 'replay.json' }
+{ 'include': 'yank.json' }
 { 'include': 'misc.json' }
 { 'include': 'misc-target.json' }
 { 'include': 'audio.json' }
diff --git a/qapi/yank.json b/qapi/yank.json
new file mode 100644
index 00..167a775594
--- /dev/null
+++ b/qa
[PATCH v14 6/7] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
Migration and yank code assume that qio_channel_shutdown is thread-safe
and can be called from qmp oob handler. Document this after checking
the code.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 include/io/channel.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 4d6fe45f63..ab9ea77959 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -92,7 +92,8 @@ struct QIOChannel {
  * provide additional optional features.
  *
  * Consult the corresponding public API docs for a description
- * of the semantics of each callback
+ * of the semantics of each callback. io_shutdown in particular
+ * must be thread-safe, terminate quickly and must not block.
  */
 struct QIOChannelClass {
     ObjectClass parent;
@@ -510,6 +511,8 @@ int qio_channel_close(QIOChannel *ioc,
  * QIO_CHANNEL_FEATURE_SHUTDOWN prior to calling
  * this method.
  *
+ * This function is thread-safe, terminates quickly and does not block.
+ *
  * Returns: 0 on success, -1 on error
  */
 int qio_channel_shutdown(QIOChannel *ioc,
--
2.29.2
[PATCH v14 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu
Hello Everyone,
So here is v14.

Changes:
v14:
 -fix checkpatch.pl warning

v13:
 -Address Marc-André Lureau comments:
  -make yank_register_instance return bool
  -rename yank_compare_instances to yank_instance_equal
  -remove breaks
  -use g_str_equal instead of strcmp
  -use g_new0 instead of g_slice_new
  -use QEMU_LOCK_GUARD instead of qemu_mutex_lock/unlock

v12:
 -rebase onto master
 -minor change to migration (removal of "defer" branch in
  qemu_start_incoming_migration)
 -add Reviewed-by tags

v11:
 -squashed MAINTAINERS update into patch 1
 -move qmp doc of yank before misc
 -add title for qmp docs
 -change "Since:" to 6.0
 -add Reviewed-by tags

v10:
 -moved from qapi/misc.json to qapi/yank.json
 -rename 'blockdev' -> 'block-node'
 -document difference between migration yank instance and migrate_cancel
 -better document return values of yank command
 -better document yank_lock
 -minor style and spelling fixes

v9:
 -rebase onto master
 -implemented new qmp api as proposed by Markus

v8:
 -add Reviewed-by and Acked-by tags
 -rebase onto master
 -minor change to migration
 -convert to meson
 -change "Since:" to 5.2
 -various code style fixes (Markus Armbruster)
 -point to oob restrictions in comment to yank_register_function
  (Markus Armbruster)
 -improve qmp documentation (Markus Armbruster)
 -document oob suitability of qio_channel and io_shutdown
  (Markus Armbruster)

v7:
 -yank_register_instance now returns error via Error **errp instead of
  aborting
 -dropped "chardev/char.c: Check for duplicate id before creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h
  (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_acquire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __constructor__ attribute instead
  (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Overview:
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration,
chardev, etc.) to some other server and that server dies or hangs, qemu
hangs too.
These patches introduce the new 'yank' out-of-band qmp command to
recover from these kinds of hangs. The different subsystems register
callbacks which get executed with the yank command. For example the
callback can shutdown() a socket. This is intended for the colo
use-case, but it can be used for other things too of course.

Regards,
Lukas Straub

Lukas Straub (7):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document qmp oob suitability of qio_channel_shutdown and
    io_shutdown
  tests/test-char.c: Wait for the chardev to connect in
    char_socket_client_dupid_test

 MAINTAINERS                   |   7 ++
 block/nbd.c                   | 153 +++--
 chardev/char-socket.c         |  34 ++
 include/io/channel.h          |   5 +-
 include/qemu/yank.h           |  97
 io/channel-tls.c              |   6 +-
 migration/channel.c           |  13 +++
 migration/migration.c         |  22
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   7 ++
 migration/savevm.c            |   5 +
 qapi/meson.build              |   1 +
 qapi/qapi-schema.json         |   1 +
 qapi/yank.json                | 119 +++
 tests/test-char.c             |   1 +
 util/meson.build              |   1 +
 util/yank.c                   | 207 ++
 17 files changed, 625 insertions(+), 64 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

--
2.29.2
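
The overview mentions that a callback can shutdown() a socket. A minimal
sketch of that idea on a plain POSIX socket pair follows. It is not the
series' implementation: yank_shutdown_fd() and recv_after_yank() are
hypothetical names, and the callback is invoked directly in the same
thread rather than from an out-of-band command handler. Only the YankFn
signature is taken from the series.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

typedef void (YankFn)(void *opaque);

/* Hypothetical yank callback: opaque points at a socket file descriptor. */
void yank_shutdown_fd(void *opaque)
{
    int fd = *(int *)opaque;

    shutdown(fd, SHUT_RDWR);
}

/*
 * Returns what recv() reports on a connected socket once the callback has
 * run: end of stream instead of waiting for a peer that never answers.
 * Returns -2 if the socket pair could not be created.
 */
long recv_after_yank(void)
{
    int sv[2];
    char buf[1];
    YankFn *fn = yank_shutdown_fd;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }

    fn(&sv[0]);                     /* what a 'yank' would trigger */
    n = recv(sv[0], buf, sizeof(buf), 0);

    close(sv[0]);
    close(sv[1]);
    return (long)n;
}
```

The peer never sends anything here, so without the shutdown the recv()
would block indefinitely. After the callback has run, the read returns
immediately and the caller can take its normal error or EOF path.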
[PATCH v14 5/7] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
Make qio_channel_tls_shutdown thread-safe by using atomics when
accessing tioc->shutdown.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 io/channel-tls.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 388f019977..2ae1b92fc0 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -23,6 +23,7 @@
 #include "qemu/module.h"
 #include "io/channel-tls.h"
 #include "trace.h"
+#include "qemu/atomic.h"


 static ssize_t qio_channel_tls_write_handler(const char *buf,
@@ -277,7 +278,8 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                 return QIO_CHANNEL_ERR_BLOCK;
             }
         } else if (errno == ECONNABORTED &&
-                   (tioc->shutdown & QIO_CHANNEL_SHUTDOWN_READ)) {
+                   (qatomic_load_acquire(&tioc->shutdown) &
+                    QIO_CHANNEL_SHUTDOWN_READ)) {
             return 0;
         }

@@ -361,7 +363,7 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);

-    tioc->shutdown |= how;
+    qatomic_or(&tioc->shutdown, how);

     return qio_channel_shutdown(tioc->master, how, errp);
 }
--
2.29.2
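
The access pattern used in this patch can be sketched with standard C11
atomics. This is an analogy rather than QEMU code: it assumes that
atomic_fetch_or() and an acquire load are a reasonable stand-in for
qatomic_or() and qatomic_load_acquire(), and Channel,
channel_mark_shutdown(), channel_read_is_shut() and the flag values are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values, chosen only for this sketch. */
enum { SHUTDOWN_READ = 1, SHUTDOWN_WRITE = 2 };

typedef struct Channel {
    atomic_int shutdown;   /* bitmask that another thread may update */
} Channel;

/*
 * Writer side: an atomic read-modify-write, so two threads setting
 * different bits at the same time cannot overwrite each other.
 */
void channel_mark_shutdown(Channel *c, int how)
{
    atomic_fetch_or(&c->shutdown, how);
}

/* Reader side: an acquire load of the current bitmask. */
int channel_read_is_shut(Channel *c)
{
    int flags = atomic_load_explicit(&c->shutdown, memory_order_acquire);

    return (flags & SHUTDOWN_READ) != 0;
}
```

A plain "shutdown |= how" is a separate load and store, so a concurrent
update from another thread could be lost between the two. The atomic OR
closes that window.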
[PATCH v14 3/7] chardev/char-socket.c: Add yank feature
Register a yank function to shutdown the socket on yank.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
---
 chardev/char-socket.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 213a4c8dd0..8a707d766c 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -34,6 +34,7 @@
 #include "qapi/error.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/yank.h"
 #include "chardev/char-io.h"
 #include "qom/object.h"
@@ -70,6 +71,7 @@ struct SocketChardev {
     size_t read_msgfds_num;
     int *write_msgfds;
     size_t write_msgfds_num;
+    bool registered_yank;
     SocketAddress *addr;
     bool is_listen;
@@ -415,6 +417,12 @@ static void tcp_chr_free_connection(Chardev *chr)
     tcp_set_msgfds(chr, NULL, 0);
     remove_fd_in_watch(chr);
+    if (s->state == TCP_CHARDEV_STATE_CONNECTING
+        || s->state == TCP_CHARDEV_STATE_CONNECTED) {
+        yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label),
+                                 yank_generic_iochannel,
+                                 QIO_CHANNEL(s->sioc));
+    }
     object_unref(OBJECT(s->sioc));
     s->sioc = NULL;
     object_unref(OBJECT(s->ioc));
@@ -932,6 +940,9 @@ static int tcp_chr_add_client(Chardev *chr, int fd)
     }
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     ret = tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return ret;
@@ -946,6 +957,9 @@ static void tcp_chr_accept(QIONetListener *listener,
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, cioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(cioc));
     tcp_chr_new_client(chr, cioc);
 }
@@ -961,6 +975,9 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp)
         object_unref(OBJECT(sioc));
         return -1;
     }
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return 0;
@@ -976,6 +993,9 @@ static void tcp_chr_accept_server_sync(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_net_listener_wait_client(s->listener);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
 }
@@ -1086,6 +1106,9 @@ static void char_socket_finalize(Object *obj)
         object_unref(OBJECT(s->tls_creds));
     }
     g_free(s->tls_authz);
+    if (s->registered_yank) {
+        yank_unregister_instance(CHARDEV_YANK_INSTANCE(chr->label));
+    }
     qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
@@ -1101,6 +1124,9 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque)
     if (qio_task_propagate_error(task, &err)) {
         tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+        yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label),
+                                 yank_generic_iochannel,
+                                 QIO_CHANNEL(sioc));
         check_report_connect_error(chr, err);
         goto cleanup;
     }
@@ -1134,6 +1160,9 @@ static void tcp_chr_connect_client_async(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_channel_socket_new();
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     /*
      * Normally code would use the qio_channel_socket_connect_async
      * method which uses a QIOTask + qio_task_set_error internally
@@ -1376,6 +1405,11 @@ static void qmp_chardev_open_socket(Chardev *chr,
         qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS);
     }
+    if (!yank_register_instance(CHARDEV_YANK_INSTANCE(chr->label), errp)) {
+        return;
+    }
+    s->registered_yank = true;
+
     /* be isn't opened until we get a connection */
     *be_opened = false;
--
2.29.2
[PATCH v14 4/7] migration: Add yank feature
Register yank functions on sockets to shut them down.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Acked-by: Dr. David Alan Gilbert
---
 migration/channel.c           | 13 +
 migration/migration.c         | 22 ++
 migration/multifd.c           | 10 ++
 migration/qemu-file-channel.c |  7 +++
 migration/savevm.c            |  5 +
 5 files changed, 57 insertions(+)

diff --git a/migration/channel.c b/migration/channel.c
index 8a783baa0b..35fe234e9c 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -18,6 +18,8 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
+#include "io/channel-socket.h"
+#include "qemu/yank.h"
 /**
  * @migration_channel_process_incoming - Create new incoming migration channel
@@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc)
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));
+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+        yank_register_function(MIGRATION_YANK_INSTANCE, yank_generic_iochannel,
+                               QIO_CHANNEL(ioc));
+    }
+
     if (s->parameters.tls_creds &&
         *s->parameters.tls_creds &&
         !object_dynamic_cast(OBJECT(ioc),
@@ -67,6 +74,12 @@ void migration_channel_connect(MigrationState *s,
         ioc, object_get_typename(OBJECT(ioc)), hostname, error);
     if (!error) {
+        if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+            yank_register_function(MIGRATION_YANK_INSTANCE,
+                                   yank_generic_iochannel,
+                                   QIO_CHANNEL(ioc));
+        }
+
         if (s->parameters.tls_creds &&
             *s->parameters.tls_creds &&
             !object_dynamic_cast(OBJECT(ioc),
diff --git a/migration/migration.c b/migration/migration.c
index e0dbde4091..92f7cb70b2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -56,6 +56,7 @@
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "qemu/yank.h"
 #ifdef CONFIG_VFIO
 #include "hw/vfio/vfio-common.h"
@@ -254,6 +255,8 @@ void migration_incoming_state_destroy(void)
         qapi_free_SocketAddressList(mis->socket_address_list);
         mis->socket_address_list = NULL;
     }
+
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 static void migrate_generate_event(int new_state)
@@ -418,6 +421,10 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p = NULL;
+    if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
+        return;
+    }
+
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
@@ -432,6 +439,7 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_incoming_migration(p, errp);
     } else {
+        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
 }
@@ -1737,6 +1745,7 @@ static void migrate_fd_cleanup(MigrationState *s)
     }
     notifier_list_notify(&migration_state_notifiers, s);
     block_cleanup_parameters(s);
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 static void migrate_fd_cleanup_schedule(MigrationState *s)
@@ -2011,6 +2020,7 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * only re-setup the migration stream and poke existing migration
      * to continue using that newly established channel.
      */
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
     qemu_start_incoming_migration(uri, errp);
 }
@@ -2148,6 +2158,12 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         return;
     }
+    if (!(has_resume && resume)) {
+        if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
+            return;
+        }
+    }
+
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
@@ -2161,6 +2177,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_outgoing_migration(s, p, &local_err);
     } else {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance(MIGRATION_YANK_INSTANCE);
+        }
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
                    "a valid migration protocol");
         migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
@@ -2170,6 +2189,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     }
     if (local_err) {
+        if (!(has
[PATCH v14 2/7] block/nbd.c: Add yank feature
Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occurred.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
---
 block/nbd.c | 153 +++-
 1 file changed, 92 insertions(+), 61 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 42536702b6..0f8d17db6a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -35,6 +35,7 @@
 #include "qemu/option.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
+#include "qemu/atomic.h"
 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
@@ -44,6 +45,8 @@
 #include "block/nbd.h"
 #include "block/block_int.h"
+#include "qemu/yank.h"
+
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS    16
@@ -141,14 +144,13 @@ typedef struct BDRVNBDState {
     NBDConnectThread *connect_thread;
 } BDRVNBDState;
-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp);
-static QIOChannelSocket *nbd_co_establish_connection(BlockDriverState *bs,
-                                                     Error **errp);
+static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
+                                    Error **errp);
+static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp);
 static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
                                                bool detach);
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-                                Error **errp);
+static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
+static void nbd_yank(void *opaque);
 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
@@ -166,12 +168,12 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s)
 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
     if (ret == -EIO) {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
                                             NBD_CLIENT_CONNECTING_NOWAIT;
         }
     } else {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
         }
         s->state = NBD_CLIENT_QUIT;
@@ -204,7 +206,7 @@ static void reconnect_delay_timer_cb(void *opaque)
 {
     BDRVNBDState *s = opaque;
-    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
         s->state = NBD_CLIENT_CONNECTING_NOWAIT;
         while (qemu_co_enter_next(&s->free_sema, NULL)) {
             /* Resume all queued requests */
@@ -216,7 +218,7 @@ static void reconnect_delay_timer_cb(void *opaque)
 static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
 {
-    if (s->state != NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTING_WAIT) {
         return;
     }
@@ -261,7 +263,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
      * s->connection_co is either yielded from nbd_receive_reply or from
      * nbd_co_reconnect_loop()
      */
-    if (s->state == NBD_CLIENT_CONNECTED) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
         qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
     }
@@ -287,7 +289,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
     reconnect_delay_timer_del(s);
-    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
         s->state = NBD_CLIENT_CONNECTING_NOWAIT;
         qemu_co_queue_restart_all(&s->free_sema);
     }
@@ -338,13 +340,14 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 static bool nbd_client_connecting(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT ||
-        s->state == NBD_CLIENT_CONNECTING_NOWAIT;
+    NBDClientState state = qatomic_load_acquire(&s->state);
+    return state == NBD_CLIENT_CONNECTING_WAIT ||
+        state == NBD_CLIENT_CONNECTING_NOWAIT;
 }
 static bool nbd_client_connecting_wait(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT;
+    return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }
 static void connect_bh(void *opaque)
@@ -424,12 +427,12 @@ static void *connect_thread_func(void *opaque)
     return NULL;
 }
-static QIOChannelSocket *coroutine_fn
+static int coroutine_fn
 nbd_co_establish_connection(BlockDriverState *bs, Err
NBD_CLIENT_CONNECTING_WAIT; +return qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT; } static void connect_bh(void *opaque) @@ -424,12 +427,12 @@ static void *connect_thread_func(void *opaque) return NULL; } -static QIOChannelSocket *coroutine_fn +static int coroutine_fn nbd_co_establish_connection(BlockDriverState *bs, Err
[PATCH v13 4/7] migration: Add yank feature
Register yank functions on sockets to shut them down. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi Acked-by: Dr. David Alan Gilbert --- migration/channel.c | 13 + migration/migration.c | 22 ++ migration/multifd.c | 10 ++ migration/qemu-file-channel.c | 7 +++ migration/savevm.c| 5 + 5 files changed, 57 insertions(+) diff --git a/migration/channel.c b/migration/channel.c index 8a783baa0b..35fe234e9c 100644 --- a/migration/channel.c +++ b/migration/channel.c @@ -18,6 +18,8 @@ #include "trace.h" #include "qapi/error.h" #include "io/channel-tls.h" +#include "io/channel-socket.h" +#include "qemu/yank.h" /** * @migration_channel_process_incoming - Create new incoming migration channel @@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc) trace_migration_set_incoming_channel( ioc, object_get_typename(OBJECT(ioc))); +if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) { +yank_register_function(MIGRATION_YANK_INSTANCE, yank_generic_iochannel, + QIO_CHANNEL(ioc)); +} + if (s->parameters.tls_creds && *s->parameters.tls_creds && !object_dynamic_cast(OBJECT(ioc), @@ -67,6 +74,12 @@ void migration_channel_connect(MigrationState *s, ioc, object_get_typename(OBJECT(ioc)), hostname, error); if (!error) { +if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) { +yank_register_function(MIGRATION_YANK_INSTANCE, + yank_generic_iochannel, + QIO_CHANNEL(ioc)); +} + if (s->parameters.tls_creds && *s->parameters.tls_creds && !object_dynamic_cast(OBJECT(ioc), diff --git a/migration/migration.c b/migration/migration.c index e0dbde4091..92f7cb70b2 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -56,6 +56,7 @@ #include "net/announce.h" #include "qemu/queue.h" #include "multifd.h" +#include "qemu/yank.h" #ifdef CONFIG_VFIO #include "hw/vfio/vfio-common.h" @@ -254,6 +255,8 @@ void migration_incoming_state_destroy(void) qapi_free_SocketAddressList(mis->socket_address_list); mis->socket_address_list = NULL; } + 
+yank_unregister_instance(MIGRATION_YANK_INSTANCE); } static void migrate_generate_event(int new_state) @@ -418,6 +421,10 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp) { const char *p = NULL; +if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) { +return; +} + qapi_event_send_migration(MIGRATION_STATUS_SETUP); if (strstart(uri, "tcp:", ) || strstart(uri, "unix:", NULL) || @@ -432,6 +439,7 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp) } else if (strstart(uri, "fd:", )) { fd_start_incoming_migration(p, errp); } else { +yank_unregister_instance(MIGRATION_YANK_INSTANCE); error_setg(errp, "unknown migration protocol: %s", uri); } } @@ -1737,6 +1745,7 @@ static void migrate_fd_cleanup(MigrationState *s) } notifier_list_notify(_state_notifiers, s); block_cleanup_parameters(s); +yank_unregister_instance(MIGRATION_YANK_INSTANCE); } static void migrate_fd_cleanup_schedule(MigrationState *s) @@ -2011,6 +2020,7 @@ void qmp_migrate_recover(const char *uri, Error **errp) * only re-setup the migration stream and poke existing migration * to continue using that newly established channel. 
*/ +yank_unregister_instance(MIGRATION_YANK_INSTANCE); qemu_start_incoming_migration(uri, errp); } @@ -2148,6 +2158,12 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, return; } +if (!(has_resume && resume)) { +if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) { +return; +} +} + if (strstart(uri, "tcp:", ) || strstart(uri, "unix:", NULL) || strstart(uri, "vsock:", NULL)) { @@ -2161,6 +2177,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, } else if (strstart(uri, "fd:", )) { fd_start_outgoing_migration(s, p, _err); } else { +if (!(has_resume && resume)) { +yank_unregister_instance(MIGRATION_YANK_INSTANCE); +} error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri", "a valid migration protocol"); migrate_set_state(>state, MIGRATION_STATUS_SETUP, @@ -2170,6 +2189,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, } if (local_err) { +if (!(has
[PATCH v13 3/7] chardev/char-socket.c: Add yank feature
Register a yank function to shutdown the socket on yank. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi --- chardev/char-socket.c | 34 ++ 1 file changed, 34 insertions(+) diff --git a/chardev/char-socket.c b/chardev/char-socket.c index 213a4c8dd0..8a707d766c 100644 --- a/chardev/char-socket.c +++ b/chardev/char-socket.c @@ -34,6 +34,7 @@ #include "qapi/error.h" #include "qapi/clone-visitor.h" #include "qapi/qapi-visit-sockets.h" +#include "qemu/yank.h" #include "chardev/char-io.h" #include "qom/object.h" @@ -70,6 +71,7 @@ struct SocketChardev { size_t read_msgfds_num; int *write_msgfds; size_t write_msgfds_num; +bool registered_yank; SocketAddress *addr; bool is_listen; @@ -415,6 +417,12 @@ static void tcp_chr_free_connection(Chardev *chr) tcp_set_msgfds(chr, NULL, 0); remove_fd_in_watch(chr); +if (s->state == TCP_CHARDEV_STATE_CONNECTING +|| s->state == TCP_CHARDEV_STATE_CONNECTED) { +yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(s->sioc)); +} object_unref(OBJECT(s->sioc)); s->sioc = NULL; object_unref(OBJECT(s->ioc)); @@ -932,6 +940,9 @@ static int tcp_chr_add_client(Chardev *chr, int fd) } tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); ret = tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); return ret; @@ -946,6 +957,9 @@ static void tcp_chr_accept(QIONetListener *listener, tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); tcp_chr_set_client_ioc_name(chr, cioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(cioc)); tcp_chr_new_client(chr, cioc); } @@ -961,6 +975,9 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp) object_unref(OBJECT(sioc)); return -1; } +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); 
tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); return 0; @@ -976,6 +993,9 @@ static void tcp_chr_accept_server_sync(Chardev *chr) tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); sioc = qio_net_listener_wait_client(s->listener); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); } @@ -1086,6 +1106,9 @@ static void char_socket_finalize(Object *obj) object_unref(OBJECT(s->tls_creds)); } g_free(s->tls_authz); +if (s->registered_yank) { +yank_unregister_instance(CHARDEV_YANK_INSTANCE(chr->label)); +} qemu_chr_be_event(chr, CHR_EVENT_CLOSED); } @@ -1101,6 +1124,9 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque) if (qio_task_propagate_error(task, )) { tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED); +yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); check_report_connect_error(chr, err); goto cleanup; } @@ -1134,6 +1160,9 @@ static void tcp_chr_connect_client_async(Chardev *chr) tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); sioc = qio_channel_socket_new(); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); /* * Normally code would use the qio_channel_socket_connect_async * method which uses a QIOTask + qio_task_set_error internally @@ -1376,6 +1405,11 @@ static void qmp_chardev_open_socket(Chardev *chr, qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS); } +if (!yank_register_instance(CHARDEV_YANK_INSTANCE(chr->label), errp)) { +return; +} +s->registered_yank = true; + /* be isn't opened until we get a connection */ *be_opened = false; -- 2.29.2 pgpu0zc2CiSD2.pgp Description: OpenPGP digital signature
[PATCH v13 1/7] Introduce yank feature
The yank feature allows to recover from hanging qemu by "yanking" at various parts. Other qemu systems can register themselves and multiple yank functions. Then all yank functions for selected instances can be called by the 'yank' out-of-band qmp command. Available instances can be queried by a 'query-yank' oob command. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi Reviewed-by: Markus Armbruster --- MAINTAINERS | 7 ++ include/qemu/yank.h | 97 qapi/meson.build | 1 + qapi/qapi-schema.json | 1 + qapi/yank.json| 119 util/meson.build | 1 + util/yank.c | 206 ++ 7 files changed, 432 insertions(+) create mode 100644 include/qemu/yank.h create mode 100644 qapi/yank.json create mode 100644 util/yank.c diff --git a/MAINTAINERS b/MAINTAINERS index 1e7c8f0488..f465a4045a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2716,6 +2716,13 @@ F: util/uuid.c F: include/qemu/uuid.h F: tests/test-uuid.c +Yank feature +M: Lukas Straub +S: Odd fixes +F: util/yank.c +F: include/qemu/yank.h +F: qapi/yank.json + COLO Framework M: zhanghailiang S: Maintained diff --git a/include/qemu/yank.h b/include/qemu/yank.h new file mode 100644 index 00..5b93c70cbf --- /dev/null +++ b/include/qemu/yank.h @@ -0,0 +1,97 @@ +/* + * QEMU yank feature + * + * Copyright (c) Lukas Straub + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#ifndef YANK_H +#define YANK_H + +#include "qapi/qapi-types-yank.h" + +typedef void (YankFn)(void *opaque); + +/** + * yank_register_instance: Register a new instance. + * + * This registers a new instance for yanking. Must be called before any yank + * function is registered for this instance. + * + * This function is thread-safe. + * + * @instance: The instance. + * @errp: Error object. + * + * Returns true on success or false if an error occured. 
+ */ +bool yank_register_instance(const YankInstance *instance, Error **errp); + +/** + * yank_unregister_instance: Unregister a instance. + * + * This unregisters a instance. Must be called only after every yank function + * of the instance has been unregistered. + * + * This function is thread-safe. + * + * @instance: The instance. + */ +void yank_unregister_instance(const YankInstance *instance); + +/** + * yank_register_function: Register a yank function + * + * This registers a yank function. All limitations of qmp oob commands apply + * to the yank function as well. See docs/devel/qapi-code-gen.txt under + * "An OOB-capable command handler must satisfy the following conditions". + * + * This function is thread-safe. + * + * @instance: The instance. + * @func: The yank function. + * @opaque: Will be passed to the yank function. + */ +void yank_register_function(const YankInstance *instance, +YankFn *func, +void *opaque); + +/** + * yank_unregister_function: Unregister a yank function + * + * This unregisters a yank function. + * + * This function is thread-safe. + * + * @instance: The instance. + * @func: func that was passed to yank_register_function. + * @opaque: opaque that was passed to yank_register_function. + */ +void yank_unregister_function(const YankInstance *instance, + YankFn *func, + void *opaque); + +/** + * yank_generic_iochannel: Generic yank function for iochannel + * + * This is a generic yank function which will call qio_channel_shutdown on the + * provided QIOChannel. 
+ * + * @opaque: QIOChannel to shutdown + */ +void yank_generic_iochannel(void *opaque); + +#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \ +.type = YANK_INSTANCE_TYPE_BLOCK_NODE, \ +.u.block_node.node_name = (the_node_name) }) + +#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \ +.type = YANK_INSTANCE_TYPE_CHARDEV, \ +.u.chardev.id = (the_id) }) + +#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \ +.type = YANK_INSTANCE_TYPE_MIGRATION }) + +#endif diff --git a/qapi/meson.build b/qapi/meson.build index 0e98146f1f..ab68e7900e 100644 --- a/qapi/meson.build +++ b/qapi/meson.build @@ -47,6 +47,7 @@ qapi_all_modules = [ 'trace', 'transaction', 'ui', + 'yank', ] qapi_storage_daemon_modules = [ diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json index 0b444b76d2..3441c9a9ae 100644 --- a/qapi/qapi-schema.json +++ b/qapi/qapi-schema.json @@ -86,6 +86,7 @@ { 'include': 'machine.json' } { 'include': 'machine-target.json' } { 'include': 'replay.json' } +{ 'include': 'yank.json' } { 'include': 'misc.json' } { 'include': 'misc-target.json' } { 'include': 'audio.json' } diff --git a/qapi/yank.json b/qapi/yank.json new file mode 100644 index 00..167a775594 --- /dev/null +++ b/qa
[PATCH v13 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu
Hello Everyone,
So here is v13.

Changes:

v13:
 -Address Marc-André Lureau comments:
   -make yank_register_instance return bool
   -rename yank_compare_instances to yank_instance_equal
   -remove breaks
   -use g_str_equal instead of strcmp
   -use g_new0 instead of g_slice_new
   -use QEMU_LOCK_GUARD instead of qemu_mutex_lock/unlock

v12:
 -rebase onto master
 -minor change to migration (removal of "defer" branch in qemu_start_incoming_migration)
 -add Reviewed-by tags

v11:
 -squashed MAINTAINERS update into patch 1
 -move qmp doc of yank before misc
 -add title for qmp docs
 -change "Since:" to 6.0
 -add Reviewed-by tags

v10:
 -moved from qapi/misc.json to qapi/yank.json
 -rename 'blockdev' -> 'block-node'
 -document difference between migration yank instance and migrate_cancel
 -better document return values of yank command
 -better document yank_lock
 -minor style and spelling fixes

v9:
 -rebase onto master
 -implemented new qmp api as proposed by Markus

v8:
 -add Reviewed-by and Acked-by tags
 -rebase onto master
 -minor change to migration
 -convert to meson
 -change "Since:" to 5.2
 -various code style fixes (Markus Armbruster)
 -point to oob restrictions in comment to yank_register_function (Markus Armbruster)
 -improve qmp documentation (Markus Armbruster)
 -document oob suitability of qio_channel and io_shutdown (Markus Armbruster)

v7:
 -yank_register_instance now returns error via Error **errp instead of aborting
 -dropped "chardev/char.c: Check for duplicate id before creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_acquire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __constructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Overview:
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.) to some other server and that server dies or hangs, qemu hangs too. These patches introduce the new 'yank' out-of-band qmp command to recover from these kinds of hangs. The different subsystems register callbacks which get executed with the yank command. For example, a callback can shutdown() a socket. This is intended for the colo use-case, but it can of course be used for other things too.

Regards,
Lukas Straub

Lukas Straub (7):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
  tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test

 MAINTAINERS | 7 ++
 block/nbd.c | 153 +++--
 chardev/char-socket.c | 34 ++
 include/io/channel.h | 5 +-
 include/qemu/yank.h | 97
 io/channel-tls.c | 6 +-
 migration/channel.c | 13 +++
 migration/migration.c | 22
 migration/multifd.c | 10 ++
 migration/qemu-file-channel.c | 7 ++
 migration/savevm.c | 5 +
 qapi/meson.build | 1 +
 qapi/qapi-schema.json | 1 +
 qapi/yank.json | 119
 tests/test-char.c | 1 +
 util/meson.build | 1 +
 util/yank.c | 206 ++
 17 files changed, 624 insertions(+), 64 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

--
2.29.2
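The registration model described in the overview (instances first, then callbacks on an instance, then one command that runs them all) can be sketched as a small table. This is a simplified, single-threaded illustration and not the code in util/yank.c: only the `YankFn` typedef is taken from the series, every `sketch_*` / `SKETCH_*` name and the fixed table sizes are assumptions made for the example, and the real implementation is documented as thread-safe, which this sketch is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same callback shape as the YankFn typedef in include/qemu/yank.h. */
typedef void (YankFn)(void *opaque);

/* Everything below is hypothetical and sized arbitrarily for the sketch. */
#define SKETCH_MAX_INSTANCES 8
#define SKETCH_MAX_FUNCS 4
#define SKETCH_NAME_LEN 32

struct sketch_instance {
    bool used;
    char name[SKETCH_NAME_LEN];
    size_t nfuncs;
    struct {
        YankFn *func;
        void *opaque;
    } funcs[SKETCH_MAX_FUNCS];
};

static struct sketch_instance sketch_registry[SKETCH_MAX_INSTANCES];

static struct sketch_instance *sketch_find(const char *name)
{
    for (size_t i = 0; i < SKETCH_MAX_INSTANCES; i++) {
        if (sketch_registry[i].used &&
            strcmp(sketch_registry[i].name, name) == 0) {
            return &sketch_registry[i];
        }
    }
    return NULL;
}

/* Returns false if the name is taken, too long, or the table is full. */
bool sketch_register_instance(const char *name)
{
    if (strlen(name) >= SKETCH_NAME_LEN || sketch_find(name)) {
        return false;
    }
    for (size_t i = 0; i < SKETCH_MAX_INSTANCES; i++) {
        if (!sketch_registry[i].used) {
            sketch_registry[i].used = true;
            sketch_registry[i].nfuncs = 0;
            strcpy(sketch_registry[i].name, name);
            return true;
        }
    }
    return false;
}

/* An instance must exist before callbacks can be attached to it. */
bool sketch_register_function(const char *name, YankFn *func, void *opaque)
{
    struct sketch_instance *inst = sketch_find(name);

    if (!inst || inst->nfuncs == SKETCH_MAX_FUNCS) {
        return false;
    }
    inst->funcs[inst->nfuncs].func = func;
    inst->funcs[inst->nfuncs].opaque = opaque;
    inst->nfuncs++;
    return true;
}

/* Runs every callback of one instance; returns how many were called. */
size_t sketch_yank(const char *name)
{
    struct sketch_instance *inst = sketch_find(name);
    size_t called = 0;

    for (size_t i = 0; inst && i < inst->nfuncs; i++) {
        inst->funcs[i].func(inst->funcs[i].opaque);
        called++;
    }
    return called;
}

/* Simplification: drops the instance together with its callbacks. */
void sketch_unregister_instance(const char *name)
{
    struct sketch_instance *inst = sketch_find(name);

    if (inst) {
        inst->used = false;
    }
}

/* Example callback: counts invocations through the opaque pointer. */
void sketch_count_cb(void *opaque)
{
    int *hits = opaque;

    assert(hits != NULL);
    (*hits)++;
}
```

Registering the same name twice fails here by returning false, which mirrors the v13 change of making yank_register_instance return a bool instead of aborting.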
Re: [PATCH v12 1/7] Introduce yank feature
On Tue, 22 Dec 2020 12:00:29 +0400 Marc-André Lureau wrote: > On Sun, Dec 13, 2020 at 3:48 PM Lukas Straub wrote: > > > The yank feature allows to recover from hanging qemu by "yanking" > > at various parts. Other qemu systems can register themselves and > > multiple yank functions. Then all yank functions for selected > > instances can be called by the 'yank' out-of-band qmp command. > > Available instances can be queried by a 'query-yank' oob command. > > > > Signed-off-by: Lukas Straub > > Acked-by: Stefan Hajnoczi > > Reviewed-by: Markus Armbruster > > --- > > MAINTAINERS | 7 ++ > > include/qemu/yank.h | 95 +++ > > qapi/meson.build | 1 + > > qapi/qapi-schema.json | 1 + > > qapi/yank.json| 119 +++ > > util/meson.build | 1 + > > util/yank.c | 216 ++ > > 7 files changed, 440 insertions(+) > > create mode 100644 include/qemu/yank.h > > create mode 100644 qapi/yank.json > > create mode 100644 util/yank.c > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > index d48a4e8a8b..5d7e3c0e4b 100644 > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -2705,6 +2705,13 @@ F: util/uuid.c > > F: include/qemu/uuid.h > > F: tests/test-uuid.c > > > > +Yank feature > > +M: Lukas Straub > > +S: Odd fixes > > +F: util/yank.c > > +F: include/qemu/yank.h > > +F: qapi/yank.json > > + > > COLO Framework > > M: zhanghailiang > > S: Maintained > > diff --git a/include/qemu/yank.h b/include/qemu/yank.h > > new file mode 100644 > > index 00..96f5b2626f > > --- /dev/null > > +++ b/include/qemu/yank.h > > @@ -0,0 +1,95 @@ > > +/* > > + * QEMU yank feature > > + * > > + * Copyright (c) Lukas Straub > > + * > > + * This work is licensed under the terms of the GNU GPL, version 2 or > > later. > > + * See the COPYING file in the top-level directory. > > + */ > > + > > +#ifndef YANK_H > > +#define YANK_H > > + > > +#include "qapi/qapi-types-yank.h" > > + > > +typedef void (YankFn)(void *opaque); > > + > > +/** > > + * yank_register_instance: Register a new instance. 
> > + * > > + * This registers a new instance for yanking. Must be called before any > > yank > > + * function is registered for this instance. > > + * > > + * This function is thread-safe. > > + * > > + * @instance: The instance. > > + * @errp: Error object. > > + */ > > +void yank_register_instance(const YankInstance *instance, Error **errp); > > + > > > > It's a good idea to return a success boolean. (see include/qapi/error.h) Changed for the next version. > +/** > > + * yank_unregister_instance: Unregister a instance. > > + * > > + * This unregisters a instance. Must be called only after every yank > > function > > + * of the instance has been unregistered. > > + * > > + * This function is thread-safe. > > + * > > + * @instance: The instance. > > + */ > > +void yank_unregister_instance(const YankInstance *instance); > > + > > +/** > > + * yank_register_function: Register a yank function > > + * > > + * This registers a yank function. All limitations of qmp oob commands > > apply > > + * to the yank function as well. See docs/devel/qapi-code-gen.txt under > > + * "An OOB-capable command handler must satisfy the following conditions". > > + * > > + * This function is thread-safe. > > + * > > + * @instance: The instance. > > + * @func: The yank function. > > + * @opaque: Will be passed to the yank function. > > + */ > > +void yank_register_function(const YankInstance *instance, > > +YankFn *func, > > +void *opaque); > > + > > +/** > > + * yank_unregister_function: Unregister a yank function > > + * > > + * This unregisters a yank function. > > + * > > + * This function is thread-safe. > > + * > > + * @instance: The instance. > > + * @func: func that was passed to yank_register_function. > > + * @opaque: opaque that was passed to yank_register_function. > > + */ > > +void yank_unregister_function(const YankInstance *instance, > > + YankFn *func, > > + void *opaque); > > + > > +/** > > + * yank_generic_iochannel: Generic ya
[PATCH v12 5/7] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
Make qio_channel_tls_shutdown thread-safe by using atomics when accessing tioc->shutdown.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 io/channel-tls.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 388f019977..2ae1b92fc0 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -23,6 +23,7 @@
 #include "qemu/module.h"
 #include "io/channel-tls.h"
 #include "trace.h"
+#include "qemu/atomic.h"


 static ssize_t qio_channel_tls_write_handler(const char *buf,
@@ -277,7 +278,8 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                 return QIO_CHANNEL_ERR_BLOCK;
             }
         } else if (errno == ECONNABORTED &&
-                   (tioc->shutdown & QIO_CHANNEL_SHUTDOWN_READ)) {
+                   (qatomic_load_acquire(&tioc->shutdown) &
+                    QIO_CHANNEL_SHUTDOWN_READ)) {
             return 0;
         }

@@ -361,7 +363,7 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);

-    tioc->shutdown |= how;
+    qatomic_or(&tioc->shutdown, how);

     return qio_channel_shutdown(tioc->master, how, errp);
 }
--
2.20.1
[PATCH v12 7/7] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
A connecting chardev object has an additional reference by the connecting thread, so if the chardev is still connecting by the end of the test, then the chardev object won't be freed. This in turn means that the yank instance won't be unregistered and when running the next test-case yank_register_instance will abort, because the yank instance is already/still registered.

Signed-off-by: Lukas Straub
Reviewed-by: Daniel P. Berrangé
---
 tests/test-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test-char.c b/tests/test-char.c
index 953e0d1c1f..41a76410d8 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -937,6 +937,7 @@ static void char_socket_client_dupid_test(gconstpointer opaque)
     g_assert_nonnull(opts);
     chr1 = qemu_chr_new_from_opts(opts, NULL, &error_abort);
     g_assert_nonnull(chr1);
+    qemu_chr_wait_connected(chr1, &error_abort);

     chr2 = qemu_chr_new_from_opts(opts, NULL, &local_err);
     g_assert_null(chr2);
--
2.20.1
[PATCH v12 6/7] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
Migration and yank code assume that qio_channel_shutdown is thread-safe and can be called from a qmp oob handler. Document this after checking the code.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 include/io/channel.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 4d6fe45f63..ab9ea77959 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -92,7 +92,8 @@ struct QIOChannel {
  * provide additional optional features.
  *
  * Consult the corresponding public API docs for a description
- * of the semantics of each callback
+ * of the semantics of each callback. io_shutdown in particular
+ * must be thread-safe, terminate quickly and must not block.
  */
 struct QIOChannelClass {
     ObjectClass parent;
@@ -510,6 +511,8 @@ int qio_channel_close(QIOChannel *ioc,
  * QIO_CHANNEL_FEATURE_SHUTDOWN prior to calling
  * this method.
  *
+ * This function is thread-safe, terminates quickly and does not block.
+ *
  * Returns: 0 on success, -1 on error
  */
 int qio_channel_shutdown(QIOChannel *ioc,
--
2.20.1
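Why a shutdown is a suitable thing to do from a yank callback can be seen with a plain POSIX socket pair. The following is a rough sketch that assumes a POSIX system and does not use QEMU's QIOChannel API: `sketch_yank_fd` and `sketch_read_after_yank` are hypothetical helpers, and `shutdown(2)` on a raw file descriptor stands in for what a socket channel's io_shutdown would ultimately do.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical yank-style callback: opaque points at a socket fd. */
void sketch_yank_fd(void *opaque)
{
    int *fd = opaque;

    assert(fd != NULL);
    /* Does not wait for the peer; it only marks both directions closed. */
    shutdown(*fd, SHUT_RDWR);
}

/*
 * Creates a connected AF_UNIX stream pair, yanks one end, then reads from
 * the end selected by read_peer (0 = yanked end, non-zero = other end).
 * Returns read()'s result, or -2 if the socket pair could not be created.
 */
long sketch_read_after_yank(int read_peer)
{
    int fds[2];
    char buf[1];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        return -2;
    }
    sketch_yank_fd(&fds[0]);
    n = read(fds[read_peer ? 1 : 0], buf, sizeof(buf));
    close(fds[0]);
    close(fds[1]);
    return (long)n;
}
```

On Linux, both reads return 0 (end of file) instead of blocking, so a reader that would otherwise wait forever on a dead connection gets a chance to run its error path.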
[PATCH v12 3/7] chardev/char-socket.c: Add yank feature
Register a yank function to shutdown the socket on yank. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi --- chardev/char-socket.c | 35 +++ 1 file changed, 35 insertions(+) diff --git a/chardev/char-socket.c b/chardev/char-socket.c index 213a4c8dd0..7f2ee9a338 100644 --- a/chardev/char-socket.c +++ b/chardev/char-socket.c @@ -34,6 +34,7 @@ #include "qapi/error.h" #include "qapi/clone-visitor.h" #include "qapi/qapi-visit-sockets.h" +#include "qemu/yank.h" #include "chardev/char-io.h" #include "qom/object.h" @@ -70,6 +71,7 @@ struct SocketChardev { size_t read_msgfds_num; int *write_msgfds; size_t write_msgfds_num; +bool registered_yank; SocketAddress *addr; bool is_listen; @@ -415,6 +417,12 @@ static void tcp_chr_free_connection(Chardev *chr) tcp_set_msgfds(chr, NULL, 0); remove_fd_in_watch(chr); +if (s->state == TCP_CHARDEV_STATE_CONNECTING +|| s->state == TCP_CHARDEV_STATE_CONNECTED) { +yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(s->sioc)); +} object_unref(OBJECT(s->sioc)); s->sioc = NULL; object_unref(OBJECT(s->ioc)); @@ -932,6 +940,9 @@ static int tcp_chr_add_client(Chardev *chr, int fd) } tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); ret = tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); return ret; @@ -946,6 +957,9 @@ static void tcp_chr_accept(QIONetListener *listener, tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); tcp_chr_set_client_ioc_name(chr, cioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(cioc)); tcp_chr_new_client(chr, cioc); } @@ -961,6 +975,9 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp) object_unref(OBJECT(sioc)); return -1; } +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); 
tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); return 0; @@ -976,6 +993,9 @@ static void tcp_chr_accept_server_sync(Chardev *chr) tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); sioc = qio_net_listener_wait_client(s->listener); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); } @@ -1086,6 +1106,9 @@ static void char_socket_finalize(Object *obj) object_unref(OBJECT(s->tls_creds)); } g_free(s->tls_authz); +if (s->registered_yank) { +yank_unregister_instance(CHARDEV_YANK_INSTANCE(chr->label)); +} qemu_chr_be_event(chr, CHR_EVENT_CLOSED); } @@ -1101,6 +1124,9 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque) if (qio_task_propagate_error(task, )) { tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED); +yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); check_report_connect_error(chr, err); goto cleanup; } @@ -1134,6 +1160,9 @@ static void tcp_chr_connect_client_async(Chardev *chr) tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); sioc = qio_channel_socket_new(); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); /* * Normally code would use the qio_channel_socket_connect_async * method which uses a QIOTask + qio_task_set_error internally @@ -1376,6 +1405,12 @@ static void qmp_chardev_open_socket(Chardev *chr, qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS); } +yank_register_instance(CHARDEV_YANK_INSTANCE(chr->label), errp); +if (*errp) { +return; +} +s->registered_yank = true; + /* be isn't opened until we get a connection */ *be_opened = false; -- 2.20.1 pgpgT3srzXmI9.pgp Description: OpenPGP digital signature
[PATCH v12 4/7] migration: Add yank feature
Register yank functions on sockets to shut them down. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi Acked-by: Dr. David Alan Gilbert --- migration/channel.c | 13 + migration/migration.c | 24 migration/multifd.c | 10 ++ migration/qemu-file-channel.c | 7 +++ migration/savevm.c| 6 ++ 5 files changed, 60 insertions(+) diff --git a/migration/channel.c b/migration/channel.c index 8a783baa0b..35fe234e9c 100644 --- a/migration/channel.c +++ b/migration/channel.c @@ -18,6 +18,8 @@ #include "trace.h" #include "qapi/error.h" #include "io/channel-tls.h" +#include "io/channel-socket.h" +#include "qemu/yank.h" /** * @migration_channel_process_incoming - Create new incoming migration channel @@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc) trace_migration_set_incoming_channel( ioc, object_get_typename(OBJECT(ioc))); +if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) { +yank_register_function(MIGRATION_YANK_INSTANCE, yank_generic_iochannel, + QIO_CHANNEL(ioc)); +} + if (s->parameters.tls_creds && *s->parameters.tls_creds && !object_dynamic_cast(OBJECT(ioc), @@ -67,6 +74,12 @@ void migration_channel_connect(MigrationState *s, ioc, object_get_typename(OBJECT(ioc)), hostname, error); if (!error) { +if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) { +yank_register_function(MIGRATION_YANK_INSTANCE, + yank_generic_iochannel, + QIO_CHANNEL(ioc)); +} + if (s->parameters.tls_creds && *s->parameters.tls_creds && !object_dynamic_cast(OBJECT(ioc), diff --git a/migration/migration.c b/migration/migration.c index e0dbde4091..dc520a721b 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -56,6 +56,7 @@ #include "net/announce.h" #include "qemu/queue.h" #include "multifd.h" +#include "qemu/yank.h" #ifdef CONFIG_VFIO #include "hw/vfio/vfio-common.h" @@ -254,6 +255,8 @@ void migration_incoming_state_destroy(void) qapi_free_SocketAddressList(mis->socket_address_list); mis->socket_address_list = NULL; } + 
+yank_unregister_instance(MIGRATION_YANK_INSTANCE); } static void migrate_generate_event(int new_state) @@ -418,6 +421,11 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp) { const char *p = NULL; +yank_register_instance(MIGRATION_YANK_INSTANCE, errp); +if (*errp) { +return; +} + qapi_event_send_migration(MIGRATION_STATUS_SETUP); if (strstart(uri, "tcp:", ) || strstart(uri, "unix:", NULL) || @@ -432,6 +440,7 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp) } else if (strstart(uri, "fd:", )) { fd_start_incoming_migration(p, errp); } else { +yank_unregister_instance(MIGRATION_YANK_INSTANCE); error_setg(errp, "unknown migration protocol: %s", uri); } } @@ -1737,6 +1746,7 @@ static void migrate_fd_cleanup(MigrationState *s) } notifier_list_notify(_state_notifiers, s); block_cleanup_parameters(s); +yank_unregister_instance(MIGRATION_YANK_INSTANCE); } static void migrate_fd_cleanup_schedule(MigrationState *s) @@ -2011,6 +2021,7 @@ void qmp_migrate_recover(const char *uri, Error **errp) * only re-setup the migration stream and poke existing migration * to continue using that newly established channel. 
*/ +yank_unregister_instance(MIGRATION_YANK_INSTANCE); qemu_start_incoming_migration(uri, errp); } @@ -2148,6 +2159,13 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, return; } +if (!(has_resume && resume)) { +yank_register_instance(MIGRATION_YANK_INSTANCE, errp); +if (*errp) { +return; +} +} + if (strstart(uri, "tcp:", ) || strstart(uri, "unix:", NULL) || strstart(uri, "vsock:", NULL)) { @@ -2161,6 +2179,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, } else if (strstart(uri, "fd:", )) { fd_start_outgoing_migration(s, p, _err); } else { +if (!(has_resume && resume)) { +yank_unregister_instance(MIGRATION_YANK_INSTANCE); +} error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri", "a valid migration protocol"); migrate_set_state(>state, MIGRATION_STATUS_SETUP, @@ -2170,6 +2191,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, } if (loca
[PATCH v12 1/7] Introduce yank feature
The yank feature allows to recover from hanging qemu by "yanking" at various parts. Other qemu systems can register themselves and multiple yank functions. Then all yank functions for selected instances can be called by the 'yank' out-of-band qmp command. Available instances can be queried by a 'query-yank' oob command. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi Reviewed-by: Markus Armbruster --- MAINTAINERS | 7 ++ include/qemu/yank.h | 95 +++ qapi/meson.build | 1 + qapi/qapi-schema.json | 1 + qapi/yank.json| 119 +++ util/meson.build | 1 + util/yank.c | 216 ++ 7 files changed, 440 insertions(+) create mode 100644 include/qemu/yank.h create mode 100644 qapi/yank.json create mode 100644 util/yank.c diff --git a/MAINTAINERS b/MAINTAINERS index d48a4e8a8b..5d7e3c0e4b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2705,6 +2705,13 @@ F: util/uuid.c F: include/qemu/uuid.h F: tests/test-uuid.c +Yank feature +M: Lukas Straub +S: Odd fixes +F: util/yank.c +F: include/qemu/yank.h +F: qapi/yank.json + COLO Framework M: zhanghailiang S: Maintained diff --git a/include/qemu/yank.h b/include/qemu/yank.h new file mode 100644 index 00..96f5b2626f --- /dev/null +++ b/include/qemu/yank.h @@ -0,0 +1,95 @@ +/* + * QEMU yank feature + * + * Copyright (c) Lukas Straub + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#ifndef YANK_H +#define YANK_H + +#include "qapi/qapi-types-yank.h" + +typedef void (YankFn)(void *opaque); + +/** + * yank_register_instance: Register a new instance. + * + * This registers a new instance for yanking. Must be called before any yank + * function is registered for this instance. + * + * This function is thread-safe. + * + * @instance: The instance. + * @errp: Error object. + */ +void yank_register_instance(const YankInstance *instance, Error **errp); + +/** + * yank_unregister_instance: Unregister a instance. + * + * This unregisters a instance. 
Must be called only after every yank function + * of the instance has been unregistered. + * + * This function is thread-safe. + * + * @instance: The instance. + */ +void yank_unregister_instance(const YankInstance *instance); + +/** + * yank_register_function: Register a yank function + * + * This registers a yank function. All limitations of qmp oob commands apply + * to the yank function as well. See docs/devel/qapi-code-gen.txt under + * "An OOB-capable command handler must satisfy the following conditions". + * + * This function is thread-safe. + * + * @instance: The instance. + * @func: The yank function. + * @opaque: Will be passed to the yank function. + */ +void yank_register_function(const YankInstance *instance, +YankFn *func, +void *opaque); + +/** + * yank_unregister_function: Unregister a yank function + * + * This unregisters a yank function. + * + * This function is thread-safe. + * + * @instance: The instance. + * @func: func that was passed to yank_register_function. + * @opaque: opaque that was passed to yank_register_function. + */ +void yank_unregister_function(const YankInstance *instance, + YankFn *func, + void *opaque); + +/** + * yank_generic_iochannel: Generic yank function for iochannel + * + * This is a generic yank function which will call qio_channel_shutdown on the + * provided QIOChannel. 
+ * + * @opaque: QIOChannel to shutdown + */ +void yank_generic_iochannel(void *opaque); + +#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \ +.type = YANK_INSTANCE_TYPE_BLOCK_NODE, \ +.u.block_node.node_name = (the_node_name) }) + +#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \ +.type = YANK_INSTANCE_TYPE_CHARDEV, \ +.u.chardev.id = (the_id) }) + +#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \ +.type = YANK_INSTANCE_TYPE_MIGRATION }) + +#endif diff --git a/qapi/meson.build b/qapi/meson.build index 0e98146f1f..ab68e7900e 100644 --- a/qapi/meson.build +++ b/qapi/meson.build @@ -47,6 +47,7 @@ qapi_all_modules = [ 'trace', 'transaction', 'ui', + 'yank', ] qapi_storage_daemon_modules = [ diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json index 0b444b76d2..3441c9a9ae 100644 --- a/qapi/qapi-schema.json +++ b/qapi/qapi-schema.json @@ -86,6 +86,7 @@ { 'include': 'machine.json' } { 'include': 'machine-target.json' } { 'include': 'replay.json' } +{ 'include': 'yank.json' } { 'include': 'misc.json' } { 'include': 'misc-target.json' } { 'include': 'audio.json' } diff --git a/qapi/yank.json b/qapi/yank.json new file mode 100644 index 00..167a775594 --- /dev/null +++ b/qapi/yank.json @@ -0,0 +1,119 @@ +# -*- Mode: Python -*- +# vim:
[PATCH v12 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu
Hello Everyone,
So here is v12.
@Marc-André Lureau, we still need an ACK for the chardev patch.

Changes:

v12:
 -rebase onto master
 -minor change to migration (removal of "defer" branch in
  qemu_start_incoming_migration)
 -add Reviewed-by tags

v11:
 -squashed MAINTAINERS update into patch 1
 -move qmp doc of yank before misc
 -add title for qmp docs
 -change "Since:" to 6.0
 -add Reviewed-by tags

v10:
 -moved from qapi/misc.json to qapi/yank.json
 -rename 'blockdev' -> 'block-node'
 -document difference between migration yank instance and migrate_cancel
 -better document return values of yank command
 -better document yank_lock
 -minor style and spelling fixes

v9:
 -rebase onto master
 -implemented new qmp api as proposed by Markus

v8:
 -add Reviewed-by and Acked-by tags
 -rebase onto master
 -minor change to migration
 -convert to meson
 -change "Since:" to 5.2
 -various code style fixes (Markus Armbruster)
 -point to oob restrictions in comment to yank_register_function
  (Markus Armbruster)
 -improve qmp documentation (Markus Armbruster)
 -document oob suitability of qio_channel and io_shutdown
  (Markus Armbruster)

v7:
 -yank_register_instance now returns error via Error **errp instead of
  aborting
 -dropped "chardev/char.c: Check for duplicate id before creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h
  (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_acquire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __constructor__ attribute instead
  (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Overview:
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev,
etc.) to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover
from these kinds of hangs. The different subsystems register callbacks
which get executed with the yank command. For example the callback can
shutdown() a socket. This is intended for the colo use-case, but it can be
used for other things too of course.

Regards,
Lukas Straub

Lukas Straub (7):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document qmp oob suitability of qio_channel_shutdown and
    io_shutdown
  tests/test-char.c: Wait for the chardev to connect in
    char_socket_client_dupid_test

 MAINTAINERS                   |   7
 block/nbd.c                   | 154
 chardev/char-socket.c         |  35
 include/io/channel.h          |   5
 include/qemu/yank.h           |  95
 io/channel-tls.c              |   6
 migration/channel.c           |  13
 migration/migration.c         |  24
 migration/multifd.c           |  10
 migration/qemu-file-channel.c |   7
 migration/savevm.c            |   6
 qapi/meson.build              |   1
 qapi/qapi-schema.json         |   1
 qapi/yank.json                | 119
 tests/test-char.c             |   1
 util/meson.build              |   1
 util/yank.c                   | 216
 17 files changed, 637 insertions(+), 64 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

--
2.20.1
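The overview above says that subsystems register callbacks which the yank command then executes. A minimal sketch of that pattern in plain C follows. All `toy_` names, the fixed-size table and the return codes are assumptions made for illustration; they are not the API from util/yank.c, which is documented as thread-safe, while this sketch is single-threaded and takes no lock.

```c
#include <stddef.h>

/* Hypothetical stand-in for the series' callback type. */
typedef void (*toy_yank_fn)(void *opaque);

#define TOY_MAX_FUNCS 8

/* One yankable instance with its registered (function, opaque) pairs. */
struct toy_yank_instance {
    toy_yank_fn funcs[TOY_MAX_FUNCS];
    void *opaques[TOY_MAX_FUNCS];
    size_t count;
};

/* Register a callback; returns 0 on success, -1 if the table is full. */
int toy_yank_register(struct toy_yank_instance *inst,
                      toy_yank_fn fn, void *opaque)
{
    if (inst->count == TOY_MAX_FUNCS) {
        return -1;
    }
    inst->funcs[inst->count] = fn;
    inst->opaques[inst->count] = opaque;
    inst->count++;
    return 0;
}

/* Remove the first matching (fn, opaque) pair; returns -1 if not found. */
int toy_yank_unregister(struct toy_yank_instance *inst,
                        toy_yank_fn fn, void *opaque)
{
    for (size_t i = 0; i < inst->count; i++) {
        if (inst->funcs[i] == fn && inst->opaques[i] == opaque) {
            /* Fill the hole with the last entry; order is not preserved. */
            inst->count--;
            inst->funcs[i] = inst->funcs[inst->count];
            inst->opaques[i] = inst->opaques[inst->count];
            return 0;
        }
    }
    return -1;
}

/* Run every registered callback, as a yank request would. */
void toy_yank(struct toy_yank_instance *inst)
{
    for (size_t i = 0; i < inst->count; i++) {
        inst->funcs[i](inst->opaques[i]);
    }
}

/* Example callback: marks a fake connection as shut down. */
void toy_shutdown_conn(void *opaque)
{
    *(int *)opaque = 1;
}
```

Matching on the (function, opaque) pair rather than on the function alone lets one generic callback be registered once per connection, which is how the patches in this series use yank_generic_iochannel.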
[PATCH v12 2/7] block/nbd.c: Add yank feature
Register a yank function which shuts down the socket and sets s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an error occured. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi Reviewed-by: Eric Blake --- block/nbd.c | 154 +++- 1 file changed, 93 insertions(+), 61 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 42536702b6..994d1e7b33 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -35,6 +35,7 @@ #include "qemu/option.h" #include "qemu/cutils.h" #include "qemu/main-loop.h" +#include "qemu/atomic.h" #include "qapi/qapi-visit-sockets.h" #include "qapi/qmp/qstring.h" @@ -44,6 +45,8 @@ #include "block/nbd.h" #include "block/block_int.h" +#include "qemu/yank.h" + #define EN_OPTSTR ":exportname=" #define MAX_NBD_REQUESTS16 @@ -141,14 +144,13 @@ typedef struct BDRVNBDState { NBDConnectThread *connect_thread; } BDRVNBDState; -static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr, - Error **errp); -static QIOChannelSocket *nbd_co_establish_connection(BlockDriverState *bs, - Error **errp); +static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, +Error **errp); +static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp); static void nbd_co_establish_connection_cancel(BlockDriverState *bs, bool detach); -static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc, -Error **errp); +static int nbd_client_handshake(BlockDriverState *bs, Error **errp); +static void nbd_yank(void *opaque); static void nbd_clear_bdrvstate(BDRVNBDState *s) { @@ -166,12 +168,12 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s) static void nbd_channel_error(BDRVNBDState *s, int ret) { if (ret == -EIO) { -if (s->state == NBD_CLIENT_CONNECTED) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { s->state = s->reconnect_delay ? 
NBD_CLIENT_CONNECTING_WAIT : NBD_CLIENT_CONNECTING_NOWAIT; } } else { -if (s->state == NBD_CLIENT_CONNECTED) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); } s->state = NBD_CLIENT_QUIT; @@ -204,7 +206,7 @@ static void reconnect_delay_timer_cb(void *opaque) { BDRVNBDState *s = opaque; -if (s->state == NBD_CLIENT_CONNECTING_WAIT) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT) { s->state = NBD_CLIENT_CONNECTING_NOWAIT; while (qemu_co_enter_next(>free_sema, NULL)) { /* Resume all queued requests */ @@ -216,7 +218,7 @@ static void reconnect_delay_timer_cb(void *opaque) static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns) { -if (s->state != NBD_CLIENT_CONNECTING_WAIT) { +if (qatomic_load_acquire(>state) != NBD_CLIENT_CONNECTING_WAIT) { return; } @@ -261,7 +263,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs, * s->connection_co is either yielded from nbd_receive_reply or from * nbd_co_reconnect_loop() */ -if (s->state == NBD_CLIENT_CONNECTED) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context); } @@ -287,7 +289,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs) reconnect_delay_timer_del(s); -if (s->state == NBD_CLIENT_CONNECTING_WAIT) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT) { s->state = NBD_CLIENT_CONNECTING_NOWAIT; qemu_co_queue_restart_all(>free_sema); } @@ -338,13 +340,14 @@ static void nbd_teardown_connection(BlockDriverState *bs) static bool nbd_client_connecting(BDRVNBDState *s) { -return s->state == NBD_CLIENT_CONNECTING_WAIT || -s->state == NBD_CLIENT_CONNECTING_NOWAIT; +NBDClientState state = qatomic_load_acquire(>state); +return state == NBD_CLIENT_CONNECTING_WAIT || +state == NBD_CLIENT_CONNECTING_NOWAIT; } static bool nbd_client_connecting_wait(BDRVNBDState *s) { -return s->state == 
NBD_CLIENT_CONNECTING_WAIT; +return qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT; } static void connect_bh(void *opaque) @@ -424,12 +427,12 @@ static void *connect_thread_func(void *opaque) return NULL; } -static QIOChannelSocket *coroutine_fn +static int coroutine_fn nbd_co_establish_connection(BlockDriverState *bs, Err
Re: [PATCH v11 2/7] block/nbd.c: Add yank feature
On Wed, 2 Dec 2020 15:18:48 +0300
Vladimir Sementsov-Ogievskiy wrote:

> 15.11.2020 14:36, Lukas Straub wrote:
> > Register a yank function which shuts down the socket and sets
> > s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
> > error occured.
> >
> > Signed-off-by: Lukas Straub
> > Acked-by: Stefan Hajnoczi
>
> Hi! Could I ask, what's the reason for qatomic_load_acquire access to
> s->state? Is there same bug fixed? Or is it related somehow to new feature?
>

Hi,
This is for the new feature, as the yank function runs in a separate
thread.

Regards,
Lukas Straub
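To illustrate the answer above: once a state field can be written from another thread, plain loads and stores on it are no longer enough. The sketch below uses C11 `<stdatomic.h>` as a stand-in for QEMU's qatomic helpers, which is an assumption made so the example compiles on its own; the `toy_` names are hypothetical and do not come from block/nbd.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, reduced version of a client state machine. */
enum toy_client_state {
    TOY_CLIENT_CONNECTED,
    TOY_CLIENT_QUIT,
};

struct toy_client {
    _Atomic int state;
};

/* Would run in the separate thread that handles the yank request. */
void toy_client_yank(struct toy_client *c)
{
    /* Release store: earlier writes are visible to an acquire reader. */
    atomic_store_explicit(&c->state, TOY_CLIENT_QUIT, memory_order_release);
}

/* Would run in the thread that normally drives the connection. */
bool toy_client_connected(struct toy_client *c)
{
    /* Acquire load pairs with the release store in toy_client_yank(). */
    return atomic_load_explicit(&c->state, memory_order_acquire)
           == TOY_CLIENT_CONNECTED;
}
```

The v5 changelog entry in the cover letter names the same pairing: a store-release on the writer side and a load-acquire on the reader side.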
[PATCH v11 7/7] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
A connecting chardev object has an additional reference by the connecting
thread, so if the chardev is still connecting by the end of the test,
then the chardev object won't be freed. This in turn means that the yank
instance won't be unregistered and when running the next test-case
yank_register_instance will abort, because the yank instance is
already/still registered.

Signed-off-by: Lukas Straub
Reviewed-by: Daniel P. Berrangé
---
 tests/test-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test-char.c b/tests/test-char.c
index 9196e566e9..aedb5c9eda 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -937,6 +937,7 @@ static void char_socket_client_dupid_test(gconstpointer opaque)
     g_assert_nonnull(opts);
     chr1 = qemu_chr_new_from_opts(opts, NULL, &error_abort);
     g_assert_nonnull(chr1);
+    qemu_chr_wait_connected(chr1, &error_abort);
 
     chr2 = qemu_chr_new_from_opts(opts, NULL, &local_err);
     g_assert_null(chr2);
--
2.20.1
[PATCH v11 4/7] migration: Add yank feature
Register yank functions on sockets to shut them down. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi Acked-by: Dr. David Alan Gilbert --- migration/channel.c | 13 + migration/migration.c | 25 + migration/multifd.c | 10 ++ migration/qemu-file-channel.c | 7 +++ migration/savevm.c| 6 ++ 5 files changed, 61 insertions(+) diff --git a/migration/channel.c b/migration/channel.c index 8a783baa0b..35fe234e9c 100644 --- a/migration/channel.c +++ b/migration/channel.c @@ -18,6 +18,8 @@ #include "trace.h" #include "qapi/error.h" #include "io/channel-tls.h" +#include "io/channel-socket.h" +#include "qemu/yank.h" /** * @migration_channel_process_incoming - Create new incoming migration channel @@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc) trace_migration_set_incoming_channel( ioc, object_get_typename(OBJECT(ioc))); +if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) { +yank_register_function(MIGRATION_YANK_INSTANCE, yank_generic_iochannel, + QIO_CHANNEL(ioc)); +} + if (s->parameters.tls_creds && *s->parameters.tls_creds && !object_dynamic_cast(OBJECT(ioc), @@ -67,6 +74,12 @@ void migration_channel_connect(MigrationState *s, ioc, object_get_typename(OBJECT(ioc)), hostname, error); if (!error) { +if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) { +yank_register_function(MIGRATION_YANK_INSTANCE, + yank_generic_iochannel, + QIO_CHANNEL(ioc)); +} + if (s->parameters.tls_creds && *s->parameters.tls_creds && !object_dynamic_cast(OBJECT(ioc), diff --git a/migration/migration.c b/migration/migration.c index 87a9b59f83..a5add9d17d 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -56,6 +56,7 @@ #include "net/announce.h" #include "qemu/queue.h" #include "multifd.h" +#include "qemu/yank.h" #ifdef CONFIG_VFIO #include "hw/vfio/vfio-common.h" @@ -252,6 +253,8 @@ void migration_incoming_state_destroy(void) qapi_free_SocketAddressList(mis->socket_address_list); mis->socket_address_list = NULL; } + 
+yank_unregister_instance(MIGRATION_YANK_INSTANCE); } static void migrate_generate_event(int new_state) @@ -429,8 +432,14 @@ void qemu_start_incoming_migration(const char *uri, Error **errp) { const char *p = NULL; +yank_register_instance(MIGRATION_YANK_INSTANCE, errp); +if (*errp) { +return; +} + qapi_event_send_migration(MIGRATION_STATUS_SETUP); if (!strcmp(uri, "defer")) { +yank_unregister_instance(MIGRATION_YANK_INSTANCE); deferred_incoming_migration(errp); } else if (strstart(uri, "tcp:", ) || strstart(uri, "unix:", NULL) || @@ -445,6 +454,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp) } else if (strstart(uri, "fd:", )) { fd_start_incoming_migration(p, errp); } else { +yank_unregister_instance(MIGRATION_YANK_INSTANCE); error_setg(errp, "unknown migration protocol: %s", uri); } } @@ -1750,6 +1760,7 @@ static void migrate_fd_cleanup(MigrationState *s) } notifier_list_notify(_state_notifiers, s); block_cleanup_parameters(s); +yank_unregister_instance(MIGRATION_YANK_INSTANCE); } static void migrate_fd_cleanup_schedule(MigrationState *s) @@ -2024,6 +2035,7 @@ void qmp_migrate_recover(const char *uri, Error **errp) * only re-setup the migration stream and poke existing migration * to continue using that newly established channel. 
*/ +yank_unregister_instance(MIGRATION_YANK_INSTANCE); qemu_start_incoming_migration(uri, errp); } @@ -2161,6 +2173,13 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, return; } +if (!(has_resume && resume)) { +yank_register_instance(MIGRATION_YANK_INSTANCE, errp); +if (*errp) { +return; +} +} + if (strstart(uri, "tcp:", ) || strstart(uri, "unix:", NULL) || strstart(uri, "vsock:", NULL)) { @@ -2174,6 +2193,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, } else if (strstart(uri, "fd:", )) { fd_start_outgoing_migration(s, p, _err); } else { +if (!(has_resume && resume)) { +yank_unregister_instance(MIGRATION_YANK_INSTANCE); +} error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri", "a valid migration protocol&qu
[PATCH v11 3/7] chardev/char-socket.c: Add yank feature
Register a yank function to shutdown the socket on yank. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi --- chardev/char-socket.c | 35 +++ 1 file changed, 35 insertions(+) diff --git a/chardev/char-socket.c b/chardev/char-socket.c index 213a4c8dd0..7f2ee9a338 100644 --- a/chardev/char-socket.c +++ b/chardev/char-socket.c @@ -34,6 +34,7 @@ #include "qapi/error.h" #include "qapi/clone-visitor.h" #include "qapi/qapi-visit-sockets.h" +#include "qemu/yank.h" #include "chardev/char-io.h" #include "qom/object.h" @@ -70,6 +71,7 @@ struct SocketChardev { size_t read_msgfds_num; int *write_msgfds; size_t write_msgfds_num; +bool registered_yank; SocketAddress *addr; bool is_listen; @@ -415,6 +417,12 @@ static void tcp_chr_free_connection(Chardev *chr) tcp_set_msgfds(chr, NULL, 0); remove_fd_in_watch(chr); +if (s->state == TCP_CHARDEV_STATE_CONNECTING +|| s->state == TCP_CHARDEV_STATE_CONNECTED) { +yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(s->sioc)); +} object_unref(OBJECT(s->sioc)); s->sioc = NULL; object_unref(OBJECT(s->ioc)); @@ -932,6 +940,9 @@ static int tcp_chr_add_client(Chardev *chr, int fd) } tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); ret = tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); return ret; @@ -946,6 +957,9 @@ static void tcp_chr_accept(QIONetListener *listener, tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); tcp_chr_set_client_ioc_name(chr, cioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(cioc)); tcp_chr_new_client(chr, cioc); } @@ -961,6 +975,9 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp) object_unref(OBJECT(sioc)); return -1; } +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); 
tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); return 0; @@ -976,6 +993,9 @@ static void tcp_chr_accept_server_sync(Chardev *chr) tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); sioc = qio_net_listener_wait_client(s->listener); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); } @@ -1086,6 +1106,9 @@ static void char_socket_finalize(Object *obj) object_unref(OBJECT(s->tls_creds)); } g_free(s->tls_authz); +if (s->registered_yank) { +yank_unregister_instance(CHARDEV_YANK_INSTANCE(chr->label)); +} qemu_chr_be_event(chr, CHR_EVENT_CLOSED); } @@ -1101,6 +1124,9 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque) if (qio_task_propagate_error(task, )) { tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED); +yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); check_report_connect_error(chr, err); goto cleanup; } @@ -1134,6 +1160,9 @@ static void tcp_chr_connect_client_async(Chardev *chr) tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); sioc = qio_channel_socket_new(); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); /* * Normally code would use the qio_channel_socket_connect_async * method which uses a QIOTask + qio_task_set_error internally @@ -1376,6 +1405,12 @@ static void qmp_chardev_open_socket(Chardev *chr, qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS); } +yank_register_instance(CHARDEV_YANK_INSTANCE(chr->label), errp); +if (*errp) { +return; +} +s->registered_yank = true; + /* be isn't opened until we get a connection */ *be_opened = false; -- 2.20.1 pgppj_z8lj0Y7.pgp Description: OpenPGP digital signature
[PATCH v11 2/7] block/nbd.c: Add yank feature
Register a yank function which shuts down the socket and sets s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an error occured. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi --- block/nbd.c | 154 +++- 1 file changed, 93 insertions(+), 61 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 42536702b6..994d1e7b33 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -35,6 +35,7 @@ #include "qemu/option.h" #include "qemu/cutils.h" #include "qemu/main-loop.h" +#include "qemu/atomic.h" #include "qapi/qapi-visit-sockets.h" #include "qapi/qmp/qstring.h" @@ -44,6 +45,8 @@ #include "block/nbd.h" #include "block/block_int.h" +#include "qemu/yank.h" + #define EN_OPTSTR ":exportname=" #define MAX_NBD_REQUESTS16 @@ -141,14 +144,13 @@ typedef struct BDRVNBDState { NBDConnectThread *connect_thread; } BDRVNBDState; -static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr, - Error **errp); -static QIOChannelSocket *nbd_co_establish_connection(BlockDriverState *bs, - Error **errp); +static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, +Error **errp); +static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp); static void nbd_co_establish_connection_cancel(BlockDriverState *bs, bool detach); -static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc, -Error **errp); +static int nbd_client_handshake(BlockDriverState *bs, Error **errp); +static void nbd_yank(void *opaque); static void nbd_clear_bdrvstate(BDRVNBDState *s) { @@ -166,12 +168,12 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s) static void nbd_channel_error(BDRVNBDState *s, int ret) { if (ret == -EIO) { -if (s->state == NBD_CLIENT_CONNECTED) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { s->state = s->reconnect_delay ? 
NBD_CLIENT_CONNECTING_WAIT : NBD_CLIENT_CONNECTING_NOWAIT; } } else { -if (s->state == NBD_CLIENT_CONNECTED) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); } s->state = NBD_CLIENT_QUIT; @@ -204,7 +206,7 @@ static void reconnect_delay_timer_cb(void *opaque) { BDRVNBDState *s = opaque; -if (s->state == NBD_CLIENT_CONNECTING_WAIT) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT) { s->state = NBD_CLIENT_CONNECTING_NOWAIT; while (qemu_co_enter_next(>free_sema, NULL)) { /* Resume all queued requests */ @@ -216,7 +218,7 @@ static void reconnect_delay_timer_cb(void *opaque) static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns) { -if (s->state != NBD_CLIENT_CONNECTING_WAIT) { +if (qatomic_load_acquire(>state) != NBD_CLIENT_CONNECTING_WAIT) { return; } @@ -261,7 +263,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs, * s->connection_co is either yielded from nbd_receive_reply or from * nbd_co_reconnect_loop() */ -if (s->state == NBD_CLIENT_CONNECTED) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context); } @@ -287,7 +289,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs) reconnect_delay_timer_del(s); -if (s->state == NBD_CLIENT_CONNECTING_WAIT) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT) { s->state = NBD_CLIENT_CONNECTING_NOWAIT; qemu_co_queue_restart_all(>free_sema); } @@ -338,13 +340,14 @@ static void nbd_teardown_connection(BlockDriverState *bs) static bool nbd_client_connecting(BDRVNBDState *s) { -return s->state == NBD_CLIENT_CONNECTING_WAIT || -s->state == NBD_CLIENT_CONNECTING_NOWAIT; +NBDClientState state = qatomic_load_acquire(>state); +return state == NBD_CLIENT_CONNECTING_WAIT || +state == NBD_CLIENT_CONNECTING_NOWAIT; } static bool nbd_client_connecting_wait(BDRVNBDState *s) { -return s->state == 
NBD_CLIENT_CONNECTING_WAIT; +return qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT; } static void connect_bh(void *opaque) @@ -424,12 +427,12 @@ static void *connect_thread_func(void *opaque) return NULL; } -static QIOChannelSocket *coroutine_fn +static int coroutine_fn nbd_co_establish_connection(BlockDriverState *bs, Error **errp) { +int ret;
[PATCH v11 6/7] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
Migration and yank code assume that qio_channel_shutdown is thread-safe
and can be called from a qmp oob handler. Document this after checking
the code.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 include/io/channel.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 4d6fe45f63..ab9ea77959 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -92,7 +92,8 @@ struct QIOChannel {
  * provide additional optional features.
  *
  * Consult the corresponding public API docs for a description
- * of the semantics of each callback
+ * of the semantics of each callback. io_shutdown in particular
+ * must be thread-safe, terminate quickly and must not block.
  */
 struct QIOChannelClass {
     ObjectClass parent;
@@ -510,6 +511,8 @@ int qio_channel_close(QIOChannel *ioc,
  * QIO_CHANNEL_FEATURE_SHUTDOWN prior to calling
  * this method.
  *
+ * This function is thread-safe, terminates quickly and does not block.
+ *
  * Returns: 0 on success, -1 on error
  */
 int qio_channel_shutdown(QIOChannel *ioc,
--
2.20.1
[PATCH v11 5/7] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
Make qio_channel_tls_shutdown thread-safe by using atomics when
accessing tioc->shutdown.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 io/channel-tls.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 388f019977..2ae1b92fc0 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -23,6 +23,7 @@
 #include "qemu/module.h"
 #include "io/channel-tls.h"
 #include "trace.h"
+#include "qemu/atomic.h"
 
 
 static ssize_t qio_channel_tls_write_handler(const char *buf,
@@ -277,7 +278,8 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                 return QIO_CHANNEL_ERR_BLOCK;
             }
         } else if (errno == ECONNABORTED &&
-                   (tioc->shutdown & QIO_CHANNEL_SHUTDOWN_READ)) {
+                   (qatomic_load_acquire(&tioc->shutdown) &
+                    QIO_CHANNEL_SHUTDOWN_READ)) {
             return 0;
         }
 
@@ -361,7 +363,7 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
 
-    tioc->shutdown |= how;
+    qatomic_or(&tioc->shutdown, how);
 
     return qio_channel_shutdown(tioc->master, how, errp);
 }
--
2.20.1
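The change above replaces a plain `|=` on a flag word with an atomic OR, and the read side with an acquire load. A rough sketch of the same idea using C11 `<stdatomic.h>` follows; treating the C11 calls as an analogue of QEMU's qatomic_or and qatomic_load_acquire is an assumption, and the `toy_` names and flag values are made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits, standing in for the channel shutdown flags. */
#define TOY_SHUTDOWN_READ  1u
#define TOY_SHUTDOWN_WRITE 2u

struct toy_channel {
    atomic_uint shutdown;
};

/* May be called from any thread: sets flag bits without losing updates. */
void toy_channel_shutdown(struct toy_channel *ch, unsigned how)
{
    /* Atomic read-modify-write; a plain "|=" could drop a concurrent bit. */
    atomic_fetch_or(&ch->shutdown, how);
}

/* Reader side: checks whether the read direction has been shut down. */
bool toy_channel_read_is_shut(struct toy_channel *ch)
{
    unsigned flags = atomic_load_explicit(&ch->shutdown,
                                          memory_order_acquire);
    return (flags & TOY_SHUTDOWN_READ) != 0;
}
```

With a non-atomic `|=`, two threads shutting down different directions at the same time could each read the old value and overwrite the other's bit; the atomic OR rules that out.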
[PATCH v11 0/7] Introduce 'yank' oob qmp command to recover from hanging qemu
Hello Everyone,
So here is v11.
@Eric Blake and @Marc-André Lureau: We still need ACKs for NBD and chardev.

Changes:

v11:
 -squashed MAINTAINERS update into patch 1
 -move qmp doc of yank before misc
 -add title for qmp docs
 -change "Since:" to 6.0
 -add Reviewed-by tags

v10:
 -moved from qapi/misc.json to qapi/yank.json
 -rename 'blockdev' -> 'block-node'
 -document difference between migration yank instance and migrate_cancel
 -better document return values of yank command
 -better document yank_lock
 -minor style and spelling fixes

v9:
 -rebase onto master
 -implemented new qmp api as proposed by Markus

v8:
 -add Reviewed-by and Acked-by tags
 -rebase onto master
 -minor change to migration
 -convert to meson
 -change "Since:" to 5.2
 -various code style fixes (Markus Armbruster)
 -point to oob restrictions in comment to yank_register_function
  (Markus Armbruster)
 -improve qmp documentation (Markus Armbruster)
 -document oob suitability of qio_channel and io_shutdown (Markus Armbruster)

v7:
 -yank_register_instance now returns error via Error **errp instead of aborting
 -dropped "chardev/char.c: Check for duplicate id before creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h
  (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_acquire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __constructor__ attribute instead
  (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Overview:
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev,
etc.) to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover
from these kinds of hangs. The different subsystems register callbacks
which get executed with the yank command. For example the callback can
shutdown() a socket. This is intended for the colo use-case, but it can
be used for other things too of course.

Regards,
Lukas Straub

Lukas Straub (7):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document qmp oob suitability of qio_channel_shutdown and
    io_shutdown
  tests/test-char.c: Wait for the chardev to connect in
    char_socket_client_dupid_test

 MAINTAINERS                   |   7 ++
 block/nbd.c                   | 154 ++--
 chardev/char-socket.c         |  35 ++
 include/io/channel.h          |   5 +-
 include/qemu/yank.h           |  95 +++
 io/channel-tls.c              |   6 +-
 migration/channel.c           |  13 ++
 migration/migration.c         |  25
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   7 ++
 migration/savevm.c            |   6 +
 qapi/meson.build              |   1 +
 qapi/qapi-schema.json         |   1 +
 qapi/yank.json                | 119 +++
 tests/test-char.c             |   1 +
 util/meson.build              |   1 +
 util/yank.c                   | 216 ++
 17 files changed, 638 insertions(+), 64 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

-- 
2.20.1
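The registration model from the overview (subsystems register callbacks, and the yank command runs them) can be illustrated with a very small registry. This is a sketch only, not the code in util/yank.c: every demo_* name is hypothetical, there is a single implicit instance, and the locking that the real implementation needs is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the YankFn typedef in include/qemu/yank.h. */
typedef void (DemoYankFn)(void *opaque);

#define DEMO_MAX_FUNCS 8

/* Hypothetical fixed-size table; NOT thread-safe, no lock is taken. */
static struct {
    DemoYankFn *func;
    void *opaque;
} demo_funcs[DEMO_MAX_FUNCS];
static size_t demo_nfuncs;

/* A subsystem adds a callback that knows how to unblock it. */
static void demo_register_function(DemoYankFn *func, void *opaque)
{
    assert(demo_nfuncs < DEMO_MAX_FUNCS);
    demo_funcs[demo_nfuncs].func = func;
    demo_funcs[demo_nfuncs].opaque = opaque;
    demo_nfuncs++;
}

/* Remove the entry matching both func and opaque; it must exist. */
static void demo_unregister_function(DemoYankFn *func, void *opaque)
{
    for (size_t i = 0; i < demo_nfuncs; i++) {
        if (demo_funcs[i].func == func && demo_funcs[i].opaque == opaque) {
            /* Order does not matter, so move the last entry into the gap. */
            demo_funcs[i] = demo_funcs[demo_nfuncs - 1];
            demo_nfuncs--;
            return;
        }
    }
    assert(0 && "function was not registered");
}

/* What a 'yank' command would do: run every registered callback. */
static void demo_yank(void)
{
    for (size_t i = 0; i < demo_nfuncs; i++) {
        demo_funcs[i].func(demo_funcs[i].opaque);
    }
}

/* Example callback; a real one would shut down a socket instead. */
static void demo_mark_yanked(void *opaque)
{
    *(int *)opaque = 1;
}
```

Matching on the (func, opaque) pair mirrors the unregister signature in the series, where the same generic callback is registered several times with different channels as opaque.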
[PATCH v11 1/7] Introduce yank feature
The yank feature allows to recover from hanging qemu by "yanking"
at various parts. Other qemu systems can register themselves and
multiple yank functions. Then all yank functions for selected
instances can be called by the 'yank' out-of-band qmp command.
Available instances can be queried by a 'query-yank' oob command.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Markus Armbruster
---
 MAINTAINERS           |   7 ++
 include/qemu/yank.h   |  95 +++
 qapi/meson.build      |   1 +
 qapi/qapi-schema.json |   1 +
 qapi/yank.json        | 119 +++
 util/meson.build      |   1 +
 util/yank.c           | 216 ++
 7 files changed, 440 insertions(+)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2e018a0c1d..46ff468b13 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2688,6 +2688,13 @@ F: util/uuid.c
 F: include/qemu/uuid.h
 F: tests/test-uuid.c

+Yank feature
+M: Lukas Straub
+S: Odd fixes
+F: util/yank.c
+F: include/qemu/yank.h
+F: qapi/yank.json
+
 COLO Framework
 M: zhanghailiang
 S: Maintained
diff --git a/include/qemu/yank.h b/include/qemu/yank.h
new file mode 100644
index 00..96f5b2626f
--- /dev/null
+++ b/include/qemu/yank.h
@@ -0,0 +1,95 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef YANK_H
+#define YANK_H
+
+#include "qapi/qapi-types-yank.h"
+
+typedef void (YankFn)(void *opaque);
+
+/**
+ * yank_register_instance: Register a new instance.
+ *
+ * This registers a new instance for yanking. Must be called before any yank
+ * function is registered for this instance.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @errp: Error object.
+ */
+void yank_register_instance(const YankInstance *instance, Error **errp);
+
+/**
+ * yank_unregister_instance: Unregister a instance.
+ *
+ * This unregisters a instance. Must be called only after every yank function
+ * of the instance has been unregistered.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ */
+void yank_unregister_instance(const YankInstance *instance);
+
+/**
+ * yank_register_function: Register a yank function
+ *
+ * This registers a yank function. All limitations of qmp oob commands apply
+ * to the yank function as well. See docs/devel/qapi-code-gen.txt under
+ * "An OOB-capable command handler must satisfy the following conditions".
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: The yank function.
+ * @opaque: Will be passed to the yank function.
+ */
+void yank_register_function(const YankInstance *instance,
+                            YankFn *func,
+                            void *opaque);
+
+/**
+ * yank_unregister_function: Unregister a yank function
+ *
+ * This unregisters a yank function.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: func that was passed to yank_register_function.
+ * @opaque: opaque that was passed to yank_register_function.
+ */
+void yank_unregister_function(const YankInstance *instance,
+                              YankFn *func,
+                              void *opaque);
+
+/**
+ * yank_generic_iochannel: Generic yank function for iochannel
+ *
+ * This is a generic yank function which will call qio_channel_shutdown on the
+ * provided QIOChannel.
+ *
+ * @opaque: QIOChannel to shutdown
+ */
+void yank_generic_iochannel(void *opaque);
+
+#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_BLOCK_NODE, \
+        .u.block_node.node_name = (the_node_name) })
+
+#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_CHARDEV, \
+        .u.chardev.id = (the_id) })
+
+#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_MIGRATION })
+
+#endif
diff --git a/qapi/meson.build b/qapi/meson.build
index 0e98146f1f..ab68e7900e 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -47,6 +47,7 @@ qapi_all_modules = [
   'trace',
   'transaction',
   'ui',
+  'yank',
 ]

 qapi_storage_daemon_modules = [
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 0b444b76d2..3441c9a9ae 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -86,6 +86,7 @@
 { 'include': 'machine.json' }
 { 'include': 'machine-target.json' }
 { 'include': 'replay.json' }
+{ 'include': 'yank.json' }
 { 'include': 'misc.json' }
 { 'include': 'misc-target.json' }
 { 'include': 'audio.json' }
diff --git a/qapi/yank.json b/qapi/yank.json
new file mode 100644
index 00..167a775594
--- /dev/null
+++ b/qapi/yank.json
@@ -0,0 +1,119 @@
+# -*- Mode: Python -*-
+# vim:
Re: [PATCH v10 7/8] MAINTAINERS: Add myself as maintainer for yank feature
On Mon, 02 Nov 2020 07:33:54 +0100 Markus Armbruster wrote:

> Lukas Straub writes:
>
> > I'll maintain this for now as the colo usecase is the first user
> > of this functionality.
> >
> > Signed-off-by: Lukas Straub
> > Acked-by: Stefan Hajnoczi
> > ---
> >  MAINTAINERS | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 8c744a9bdf..81288fd219 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2676,6 +2676,13 @@ F: util/uuid.c
> >  F: include/qemu/uuid.h
> >  F: tests/test-uuid.c
> >
> > +Yank feature
> > +M: Lukas Straub
> > +S: Odd fixes
> > +F: util/yank.c
> > +F: include/qemu/yank.h
> > +F: qapi/yank.json
> > +
> >  COLO Framework
> >  M: zhanghailiang
> >  S: Maintained
>
> I'd squash this into PATCH 1 to mollify checkpatch.pl.
>
> Regardless,
> Reviewed-by: Markus Armbruster

Changed for the next version.
Re: [PATCH v10 1/8] Introduce yank feature
On Mon, 02 Nov 2020 07:32:55 +0100 Markus Armbruster wrote:

> Lukas Straub writes:
>
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> >
> > Signed-off-by: Lukas Straub
> > Acked-by: Stefan Hajnoczi
> [...]
> >  qapi_storage_daemon_modules = [
> > diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> > index 0b444b76d2..79c1705ed7 100644
> > --- a/qapi/qapi-schema.json
> > +++ b/qapi/qapi-schema.json
> > @@ -91,3 +91,4 @@
> >  { 'include': 'audio.json' }
> >  { 'include': 'acpi.json' }
> >  { 'include': 'pci.json' }
> > +{ 'include': 'yank.json' }
>
> This adds the documentation at the very end of the reference manual. Is
> this where you want it to go? Check generated
> docs/interop/qemu-qmp-ref.html.

I've moved it above misc for the next version.

> > diff --git a/qapi/yank.json b/qapi/yank.json
> > new file mode 100644
> > index 00..1964a2202e
> > --- /dev/null
> > +++ b/qapi/yank.json
> > @@ -0,0 +1,115 @@
> > +# -*- Mode: Python -*-
> > +# vim: filetype=python
> > +#
> > +
>
> Please add a suitable heading here. Headings look like this:
>
>    ##
>    # Text of heading goes here
>    ##

Changed for the next version.

> Without it, the yank stuff gets squashed into the previous section
> (happens to be PCI).
>
> If you want to add an introduction or overview, it goes right below the
> heading. I'm not asking you to do that, I'm only telling you what's
> possible.
>
> [...]
>
> Solid work, pleasant to review, thanks!
>
> Reviewed-by: Markus Armbruster

Thanks!
[PATCH v10 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
Make qio_channel_tls_shutdown thread-safe by using atomics when
accessing tioc->shutdown.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 io/channel-tls.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 7ec8ceff2f..10d0bf59aa 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -23,6 +23,7 @@
 #include "qemu/module.h"
 #include "io/channel-tls.h"
 #include "trace.h"
+#include "qemu/atomic.h"


 static ssize_t qio_channel_tls_write_handler(const char *buf,
@@ -277,7 +278,8 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                 return QIO_CHANNEL_ERR_BLOCK;
             }
         } else if (errno == ECONNABORTED &&
-                   (tioc->shutdown & QIO_CHANNEL_SHUTDOWN_READ)) {
+                   (qatomic_load_acquire(&tioc->shutdown) &
+                    QIO_CHANNEL_SHUTDOWN_READ)) {
             return 0;
         }

@@ -361,7 +363,7 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);

-    tioc->shutdown |= how;
+    qatomic_or(&tioc->shutdown, how);

     return qio_channel_shutdown(tioc->master, how, errp);
 }
-- 
2.20.1
[PATCH v10 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
A connecting chardev object has an additional reference by the connecting
thread, so if the chardev is still connecting by the end of the test,
then the chardev object won't be freed. This in turn means that the yank
instance won't be unregistered and when running the next test-case
yank_register_instance will abort, because the yank instance is
already/still registered.

Signed-off-by: Lukas Straub
Reviewed-by: Daniel P. Berrangé
---
 tests/test-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test-char.c b/tests/test-char.c
index 9196e566e9..aedb5c9eda 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -937,6 +937,7 @@ static void char_socket_client_dupid_test(gconstpointer opaque)
     g_assert_nonnull(opts);
     chr1 = qemu_chr_new_from_opts(opts, NULL, &error_abort);
     g_assert_nonnull(chr1);
+    qemu_chr_wait_connected(chr1, &error_abort);

     chr2 = qemu_chr_new_from_opts(opts, NULL, &local_err);
     g_assert_null(chr2);
-- 
2.20.1
[PATCH v10 4/8] migration: Add yank feature
Register yank functions on sockets to shut them down.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Acked-by: Dr. David Alan Gilbert
---
 migration/channel.c           | 13 +++++++++++++
 migration/migration.c         | 25 +++++++++++++++++++++++++
 migration/multifd.c           | 10 ++++++++++
 migration/qemu-file-channel.c |  7 +++++++
 migration/savevm.c            |  6 ++++++
 5 files changed, 61 insertions(+)

diff --git a/migration/channel.c b/migration/channel.c
index 8a783baa0b..35fe234e9c 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -18,6 +18,8 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
+#include "io/channel-socket.h"
+#include "qemu/yank.h"

 /**
  * @migration_channel_process_incoming - Create new incoming migration channel
@@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc)
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));

+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+        yank_register_function(MIGRATION_YANK_INSTANCE, yank_generic_iochannel,
+                               QIO_CHANNEL(ioc));
+    }
+
     if (s->parameters.tls_creds &&
         *s->parameters.tls_creds &&
         !object_dynamic_cast(OBJECT(ioc),
@@ -67,6 +74,12 @@ void migration_channel_connect(MigrationState *s,
         ioc, object_get_typename(OBJECT(ioc)), hostname, error);

     if (!error) {
+        if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+            yank_register_function(MIGRATION_YANK_INSTANCE,
+                                   yank_generic_iochannel,
+                                   QIO_CHANNEL(ioc));
+        }
+
         if (s->parameters.tls_creds &&
             *s->parameters.tls_creds &&
             !object_dynamic_cast(OBJECT(ioc),
diff --git a/migration/migration.c b/migration/migration.c
index 9bb4fee5ac..0b0442df37 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -56,6 +56,7 @@
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "qemu/yank.h"

 #define MAX_THROTTLE (128 << 20) /* Migration transfer speed throttling */

@@ -248,6 +249,8 @@ void migration_incoming_state_destroy(void)
         qapi_free_SocketAddressList(mis->socket_address_list);
         mis->socket_address_list = NULL;
     }
+
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }

 static void migrate_generate_event(int new_state)
@@ -425,8 +428,14 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p = NULL;

+    yank_register_instance(MIGRATION_YANK_INSTANCE, errp);
+    if (*errp) {
+        return;
+    }
+
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (!strcmp(uri, "defer")) {
+        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         deferred_incoming_migration(errp);
     } else if (strstart(uri, "tcp:", &p) ||
                strstart(uri, "unix:", NULL) ||
@@ -441,6 +450,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_incoming_migration(p, errp);
     } else {
+        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
 }
@@ -1733,6 +1743,7 @@ static void migrate_fd_cleanup(MigrationState *s)
     }
     notifier_list_notify(&migration_state_notifiers, s);
     block_cleanup_parameters(s);
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }

 static void migrate_fd_cleanup_schedule(MigrationState *s)
@@ -2007,6 +2018,7 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * only re-setup the migration stream and poke existing migration
      * to continue using that newly established channel.
      */
+    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
     qemu_start_incoming_migration(uri, errp);
 }

@@ -2144,6 +2156,13 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         return;
     }

+    if (!(has_resume && resume)) {
+        yank_register_instance(MIGRATION_YANK_INSTANCE, errp);
+        if (*errp) {
+            return;
+        }
+    }
+
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
@@ -2157,6 +2176,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_outgoing_migration(s, p, &local_err);
     } else {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance(MIGRATION_YANK_INSTANCE);
+        }
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
                    "a valid m
[PATCH v10 6/8] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
Migration and yank code assume that qio_channel_shutdown is thread-safe
and can be called from qmp oob handler. Document this after checking
the code.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 include/io/channel.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 3c04f0edda..e0b9fc615d 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -92,7 +92,8 @@ struct QIOChannel {
  * provide additional optional features.
  *
  * Consult the corresponding public API docs for a description
- * of the semantics of each callback
+ * of the semantics of each callback. io_shutdown in particular
+ * must be thread-safe, terminate quickly and must not block.
  */
 struct QIOChannelClass {
     ObjectClass parent;
@@ -510,6 +511,8 @@ int qio_channel_close(QIOChannel *ioc,
  * QIO_CHANNEL_FEATURE_SHUTDOWN prior to calling
  * this method.
  *
+ * This function is thread-safe, terminates quickly and does not block.
+ *
  * Returns: 0 on success, -1 on error
  */
 int qio_channel_shutdown(QIOChannel *ioc,
-- 
2.20.1
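To make the "terminates quickly and does not block" requirement concrete, here is a small sketch of the primitive that a socket-backed channel presumably ends up calling, shutdown(2). It assumes a plain POSIX stream socket; demo_yank_fd and demo_recv_after_yank are hypothetical names and not QEMU functions.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical yank callback for a raw socket fd. shutdown(2) does not
 * wait for the peer; it only marks both directions of the socket closed.
 */
static int demo_yank_fd(int fd)
{
    assert(fd >= 0);
    return shutdown(fd, SHUT_RDWR);
}

/*
 * Create a connected socket pair, yank one end, then read from it.
 * Returns whatever recv() reports, or -1 if the pair cannot be created.
 */
static ssize_t demo_recv_after_yank(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    demo_yank_fd(sv[0]);
    /* No data was ever sent, so without the yank this recv would block. */
    n = recv(sv[0], buf, sizeof(buf), 0);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

After the shutdown, the read reports end-of-stream instead of waiting for data that will never arrive, which is the effect the yank callbacks rely on to unblock a hanging qemu.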
[PATCH v10 2/8] block/nbd.c: Add yank feature
Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occurred.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
---
 block/nbd.c | 154 +++-
 1 file changed, 93 insertions(+), 61 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 4548046cd7..d66c84ee40 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -35,6 +35,7 @@
 #include "qemu/option.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
+#include "qemu/atomic.h"

 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
@@ -44,6 +45,8 @@
 #include "block/nbd.h"
 #include "block/block_int.h"

+#include "qemu/yank.h"
+
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS    16

@@ -140,14 +143,13 @@ typedef struct BDRVNBDState {
     NBDConnectThread *connect_thread;
 } BDRVNBDState;

-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp);
-static QIOChannelSocket *nbd_co_establish_connection(BlockDriverState *bs,
-                                                     Error **errp);
+static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
+                                    Error **errp);
+static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp);
 static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
                                                bool detach);
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-                                Error **errp);
+static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
+static void nbd_yank(void *opaque);

 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
@@ -165,12 +167,12 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s)
 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
     if (ret == -EIO) {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
                                             NBD_CLIENT_CONNECTING_NOWAIT;
         }
     } else {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
         }
         s->state = NBD_CLIENT_QUIT;
@@ -203,7 +205,7 @@ static void reconnect_delay_timer_cb(void *opaque)
 {
     BDRVNBDState *s = opaque;

-    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
         s->state = NBD_CLIENT_CONNECTING_NOWAIT;
         while (qemu_co_enter_next(&s->free_sema, NULL)) {
             /* Resume all queued requests */
@@ -215,7 +217,7 @@ static void reconnect_delay_timer_cb(void *opaque)

 static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
 {
-    if (s->state != NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTING_WAIT) {
         return;
     }

@@ -260,7 +262,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
      * s->connection_co is either yielded from nbd_receive_reply or from
      * nbd_co_reconnect_loop()
      */
-    if (s->state == NBD_CLIENT_CONNECTED) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
         qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
     }

@@ -286,7 +288,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)

     reconnect_delay_timer_del(s);

-    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+    if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
         s->state = NBD_CLIENT_CONNECTING_NOWAIT;
         qemu_co_queue_restart_all(&s->free_sema);
     }
@@ -337,13 +339,14 @@ static void nbd_teardown_connection(BlockDriverState *bs)

 static bool nbd_client_connecting(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT ||
-        s->state == NBD_CLIENT_CONNECTING_NOWAIT;
+    NBDClientState state = qatomic_load_acquire(&s->state);
+    return state == NBD_CLIENT_CONNECTING_WAIT ||
+        state == NBD_CLIENT_CONNECTING_NOWAIT;
 }

 static bool nbd_client_connecting_wait(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT;
+    return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }

 static void connect_bh(void *opaque)
@@ -423,12 +426,12 @@ static void *connect_thread_func(void *opaque)
     return NULL;
 }

-static QIOChannelSocket *coroutine_fn
+static int coroutine_fn
 nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
 {
+    int ret;
[PATCH v10 7/8] MAINTAINERS: Add myself as maintainer for yank feature
I'll maintain this for now as the colo usecase is the first user
of this functionality.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8c744a9bdf..81288fd219 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2676,6 +2676,13 @@ F: util/uuid.c
 F: include/qemu/uuid.h
 F: tests/test-uuid.c

+Yank feature
+M: Lukas Straub
+S: Odd fixes
+F: util/yank.c
+F: include/qemu/yank.h
+F: qapi/yank.json
+
 COLO Framework
 M: zhanghailiang
 S: Maintained
-- 
2.20.1
[PATCH v10 3/8] chardev/char-socket.c: Add yank feature
Register a yank function to shutdown the socket on yank.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
---
 chardev/char-socket.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 95e45812d5..5947cbe8bb 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -34,6 +34,7 @@
 #include "qapi/error.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/yank.h"

 #include "chardev/char-io.h"
 #include "qom/object.h"
@@ -70,6 +71,7 @@ struct SocketChardev {
     size_t read_msgfds_num;
     int *write_msgfds;
     size_t write_msgfds_num;
+    bool registered_yank;

     SocketAddress *addr;
     bool is_listen;
@@ -415,6 +417,12 @@ static void tcp_chr_free_connection(Chardev *chr)

     tcp_set_msgfds(chr, NULL, 0);
     remove_fd_in_watch(chr);
+    if (s->state == TCP_CHARDEV_STATE_CONNECTING
+        || s->state == TCP_CHARDEV_STATE_CONNECTED) {
+        yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label),
+                                 yank_generic_iochannel,
+                                 QIO_CHANNEL(s->sioc));
+    }
     object_unref(OBJECT(s->sioc));
     s->sioc = NULL;
     object_unref(OBJECT(s->ioc));
@@ -918,6 +926,9 @@ static int tcp_chr_add_client(Chardev *chr, int fd)
     }
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     ret = tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return ret;
@@ -932,6 +943,9 @@ static void tcp_chr_accept(QIONetListener *listener,

     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, cioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(cioc));
     tcp_chr_new_client(chr, cioc);
 }

@@ -947,6 +961,9 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp)
         object_unref(OBJECT(sioc));
         return -1;
     }
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return 0;
@@ -962,6 +979,9 @@ static void tcp_chr_accept_server_sync(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_net_listener_wait_client(s->listener);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
 }
@@ -1072,6 +1092,9 @@ static void char_socket_finalize(Object *obj)
         object_unref(OBJECT(s->tls_creds));
     }
     g_free(s->tls_authz);
+    if (s->registered_yank) {
+        yank_unregister_instance(CHARDEV_YANK_INSTANCE(chr->label));
+    }

     qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
@@ -1087,6 +1110,9 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque)

     if (qio_task_propagate_error(task, &err)) {
         tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+        yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label),
+                                 yank_generic_iochannel,
+                                 QIO_CHANNEL(sioc));
         check_report_connect_error(chr, err);
         goto cleanup;
     }
@@ -1120,6 +1146,9 @@ static void tcp_chr_connect_client_async(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_channel_socket_new();
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+                           yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     /*
      * Normally code would use the qio_channel_socket_connect_async
      * method which uses a QIOTask + qio_task_set_error internally
@@ -1362,6 +1391,12 @@ static void qmp_chardev_open_socket(Chardev *chr,
         qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS);
     }

+    yank_register_instance(CHARDEV_YANK_INSTANCE(chr->label), errp);
+    if (*errp) {
+        return;
+    }
+    s->registered_yank = true;
+
     /* be isn't opened until we get a connection */
     *be_opened = false;

-- 
2.20.1
[PATCH v10 1/8] Introduce yank feature
The yank feature allows to recover from hanging qemu by "yanking"
at various parts. Other qemu systems can register themselves and
multiple yank functions. Then all yank functions for selected
instances can be called by the 'yank' out-of-band qmp command.
Available instances can be queried by a 'query-yank' oob command.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
---
 include/qemu/yank.h   |  95 +++
 qapi/meson.build      |   1 +
 qapi/qapi-schema.json |   1 +
 qapi/yank.json        | 115 ++
 util/meson.build      |   1 +
 util/yank.c           | 216 ++
 6 files changed, 429 insertions(+)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

diff --git a/include/qemu/yank.h b/include/qemu/yank.h
new file mode 100644
index 00..96f5b2626f
--- /dev/null
+++ b/include/qemu/yank.h
@@ -0,0 +1,95 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef YANK_H
+#define YANK_H
+
+#include "qapi/qapi-types-yank.h"
+
+typedef void (YankFn)(void *opaque);
+
+/**
+ * yank_register_instance: Register a new instance.
+ *
+ * This registers a new instance for yanking. Must be called before any yank
+ * function is registered for this instance.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @errp: Error object.
+ */
+void yank_register_instance(const YankInstance *instance, Error **errp);
+
+/**
+ * yank_unregister_instance: Unregister a instance.
+ *
+ * This unregisters a instance. Must be called only after every yank function
+ * of the instance has been unregistered.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ */
+void yank_unregister_instance(const YankInstance *instance);
+
+/**
+ * yank_register_function: Register a yank function
+ *
+ * This registers a yank function. All limitations of qmp oob commands apply
+ * to the yank function as well. See docs/devel/qapi-code-gen.txt under
+ * "An OOB-capable command handler must satisfy the following conditions".
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: The yank function.
+ * @opaque: Will be passed to the yank function.
+ */
+void yank_register_function(const YankInstance *instance,
+                            YankFn *func,
+                            void *opaque);
+
+/**
+ * yank_unregister_function: Unregister a yank function
+ *
+ * This unregisters a yank function.
+ *
+ * This function is thread-safe.
+ *
+ * @instance: The instance.
+ * @func: func that was passed to yank_register_function.
+ * @opaque: opaque that was passed to yank_register_function.
+ */
+void yank_unregister_function(const YankInstance *instance,
+                              YankFn *func,
+                              void *opaque);
+
+/**
+ * yank_generic_iochannel: Generic yank function for iochannel
+ *
+ * This is a generic yank function which will call qio_channel_shutdown on the
+ * provided QIOChannel.
+ *
+ * @opaque: QIOChannel to shutdown
+ */
+void yank_generic_iochannel(void *opaque);
+
+#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_BLOCK_NODE, \
+        .u.block_node.node_name = (the_node_name) })
+
+#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_CHARDEV, \
+        .u.chardev.id = (the_id) })
+
+#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \
+        .type = YANK_INSTANCE_TYPE_MIGRATION })
+
+#endif
diff --git a/qapi/meson.build b/qapi/meson.build
index 0e98146f1f..ab68e7900e 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -47,6 +47,7 @@ qapi_all_modules = [
   'trace',
   'transaction',
   'ui',
+  'yank',
 ]

 qapi_storage_daemon_modules = [
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 0b444b76d2..79c1705ed7 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -91,3 +91,4 @@
 { 'include': 'audio.json' }
 { 'include': 'acpi.json' }
 { 'include': 'pci.json' }
+{ 'include': 'yank.json' }
diff --git a/qapi/yank.json b/qapi/yank.json
new file mode 100644
index 00..1964a2202e
--- /dev/null
+++ b/qapi/yank.json
@@ -0,0 +1,115 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+#
+
+##
+# @YankInstanceType:
+#
+# An enumeration of yank instance types. See @YankInstance for more
+# information.
+#
+# Since: 5.2
+##
+{ 'enum': 'YankInstanceType',
+  'data': [ 'block-node', 'chardev', 'migration' ] }
+
+##
+# @YankInstanceBlockNode:
+#
+# Specifies which block graph node to yank. See @YankInstance for more
+# information.
+#
+# @node-name: the name of the block graph node
+#
+# Since: 5.2
+##
+{ 'struct': 'YankInstanceBlockNode',
+  'data': { 'node-name': 'str' } }
+
+##
+# @
[PATCH v10 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu
Hello Everyone,
So here is v10. We still need ACKs from NBD and chardev maintainers.

Changes:

v10:
 -moved from qapi/misc.json to qapi/yank.json
 -rename 'blockdev' -> 'block-node'
 -document difference between migration yank instance and migrate_cancel
 -better document return values of yank command
 -better document yank_lock
 -minor style and spelling fixes

v9:
 -rebase onto master
 -implemented new qmp api as proposed by Markus

v8:
 -add Reviewed-by and Acked-by tags
 -rebase onto master
 -minor change to migration
 -convert to meson
 -change "Since:" to 5.2
 -various code style fixes (Markus Armbruster)
 -point to oob restrictions in comment to yank_register_function (Markus Armbruster)
 -improve qmp documentation (Markus Armbruster)
 -document oob suitability of qio_channel and io_shutdown (Markus Armbruster)

v7:
 -yank_register_instance now returns error via Error **errp instead of aborting
 -dropped "chardev/char.c: Check for duplicate id before creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_acquire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __constructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Overview:
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks which get
executed with the yank command. For example the callback can shutdown() a
socket. This is intended for the colo use-case, but it can be used for other
things too of course.

Regards,
Lukas Straub

Lukas Straub (8):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
  MAINTAINERS: Add myself as maintainer for yank feature
  tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test

 MAINTAINERS                   |   7 ++
 block/nbd.c                   | 154 ++--
 chardev/char-socket.c         |  35 ++
 include/io/channel.h          |   5 +-
 include/qemu/yank.h           |  95 +++
 io/channel-tls.c              |   6 +-
 migration/channel.c           |  13 ++
 migration/migration.c         |  25
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   7 ++
 migration/savevm.c            |   6 +
 qapi/meson.build              |   1 +
 qapi/qapi-schema.json         |   1 +
 qapi/yank.json                | 115 ++
 tests/test-char.c             |   1 +
 util/meson.build              |   1 +
 util/yank.c                   | 216 ++
 17 files changed, 634 insertions(+), 64 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 qapi/yank.json
 create mode 100644 util/yank.c

--
2.20.1
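To illustrate the registration model described in the overview, here is a minimal, self-contained sketch in C. It is not the code from util/yank.c: every toy_* name is hypothetical, instances are identified by plain strings rather than by YankInstance, and the locking that the real implementation needs for thread safety is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the series' YankFn type. */
typedef void (ToyYankFn)(void *opaque);

#define TOY_MAX_ENTRIES 8

struct toy_entry {
    const char *instance;   /* e.g. "chardev:serial0" (made-up naming) */
    ToyYankFn *func;
    void *opaque;
};

static struct toy_entry toy_entries[TOY_MAX_ENTRIES];
static size_t toy_count;

/* Register a callback for an instance; returns 0 on success, -1 if full. */
static int toy_yank_register_function(const char *instance,
                                      ToyYankFn *func, void *opaque)
{
    assert(instance != NULL && func != NULL);
    if (toy_count == TOY_MAX_ENTRIES) {
        return -1;
    }
    toy_entries[toy_count].instance = instance;
    toy_entries[toy_count].func = func;
    toy_entries[toy_count].opaque = opaque;
    toy_count++;
    return 0;
}

/* Run every callback registered for the instance; returns how many ran. */
static int toy_yank(const char *instance)
{
    int called = 0;

    for (size_t i = 0; i < toy_count; i++) {
        if (strcmp(toy_entries[i].instance, instance) == 0) {
            toy_entries[i].func(toy_entries[i].opaque);
            called++;
        }
    }
    return called;
}

/* Example callback: pretend to shut down a connection by setting a flag. */
static void toy_mark_shut_down(void *opaque)
{
    *(int *)opaque = 1;
}
```

A caller would register toy_mark_shut_down with a pointer to its own flag and later call toy_yank() with the same instance name; only callbacks registered under that name run.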
Re: [PATCH v9 1/8] Introduce yank feature
On Fri, 30 Oct 2020 15:02:09 +0100
Markus Armbruster wrote:

> Lukas Straub writes:
>
> > On Thu, 29 Oct 2020 17:36:14 +0100
> > Markus Armbruster wrote:
> >
> >> Nothing major, looks almost ready to me.
> >>
> >> Lukas Straub writes:
> >>
> >> > The yank feature allows to recover from hanging qemu by "yanking"
> >> > at various parts. Other qemu systems can register themselves and
> >> > multiple yank functions. Then all yank functions for selected
> >> > instances can be called by the 'yank' out-of-band qmp command.
> >> > Available instances can be queried by a 'query-yank' oob command.
> >> >
> >> > Signed-off-by: Lukas Straub
> >> > Acked-by: Stefan Hajnoczi
> >> > ---
> >> >  include/qemu/yank.h |  95
> >> >  qapi/misc.json      | 106 ++
> >> >  util/meson.build    |   1 +
> >> >  util/yank.c         | 213
> >> >
> >>
> >> checkpatch.pl warns:
> >>
> >> WARNING: added, moved or deleted file(s), does MAINTAINERS need
> >> updating?
> >>
> >> Can we find a maintainer for the two new files?
> >
> > Yes, I'm maintaining this for now, see patch 7.
>
> Thanks! Would it make sense to add the yank stuff to a new QAPI module
> yank.json instead of misc.json, so the new MAINTAINERS stanza can cover
> it?

Yes, makes sense. Changed for the next version.

> [...]
> >> > diff --git a/qapi/misc.json b/qapi/misc.json
> >> > index 40df513856..3b7de02a4d 100644
> >> > --- a/qapi/misc.json
> >> > +++ b/qapi/misc.json
> [...]
> >> > +##
> >> > +# @YankInstance:
> >> > +#
> >> > +# A yank instance can be yanked with the "yank" qmp command to recover from a
> >> > +# hanging qemu.
> >>
> >> QEMU
> >>
> >> > +#
> >> > +# Currently implemented yank instances:
> >> > +# -nbd block device:
> >> > +# Yanking it will shutdown the connection to the nbd server without
> >> > +# attempting to reconnect.
> >> > +# -socket chardev:
> >> > +# Yanking it will shutdown the connected socket.
> >> > +# -migration:
> >> > +# Yanking it will shutdown all migration connections.
> >>
> >> To my surprise, this is recognized as bullet list markup. But please
> >> put a space between the bullet and the text anyway.
> >>
> >> Also: "shutdown" is a noun, the verb is spelled "shut down".
> >
> > Both changed for the next version.
> >
> >> In my review of v8, I asked how yanking migration is related to command
> >> migrate_cancel. Daniel explained:
> >>
> >> migrate_cancel will do a shutdown() on the primary migration socket
> >> only.
> >> In addition it will toggle the migration state.
> >>
> >> Yanking will do a shutdown on all migration sockets (important for
> >> multifd), but won't touch migration state or any other aspect of QEMU
> >> code.
> >>
> >> Overall yanking has less potential for things to go wrong than the
> >> migrate_cancel method, as it doesn't try to do any kind of cleanup
> >> or migration.
> >>
> >> Would it make sense to work this into the documentation?
> >
> > How about this?
> >
> > - migration:
> > Yanking it will shut down all migration connections. Unlike
> > @migrate_cancel, it will not notify the migration process,
> > so migration will go into @failed state, instead of @cancelled
> > state.
>
> Works for me. Advice on when to use it rather than migrate_cancel would
> be nice, though.

OK, changed for the next version.

> >> > +#
> >> > +# Since: 5.2
> >> > +##
> >> > +{ 'union': 'YankInstance',
> >> > + 'base': { 'type': 'YankInstanceType' },
> >> > + 'discriminator': 'type',
> >> > + 'data': {
> >> > + 'blockdev': 'YankInstanceBlockdev',
> >> > + 'chardev': 'YankInstanceChardev' } }
> >> > +
> >> > +##
> >> > +# @yank:
> >> > +#
> >> > +# Recover from hanging qemu by yanking the specified instances. See
> >>
> >> QEMU
> >>
Re: [PATCH v9 1/8] Introduce yank feature
On Thu, 29 Oct 2020 17:36:14 +0100 Markus Armbruster wrote: > Nothing major, looks almost ready to me. > > Lukas Straub writes: > > > The yank feature allows to recover from hanging qemu by "yanking" > > at various parts. Other qemu systems can register themselves and > > multiple yank functions. Then all yank functions for selected > > instances can be called by the 'yank' out-of-band qmp command. > > Available instances can be queried by a 'query-yank' oob command. > > > > Signed-off-by: Lukas Straub > > Acked-by: Stefan Hajnoczi > > --- > > include/qemu/yank.h | 95 > > qapi/misc.json | 106 ++ > > util/meson.build| 1 + > > util/yank.c | 213 > > checkpatch.pl warns: > > WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? > > Can we find a maintainer for the two new files? Yes, I'm maintaining this for now, see patch 7. > > 4 files changed, 415 insertions(+) > > create mode 100644 include/qemu/yank.h > > create mode 100644 util/yank.c > > > > diff --git a/include/qemu/yank.h b/include/qemu/yank.h > > new file mode 100644 > > index 000000..89755e62af > > --- /dev/null > > +++ b/include/qemu/yank.h > > @@ -0,0 +1,95 @@ > > +/* > > + * QEMU yank feature > > + * > > + * Copyright (c) Lukas Straub > > + * > > + * This work is licensed under the terms of the GNU GPL, version 2 or > > later. > > + * See the COPYING file in the top-level directory. > > + */ > > + > > +#ifndef YANK_H > > +#define YANK_H > > + > > +#include "qapi/qapi-types-misc.h" > > + > > +typedef void (YankFn)(void *opaque); > > + > > +/** > > + * yank_register_instance: Register a new instance. > > + * > > + * This registers a new instance for yanking. Must be called before any > > yank > > + * function is registered for this instance. > > + * > > + * This function is thread-safe. > > + * > > + * @instance: The instance. > > + * @errp: Error object. 
> > + */ > > +void yank_register_instance(const YankInstance *instance, Error **errp); > > + > > +/** > > + * yank_unregister_instance: Unregister a instance. > > + * > > + * This unregisters a instance. Must be called only after every yank > > function > > + * of the instance has been unregistered. > > + * > > + * This function is thread-safe. > > + * > > + * @instance: The instance. > > + */ > > +void yank_unregister_instance(const YankInstance *instance); > > + > > +/** > > + * yank_register_function: Register a yank function > > + * > > + * This registers a yank function. All limitations of qmp oob commands > > apply > > + * to the yank function as well. See docs/devel/qapi-code-gen.txt under > > + * "An OOB-capable command handler must satisfy the following conditions". > > + * > > + * This function is thread-safe. > > + * > > + * @instance: The instance. > > + * @func: The yank function. > > + * @opaque: Will be passed to the yank function. > > + */ > > +void yank_register_function(const YankInstance *instance, > > +YankFn *func, > > +void *opaque); > > + > > +/** > > + * yank_unregister_function: Unregister a yank function > > + * > > + * This unregisters a yank function. > > + * > > + * This function is thread-safe. > > + * > > + * @instance: The instance. > > + * @func: func that was passed to yank_register_function. > > + * @opaque: opaque that was passed to yank_register_function. > > + */ > > +void yank_unregister_function(const YankInstance *instance, > > + YankFn *func, > > + void *opaque); > > + > > +/** > > + * yank_generic_iochannel: Generic yank function for iochannel > > + * > > + * This is a generic yank function which will call qio_channel_shutdown on > > the > > + * provided QIOChannel. 
> > + * > > + * @opaque: QIOChannel to shutdown > > + */ > > +void yank_generic_iochannel(void *opaque); > > + > > +#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \ > > +.type = YANK_INSTANCE_TYPE_BLOCKDEV, \ > > +.u.blockdev.node_name = (the_node_name) }) > > + > > +#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \ > &g
[PATCH v9 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
A connecting chardev object has an additional reference held by the
connecting thread, so if the chardev is still connecting by the end of
the test, then the chardev object won't be freed. This in turn means
that the yank instance won't be unregistered, and when running the next
test case yank_register_instance will abort, because the yank instance
is already/still registered.

Signed-off-by: Lukas Straub
Reviewed-by: Daniel P. Berrangé
---
 tests/test-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test-char.c b/tests/test-char.c
index 9196e566e9..aedb5c9eda 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -937,6 +937,7 @@ static void char_socket_client_dupid_test(gconstpointer opaque)
     g_assert_nonnull(opts);
     chr1 = qemu_chr_new_from_opts(opts, NULL, &error_abort);
     g_assert_nonnull(chr1);
+    qemu_chr_wait_connected(chr1, &error_abort);

     chr2 = qemu_chr_new_from_opts(opts, NULL, &local_err);
     g_assert_null(chr2);
--
2.20.1
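The failure mode described in this commit message (an instance that was never unregistered makes the next registration fail) can be sketched with a toy registry. This is only an illustration under assumptions: the dup_* names are hypothetical, instances are plain strings, and a boolean return value stands in for the Error **errp reporting used by yank_register_instance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DUP_MAX_INSTANCES 4

/* Hypothetical registry of instance names; NULL marks a free slot. */
static const char *dup_instances[DUP_MAX_INSTANCES];

/* Returns false if the name is already registered or the table is full. */
static bool dup_register_instance(const char *id)
{
    int free_slot = -1;

    assert(id != NULL);
    for (int i = 0; i < DUP_MAX_INSTANCES; i++) {
        if (dup_instances[i] == NULL) {
            if (free_slot < 0) {
                free_slot = i;
            }
        } else if (strcmp(dup_instances[i], id) == 0) {
            return false;   /* still registered, e.g. object not freed yet */
        }
    }
    if (free_slot < 0) {
        return false;
    }
    dup_instances[free_slot] = id;
    return true;
}

/* Removes the name if present; unknown names are ignored. */
static void dup_unregister_instance(const char *id)
{
    for (int i = 0; i < DUP_MAX_INSTANCES; i++) {
        if (dup_instances[i] != NULL && strcmp(dup_instances[i], id) == 0) {
            dup_instances[i] = NULL;
            return;
        }
    }
}
```

In this model, a second dup_register_instance() call with the same name fails until dup_unregister_instance() has run, which corresponds to the next test case failing while the previous chardev is still alive.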
[PATCH v9 7/8] MAINTAINERS: Add myself as maintainer for yank feature
I'll maintain this for now as the COLO use case is the first user
of this functionality.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ef6f5c7399..5921e565df 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2669,6 +2669,12 @@ F: util/uuid.c
 F: include/qemu/uuid.h
 F: tests/test-uuid.c

+Yank feature
+M: Lukas Straub
+S: Odd fixes
+F: util/yank.c
+F: include/qemu/yank.h
+
 COLO Framework
 M: zhanghailiang
 S: Maintained
--
2.20.1
[PATCH v9 6/8] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
Migration and yank code assume that qio_channel_shutdown is thread-safe
and can be called from a qmp oob handler. Document this after checking
the code.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 include/io/channel.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 3c04f0edda..e0b9fc615d 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -92,7 +92,8 @@ struct QIOChannel {
  * provide additional optional features.
  *
  * Consult the corresponding public API docs for a description
- * of the semantics of each callback
+ * of the semantics of each callback. io_shutdown in particular
+ * must be thread-safe, terminate quickly and must not block.
  */
 struct QIOChannelClass {
     ObjectClass parent;
@@ -510,6 +511,8 @@ int qio_channel_close(QIOChannel *ioc,
  * QIO_CHANNEL_FEATURE_SHUTDOWN prior to calling
  * this method.
  *
+ * This function is thread-safe, terminates quickly and does not block.
+ *
  * Returns: 0 on success, -1 on error
  */
 int qio_channel_shutdown(QIOChannel *ioc,
--
2.20.1
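As a rough illustration of why shutting down a socket is a suitable yank action, the following sketch uses the plain POSIX socket API rather than QIOChannel. The function name toy_read_after_shutdown is made up, and the behaviour shown (a read on a shut-down stream socket returns 0 instead of waiting for data) is what I would expect on Linux with an AF_UNIX socket pair; treat it as an assumption for other platforms.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a connected socket pair, shut one end down in both directions
 * and read from it. Returns the result of read(), or -1 if setup failed.
 * Without the shutdown() call the read would block, since no data is
 * ever written to the pair.
 */
static int toy_read_after_shutdown(void)
{
    int fds[2];
    char buf[1];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        return -1;
    }
    if (shutdown(fds[0], SHUT_RDWR) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    n = read(fds[0], buf, sizeof(buf));
    assert(n <= (ssize_t)sizeof(buf));

    close(fds[0]);
    close(fds[1]);
    return (int)n;
}
```

The point of the sketch is only that shutdown() itself returns promptly and turns a potentially blocking read into an immediate end-of-file; it does not model the thread-safety guarantees documented in the patch.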
[PATCH v9 1/8] Introduce yank feature
The yank feature allows to recover from hanging qemu by "yanking" at various parts. Other qemu systems can register themselves and multiple yank functions. Then all yank functions for selected instances can be called by the 'yank' out-of-band qmp command. Available instances can be queried by a 'query-yank' oob command. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi --- include/qemu/yank.h | 95 qapi/misc.json | 106 ++ util/meson.build| 1 + util/yank.c | 213 4 files changed, 415 insertions(+) create mode 100644 include/qemu/yank.h create mode 100644 util/yank.c diff --git a/include/qemu/yank.h b/include/qemu/yank.h new file mode 100644 index 00..89755e62af --- /dev/null +++ b/include/qemu/yank.h @@ -0,0 +1,95 @@ +/* + * QEMU yank feature + * + * Copyright (c) Lukas Straub + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#ifndef YANK_H +#define YANK_H + +#include "qapi/qapi-types-misc.h" + +typedef void (YankFn)(void *opaque); + +/** + * yank_register_instance: Register a new instance. + * + * This registers a new instance for yanking. Must be called before any yank + * function is registered for this instance. + * + * This function is thread-safe. + * + * @instance: The instance. + * @errp: Error object. + */ +void yank_register_instance(const YankInstance *instance, Error **errp); + +/** + * yank_unregister_instance: Unregister a instance. + * + * This unregisters a instance. Must be called only after every yank function + * of the instance has been unregistered. + * + * This function is thread-safe. + * + * @instance: The instance. + */ +void yank_unregister_instance(const YankInstance *instance); + +/** + * yank_register_function: Register a yank function + * + * This registers a yank function. All limitations of qmp oob commands apply + * to the yank function as well. 
See docs/devel/qapi-code-gen.txt under + * "An OOB-capable command handler must satisfy the following conditions". + * + * This function is thread-safe. + * + * @instance: The instance. + * @func: The yank function. + * @opaque: Will be passed to the yank function. + */ +void yank_register_function(const YankInstance *instance, +YankFn *func, +void *opaque); + +/** + * yank_unregister_function: Unregister a yank function + * + * This unregisters a yank function. + * + * This function is thread-safe. + * + * @instance: The instance. + * @func: func that was passed to yank_register_function. + * @opaque: opaque that was passed to yank_register_function. + */ +void yank_unregister_function(const YankInstance *instance, + YankFn *func, + void *opaque); + +/** + * yank_generic_iochannel: Generic yank function for iochannel + * + * This is a generic yank function which will call qio_channel_shutdown on the + * provided QIOChannel. + * + * @opaque: QIOChannel to shutdown + */ +void yank_generic_iochannel(void *opaque); + +#define BLOCKDEV_YANK_INSTANCE(the_node_name) (&(YankInstance) { \ +.type = YANK_INSTANCE_TYPE_BLOCKDEV, \ +.u.blockdev.node_name = (the_node_name) }) + +#define CHARDEV_YANK_INSTANCE(the_id) (&(YankInstance) { \ +.type = YANK_INSTANCE_TYPE_CHARDEV, \ +.u.chardev.id = (the_id) }) + +#define MIGRATION_YANK_INSTANCE (&(YankInstance) { \ +.type = YANK_INSTANCE_TYPE_MIGRATION }) + +#endif diff --git a/qapi/misc.json b/qapi/misc.json index 40df513856..3b7de02a4d 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -568,3 +568,109 @@ 'data': { '*option': 'str' }, 'returns': ['CommandLineOptionInfo'], 'allow-preconfig': true } + +## +# @YankInstanceType: +# +# An enumeration of yank instance types. See "YankInstance" for more +# information. +# +# Since: 5.2 +## +{ 'enum': 'YankInstanceType', + 'data': [ 'blockdev', 'chardev', 'migration' ] } + +## +# @YankInstanceBlockdev: +# +# Specifies which blockdev to yank. See "YankInstance" for more information. 
+# +# @node-name: the blockdev's node-name +# +# Since: 5.2 +## +{ 'struct': 'YankInstanceBlockdev', + 'data': { 'node-name': 'str' } } + +## +# @YankInstanceChardev: +# +# Specifies which chardev to yank. See "YankInstance" for more information. +# +# @id: the chardev's ID +# +# Since: 5.2 +## +{ 'struct': 'YankInstanceChardev', + 'data': { 'id': 'str' } } + +## +# @YankInstance: +# +# A yank instance can be yanked with the "yank" qmp command to recover from a +# hanging qemu. +# +# Currently implemented yank instances: +# -nbd block device: +# Yanking it will shutdown the connection to the nbd server without +# attempting to reconnect. +# -socket chardev: +# Yanking it will shutdown the connected socket. +# -mi
[PATCH v9 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
Make qio_channel_tls_shutdown thread-safe by using atomics when
accessing tioc->shutdown.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 io/channel-tls.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 7ec8ceff2f..10d0bf59aa 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -23,6 +23,7 @@
 #include "qemu/module.h"
 #include "io/channel-tls.h"
 #include "trace.h"
+#include "qemu/atomic.h"


 static ssize_t qio_channel_tls_write_handler(const char *buf,
@@ -277,7 +278,8 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                 return QIO_CHANNEL_ERR_BLOCK;
             }
         } else if (errno == ECONNABORTED &&
-                   (tioc->shutdown & QIO_CHANNEL_SHUTDOWN_READ)) {
+                   (qatomic_load_acquire(&tioc->shutdown) &
+                    QIO_CHANNEL_SHUTDOWN_READ)) {
             return 0;
         }

@@ -361,7 +363,7 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);

-    tioc->shutdown |= how;
+    qatomic_or(&tioc->shutdown, how);

     return qio_channel_shutdown(tioc->master, how, errp);
 }
--
2.20.1
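The pattern in this patch (an atomic OR on the writer side, an acquire load on the reader side) can be sketched with standard C11 atomics. This is a sketch under assumptions, not the QEMU code: the toy_* names and flag values are hypothetical, and <stdatomic.h> is used in place of QEMU's qatomic_or() and qatomic_load_acquire() helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, modelled on the READ/WRITE shutdown bits. */
enum {
    TOY_SHUTDOWN_READ = 1,
    TOY_SHUTDOWN_WRITE = 2,
};

struct toy_channel {
    atomic_int shutdown;    /* bitmask of TOY_SHUTDOWN_* flags */
};

/* May be called from any thread: atomically set the requested bits. */
static void toy_channel_shutdown(struct toy_channel *c, int how)
{
    assert(c != NULL);
    atomic_fetch_or(&c->shutdown, how);
}

/* Reader side: an acquire load, so the flag is never read torn or stale. */
static bool toy_channel_read_is_shut_down(struct toy_channel *c)
{
    int flags = atomic_load_explicit(&c->shutdown, memory_order_acquire);

    return (flags & TOY_SHUTDOWN_READ) != 0;
}
```

A plain `c->shutdown |= how` is a read-modify-write that two threads could interleave, losing one of the updates; the atomic OR removes that race without requiring a lock, which matters for a function that must not block.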
[PATCH v9 2/8] block/nbd.c: Add yank feature
Register a yank function which shuts down the socket and sets s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an error occured. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi --- block/nbd.c | 154 +++- 1 file changed, 93 insertions(+), 61 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 4548046cd7..d66c84ee40 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -35,6 +35,7 @@ #include "qemu/option.h" #include "qemu/cutils.h" #include "qemu/main-loop.h" +#include "qemu/atomic.h" #include "qapi/qapi-visit-sockets.h" #include "qapi/qmp/qstring.h" @@ -44,6 +45,8 @@ #include "block/nbd.h" #include "block/block_int.h" +#include "qemu/yank.h" + #define EN_OPTSTR ":exportname=" #define MAX_NBD_REQUESTS16 @@ -140,14 +143,13 @@ typedef struct BDRVNBDState { NBDConnectThread *connect_thread; } BDRVNBDState; -static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr, - Error **errp); -static QIOChannelSocket *nbd_co_establish_connection(BlockDriverState *bs, - Error **errp); +static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, +Error **errp); +static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp); static void nbd_co_establish_connection_cancel(BlockDriverState *bs, bool detach); -static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc, -Error **errp); +static int nbd_client_handshake(BlockDriverState *bs, Error **errp); +static void nbd_yank(void *opaque); static void nbd_clear_bdrvstate(BDRVNBDState *s) { @@ -165,12 +167,12 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s) static void nbd_channel_error(BDRVNBDState *s, int ret) { if (ret == -EIO) { -if (s->state == NBD_CLIENT_CONNECTED) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { s->state = s->reconnect_delay ? 
NBD_CLIENT_CONNECTING_WAIT : NBD_CLIENT_CONNECTING_NOWAIT; } } else { -if (s->state == NBD_CLIENT_CONNECTED) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); } s->state = NBD_CLIENT_QUIT; @@ -203,7 +205,7 @@ static void reconnect_delay_timer_cb(void *opaque) { BDRVNBDState *s = opaque; -if (s->state == NBD_CLIENT_CONNECTING_WAIT) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT) { s->state = NBD_CLIENT_CONNECTING_NOWAIT; while (qemu_co_enter_next(>free_sema, NULL)) { /* Resume all queued requests */ @@ -215,7 +217,7 @@ static void reconnect_delay_timer_cb(void *opaque) static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns) { -if (s->state != NBD_CLIENT_CONNECTING_WAIT) { +if (qatomic_load_acquire(>state) != NBD_CLIENT_CONNECTING_WAIT) { return; } @@ -260,7 +262,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs, * s->connection_co is either yielded from nbd_receive_reply or from * nbd_co_reconnect_loop() */ -if (s->state == NBD_CLIENT_CONNECTED) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context); } @@ -286,7 +288,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs) reconnect_delay_timer_del(s); -if (s->state == NBD_CLIENT_CONNECTING_WAIT) { +if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT) { s->state = NBD_CLIENT_CONNECTING_NOWAIT; qemu_co_queue_restart_all(>free_sema); } @@ -337,13 +339,14 @@ static void nbd_teardown_connection(BlockDriverState *bs) static bool nbd_client_connecting(BDRVNBDState *s) { -return s->state == NBD_CLIENT_CONNECTING_WAIT || -s->state == NBD_CLIENT_CONNECTING_NOWAIT; +NBDClientState state = qatomic_load_acquire(>state); +return state == NBD_CLIENT_CONNECTING_WAIT || +state == NBD_CLIENT_CONNECTING_NOWAIT; } static bool nbd_client_connecting_wait(BDRVNBDState *s) { -return s->state == 
NBD_CLIENT_CONNECTING_WAIT; +return qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT; } static void connect_bh(void *opaque) @@ -423,12 +426,12 @@ static void *connect_thread_func(void *opaque) return NULL; } -static QIOChannelSocket *coroutine_fn +static int coroutine_fn nbd_co_establish_connection(BlockDriverState *bs, Error **errp) { +int ret;
[PATCH v9 3/8] chardev/char-socket.c: Add yank feature
Register a yank function to shutdown the socket on yank. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi --- chardev/char-socket.c | 35 +++ 1 file changed, 35 insertions(+) diff --git a/chardev/char-socket.c b/chardev/char-socket.c index 95e45812d5..5947cbe8bb 100644 --- a/chardev/char-socket.c +++ b/chardev/char-socket.c @@ -34,6 +34,7 @@ #include "qapi/error.h" #include "qapi/clone-visitor.h" #include "qapi/qapi-visit-sockets.h" +#include "qemu/yank.h" #include "chardev/char-io.h" #include "qom/object.h" @@ -70,6 +71,7 @@ struct SocketChardev { size_t read_msgfds_num; int *write_msgfds; size_t write_msgfds_num; +bool registered_yank; SocketAddress *addr; bool is_listen; @@ -415,6 +417,12 @@ static void tcp_chr_free_connection(Chardev *chr) tcp_set_msgfds(chr, NULL, 0); remove_fd_in_watch(chr); +if (s->state == TCP_CHARDEV_STATE_CONNECTING +|| s->state == TCP_CHARDEV_STATE_CONNECTED) { +yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(s->sioc)); +} object_unref(OBJECT(s->sioc)); s->sioc = NULL; object_unref(OBJECT(s->ioc)); @@ -918,6 +926,9 @@ static int tcp_chr_add_client(Chardev *chr, int fd) } tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); ret = tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); return ret; @@ -932,6 +943,9 @@ static void tcp_chr_accept(QIONetListener *listener, tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); tcp_chr_set_client_ioc_name(chr, cioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(cioc)); tcp_chr_new_client(chr, cioc); } @@ -947,6 +961,9 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp) object_unref(OBJECT(sioc)); return -1; } +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); 
tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); return 0; @@ -962,6 +979,9 @@ static void tcp_chr_accept_server_sync(Chardev *chr) tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); sioc = qio_net_listener_wait_client(s->listener); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); } @@ -1072,6 +1092,9 @@ static void char_socket_finalize(Object *obj) object_unref(OBJECT(s->tls_creds)); } g_free(s->tls_authz); +if (s->registered_yank) { +yank_unregister_instance(CHARDEV_YANK_INSTANCE(chr->label)); +} qemu_chr_be_event(chr, CHR_EVENT_CLOSED); } @@ -1087,6 +1110,9 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque) if (qio_task_propagate_error(task, )) { tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED); +yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); check_report_connect_error(chr, err); goto cleanup; } @@ -1120,6 +1146,9 @@ static void tcp_chr_connect_client_async(Chardev *chr) tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); sioc = qio_channel_socket_new(); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(CHARDEV_YANK_INSTANCE(chr->label), + yank_generic_iochannel, + QIO_CHANNEL(sioc)); /* * Normally code would use the qio_channel_socket_connect_async * method which uses a QIOTask + qio_task_set_error internally @@ -1362,6 +1391,12 @@ static void qmp_chardev_open_socket(Chardev *chr, qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS); } +yank_register_instance(CHARDEV_YANK_INSTANCE(chr->label), errp); +if (*errp) { +return; +} +s->registered_yank = true; + /* be isn't opened until we get a connection */ *be_opened = false; -- 2.20.1 pgphWJNBt8ICg.pgp Description: OpenPGP digital signature
[PATCH v9 4/8] migration: Add yank feature
Register yank functions on sockets to shut them down. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi Acked-by: Dr. David Alan Gilbert --- migration/channel.c | 13 + migration/migration.c | 25 + migration/multifd.c | 10 ++ migration/qemu-file-channel.c | 7 +++ migration/savevm.c| 6 ++ 5 files changed, 61 insertions(+) diff --git a/migration/channel.c b/migration/channel.c index 8a783baa0b..35fe234e9c 100644 --- a/migration/channel.c +++ b/migration/channel.c @@ -18,6 +18,8 @@ #include "trace.h" #include "qapi/error.h" #include "io/channel-tls.h" +#include "io/channel-socket.h" +#include "qemu/yank.h" /** * @migration_channel_process_incoming - Create new incoming migration channel @@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc) trace_migration_set_incoming_channel( ioc, object_get_typename(OBJECT(ioc))); +if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) { +yank_register_function(MIGRATION_YANK_INSTANCE, yank_generic_iochannel, + QIO_CHANNEL(ioc)); +} + if (s->parameters.tls_creds && *s->parameters.tls_creds && !object_dynamic_cast(OBJECT(ioc), @@ -67,6 +74,12 @@ void migration_channel_connect(MigrationState *s, ioc, object_get_typename(OBJECT(ioc)), hostname, error); if (!error) { +if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) { +yank_register_function(MIGRATION_YANK_INSTANCE, + yank_generic_iochannel, + QIO_CHANNEL(ioc)); +} + if (s->parameters.tls_creds && *s->parameters.tls_creds && !object_dynamic_cast(OBJECT(ioc), diff --git a/migration/migration.c b/migration/migration.c index 0575ecb379..e2c1123a90 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -56,6 +56,7 @@ #include "net/announce.h" #include "qemu/queue.h" #include "multifd.h" +#include "qemu/yank.h" #define MAX_THROTTLE (128 << 20) /* Migration transfer speed throttling */ @@ -244,6 +245,8 @@ void migration_incoming_state_destroy(void) qapi_free_SocketAddressList(mis->socket_address_list); mis->socket_address_list = NULL; 
} + +yank_unregister_instance(MIGRATION_YANK_INSTANCE); } static void migrate_generate_event(int new_state) @@ -390,8 +393,14 @@ void qemu_start_incoming_migration(const char *uri, Error **errp) { const char *p = NULL; +yank_register_instance(MIGRATION_YANK_INSTANCE, errp); +if (*errp) { +return; +} + qapi_event_send_migration(MIGRATION_STATUS_SETUP); if (!strcmp(uri, "defer")) { +yank_unregister_instance(MIGRATION_YANK_INSTANCE); deferred_incoming_migration(errp); } else if (strstart(uri, "tcp:", ) || strstart(uri, "unix:", NULL) || @@ -406,6 +415,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp) } else if (strstart(uri, "fd:", )) { fd_start_incoming_migration(p, errp); } else { +yank_unregister_instance(MIGRATION_YANK_INSTANCE); error_setg(errp, "unknown migration protocol: %s", uri); } } @@ -1698,6 +1708,7 @@ static void migrate_fd_cleanup(MigrationState *s) } notifier_list_notify(_state_notifiers, s); block_cleanup_parameters(s); +yank_unregister_instance(MIGRATION_YANK_INSTANCE); } static void migrate_fd_cleanup_schedule(MigrationState *s) @@ -1972,6 +1983,7 @@ void qmp_migrate_recover(const char *uri, Error **errp) * only re-setup the migration stream and poke existing migration * to continue using that newly established channel. */ +yank_unregister_instance(MIGRATION_YANK_INSTANCE); qemu_start_incoming_migration(uri, errp); } @@ -2109,6 +2121,13 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, return; } +if (!(has_resume && resume)) { +yank_register_instance(MIGRATION_YANK_INSTANCE, errp); +if (*errp) { +return; +} +} + if (strstart(uri, "tcp:", ) || strstart(uri, "unix:", NULL) || strstart(uri, "vsock:", NULL)) { @@ -2122,6 +2141,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, } else if (strstart(uri, "fd:", )) { fd_start_outgoing_migration(s, p, _err); } else { +if (!(has_resume && resume)) { +yank_unregister_instance(MIGRATION_YANK_INSTANCE); +} error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri", "a valid m
[PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu
Hello Everyone,
I finally found time again to work on this, so here is v9 with the new qmp api.
We still need ACKs from NBD and chardev maintainers.

Changes:

v9:
 -rebase onto master
 -implemented new qmp api as proposed by Markus

v8:
 -add Reviewed-by and Acked-by tags
 -rebase onto master
 -minor change to migration
 -convert to meson
 -change "Since:" to 5.2
 -various code style fixes (Markus Armbruster)
 -point to oob restrictions in comment to yank_register_function
  (Markus Armbruster)
 -improve qmp documentation (Markus Armbruster)
 -document oob suitability of qio_channel and io_shutdown (Markus Armbruster)

v7:
 -yank_register_instance now returns error via Error **errp instead of aborting
 -dropped "chardev/char.c: Check for duplicate id before creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h
  (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_acquire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __constructor__ attribute instead
  (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Overview:
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks which get
executed with the yank command.
For example the callback can shutdown() a socket. This is intended for the colo
use-case, but it can be used for other things too of course.

Regards,
Lukas Straub

Lukas Straub (8):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document qmp oob suitability of qio_channel_shutdown and
    io_shutdown
  MAINTAINERS: Add myself as maintainer for yank feature
  tests/test-char.c: Wait for the chardev to connect in
    char_socket_client_dupid_test

 MAINTAINERS                   |   6 +
 block/nbd.c                   | 154 ++--
 chardev/char-socket.c         |  35 ++
 include/io/channel.h          |   5 +-
 include/qemu/yank.h           |  95 +++
 io/channel-tls.c              |   6 +-
 migration/channel.c           |  13 +++
 migration/migration.c         |  25
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   7 ++
 migration/savevm.c            |   6 +
 qapi/misc.json                | 106 +
 tests/test-char.c             |   1 +
 util/meson.build              |   1 +
 util/yank.c                   | 213 ++
 15 files changed, 619 insertions(+), 64 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

--
2.20.1
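To make the callback model from the overview concrete, here is a minimal sketch in plain C. It is not the code from util/yank.c: the table, the sketch_* names and the fixed-size limit are all hypothetical, and there is no locking, so it only illustrates the idea of registering a callback under an instance name and later running it to shutdown() a socket.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Same callback shape as YankFn in the series; everything else is made up. */
typedef void (SketchYankFn)(void *opaque);

#define SKETCH_MAX_YANK_FUNCS 8

struct sketch_yank_entry {
    const char *instance_name;   /* e.g. "chardev:demo" */
    SketchYankFn *func;          /* NULL marks a free slot */
    void *opaque;
};

static struct sketch_yank_entry sketch_table[SKETCH_MAX_YANK_FUNCS];

/* Register a callback for an instance. Returns 0, or -1 if the table is full. */
static int sketch_yank_register(const char *name, SketchYankFn *func,
                                void *opaque)
{
    for (size_t i = 0; i < SKETCH_MAX_YANK_FUNCS; i++) {
        if (!sketch_table[i].func) {
            sketch_table[i].instance_name = name;
            sketch_table[i].func = func;
            sketch_table[i].opaque = opaque;
            return 0;
        }
    }
    return -1;
}

/* Run every callback registered for the named instance. Returns the count. */
static int sketch_yank(const char *name)
{
    int called = 0;

    for (size_t i = 0; i < SKETCH_MAX_YANK_FUNCS; i++) {
        if (sketch_table[i].func &&
            strcmp(sketch_table[i].instance_name, name) == 0) {
            sketch_table[i].func(sketch_table[i].opaque);
            called++;
        }
    }
    return called;
}

/* Example callback: shut down both directions of the socket in *opaque. */
static void sketch_yank_socket(void *opaque)
{
    int fd = *(int *)opaque;

    shutdown(fd, SHUT_RDWR);
}
```

With a connected socket pair, registering sketch_yank_socket for one end and calling sketch_yank() on that name makes a blocked or subsequent read() on the peer return end-of-file instead of hanging, which is the recovery behaviour the series is after.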
Re: [PATCH v7 1/8] Introduce yank feature
On Thu, 27 Aug 2020 14:37:00 +0200 Markus Armbruster wrote:

> I apologize for not reviewing this much earlier.
>
> Lukas Straub writes:
>
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> >
> > Signed-off-by: Lukas Straub
> > Acked-by: Stefan Hajnoczi
> > ---
> ...
> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index 9d32820dc1..0d6a8f20b7 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -1615,3 +1615,48 @@
> >  ##
> >  { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
> >
> > +##
> > +# @YankInstances:
> > +#
> > +# @instances: List of yank instances.
> > +#
> > +# Yank instances are named after the following schema:
> > +# "blockdev:", "chardev:" and "migration"
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }
>
> I'm afraid this is a problematic QMP interface.
>
> By making YankInstances a struct, you keep the door open to adding more
> members, which is good.
>
> But by making its 'instances' member a ['str'], you close the door to
> using anything but a single string for the individual instances. Not so
> good.
>
> The single string encodes information which QMP client will need to
> parse from the string. We frown on that in QMP. Use QAPI complex types
> capabilities for structured data.
>
> Could you use something like this instead?
>
>     { 'enum': 'YankInstanceType',
>       'data': { 'block-node', 'chardev', 'migration' } }
>
>     { 'struct': 'YankInstanceBlockNode',
>       'data': { 'node-name': 'str' } }
>
>     { 'struct': 'YankInstanceChardev',
>       'data' { 'label': 'str' } }
>
>     { 'union': 'YankInstance',
>       'base': { 'type': 'YankInstanceType' },
>       'discriminator': 'type',
>       'data': {
>           'block-node': 'YankInstanceBlockNode',
>           'chardev': 'YankInstanceChardev' } }
>
>     { 'command': 'yank',
>       'data': { 'instances': ['YankInstance'] },
>       'allow-oob': true }

This proposal looks good to me. Does everyone agree?

Regards,
Lukas Straub

> If you're confident nothing will ever be added to YankInstanceBlockNode
> and YankInstanceChardev, you could use str instead.
>
> > +
> > +##
> > +# @yank:
> > +#
> > +# Recover from hanging qemu by yanking the specified instances.
>
> What's an "instance", and what does it mean to "yank" it?
>
> The documentation of YankInstances above gives a clue on what an
> "instance" is: presumably a block node, a character device or the
> migration job.
>
> I guess a YankInstance is whatever the code chooses to make one, and the
> current code makes these three kinds.
>
> Does it make every block node a YankInstance? If not, which ones?
>
> Does it make every character device a YankInstance? If not, which ones?
>
> Does it make migration always a YankInstance? If not, when?
>
> > +#
> > +# Takes @YankInstances as argument.
> > +#
> > +# Returns: nothing.
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
> > +# <- { "return": {} }
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
> > +
> > +##
> > +# @query-yank:
> > +#
> > +# Query yank instances.
> > +#
> > +# Returns: @YankInstances
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "query-yank" }
> > +# <- { "return": { "instances": ["blockdev:nbd0"] } }
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }
> ...
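As an illustration of the structured-data argument in the review, the proposed union can be pictured as a tagged union in C. This is only a hand-written sketch with hypothetical sketch_* names; it is not what the QAPI generator would emit for the schema above. It shows that a chardev and a block node sharing the same identifier stay distinct without any string parsing.

```c
#include <string.h>

/* Discriminator, mirroring the three kinds in the proposed enum. */
enum sketch_yank_type {
    SKETCH_YANK_BLOCK_NODE,
    SKETCH_YANK_CHARDEV,
    SKETCH_YANK_MIGRATION,
};

/* One yank instance: a type tag plus the per-type payload. */
struct sketch_yank_instance {
    enum sketch_yank_type type;
    union {
        struct { const char *node_name; } block_node;
        struct { const char *label; } chardev;
        /* migration carries no extra data */
    } u;
};

/* Compare two instances by tag first, then by the matching payload. */
static int sketch_yank_instance_equal(const struct sketch_yank_instance *a,
                                      const struct sketch_yank_instance *b)
{
    if (a->type != b->type) {
        return 0;
    }
    switch (a->type) {
    case SKETCH_YANK_BLOCK_NODE:
        return strcmp(a->u.block_node.node_name,
                      b->u.block_node.node_name) == 0;
    case SKETCH_YANK_CHARDEV:
        return strcmp(a->u.chardev.label, b->u.chardev.label) == 0;
    case SKETCH_YANK_MIGRATION:
        return 1;
    }
    return 0;
}
```

With the single-string encoding, the same comparison depends on every client agreeing on how to split "chardev:..." at the colon; with the tagged form the kind is carried out of band and new members can be added to one payload without touching the others.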
[PATCH v8 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
Make qio_channel_tls_shutdown thread-safe by using atomics when
accessing tioc->shutdown.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 io/channel-tls.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 7ec8ceff2f..b350c84640 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -23,6 +23,7 @@
 #include "qemu/module.h"
 #include "io/channel-tls.h"
 #include "trace.h"
+#include "qemu/atomic.h"


 static ssize_t qio_channel_tls_write_handler(const char *buf,
@@ -277,7 +278,8 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                 return QIO_CHANNEL_ERR_BLOCK;
             }
         } else if (errno == ECONNABORTED &&
-                   (tioc->shutdown & QIO_CHANNEL_SHUTDOWN_READ)) {
+                   (atomic_load_acquire(&tioc->shutdown) &
+                    QIO_CHANNEL_SHUTDOWN_READ)) {
             return 0;
         }

@@ -361,7 +363,7 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);

-    tioc->shutdown |= how;
+    atomic_or(&tioc->shutdown, how);

     return qio_channel_shutdown(tioc->master, how, errp);
 }
--
2.20.1
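For readers without QEMU's qemu/atomic.h at hand, the same pattern can be sketched with standard C11 atomics. This is a stand-in, not the patch: atomic_fetch_or() and atomic_load_explicit() are used here in place of QEMU's atomic_or() and atomic_load_acquire() macros, and the struct and flag names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits, standing in for QIO_CHANNEL_SHUTDOWN_*. */
enum {
    SKETCH_SHUTDOWN_READ  = 1,
    SKETCH_SHUTDOWN_WRITE = 2,
    SKETCH_SHUTDOWN_BOTH  = 3,
};

struct sketch_channel {
    atomic_int shutdown;   /* bitmask, may be set from another thread */
};

/*
 * Writer side (e.g. the yanking thread): OR the bits in with one atomic
 * read-modify-write, so two concurrent callers cannot lose each other's bits
 * the way a plain "shutdown |= how" could.
 */
static void sketch_channel_shutdown(struct sketch_channel *c, int how)
{
    atomic_fetch_or(&c->shutdown, how);
}

/* Reader side (e.g. the I/O path): acquire load of the current bitmask. */
static bool sketch_channel_read_is_shut(struct sketch_channel *c)
{
    return atomic_load_explicit(&c->shutdown, memory_order_acquire)
           & SKETCH_SHUTDOWN_READ;
}
```

The design point is the same as in the patch: the flag is written from a context that does not hold the locks the reader normally relies on, so both the update and the check have to be atomic operations rather than plain loads and stores.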
[PATCH v8 7/8] MAINTAINERS: Add myself as maintainer for yank feature
I'll maintain this for now as the colo use case is the first user
of this functionality.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
Reviewed-by: Daniel P. Berrangé
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5a22c8be42..c1d450e25a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2615,6 +2615,12 @@ F: util/uuid.c
 F: include/qemu/uuid.h
 F: tests/test-uuid.c

+Yank feature
+M: Lukas Straub
+S: Odd fixes
+F: util/yank.c
+F: include/qemu/yank.h
+
 COLO Framework
 M: zhanghailiang
 S: Maintained
--
2.20.1
[PATCH v8 3/8] chardev/char-socket.c: Add yank feature
Register a yank function to shutdown the socket on yank. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi Reviewed-by: Daniel P. Berrangé --- chardev/char-socket.c | 31 +++ 1 file changed, 31 insertions(+) diff --git a/chardev/char-socket.c b/chardev/char-socket.c index ef62dbf3d7..8e2865ca83 100644 --- a/chardev/char-socket.c +++ b/chardev/char-socket.c @@ -34,6 +34,7 @@ #include "qapi/error.h" #include "qapi/clone-visitor.h" #include "qapi/qapi-visit-sockets.h" +#include "qemu/yank.h" #include "chardev/char-io.h" @@ -69,6 +70,7 @@ typedef struct { size_t read_msgfds_num; int *write_msgfds; size_t write_msgfds_num; +char *yank_name; SocketAddress *addr; bool is_listen; @@ -413,6 +415,11 @@ static void tcp_chr_free_connection(Chardev *chr) tcp_set_msgfds(chr, NULL, 0); remove_fd_in_watch(chr); +if (s->state == TCP_CHARDEV_STATE_CONNECTING +|| s->state == TCP_CHARDEV_STATE_CONNECTED) { +yank_unregister_function(s->yank_name, yank_generic_iochannel, + QIO_CHANNEL(s->sioc)); +} object_unref(OBJECT(s->sioc)); s->sioc = NULL; object_unref(OBJECT(s->ioc)); @@ -916,6 +923,8 @@ static int tcp_chr_add_client(Chardev *chr, int fd) } tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(s->yank_name, yank_generic_iochannel, + QIO_CHANNEL(sioc)); ret = tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); return ret; @@ -930,6 +939,8 @@ static void tcp_chr_accept(QIONetListener *listener, tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); tcp_chr_set_client_ioc_name(chr, cioc); +yank_register_function(s->yank_name, yank_generic_iochannel, + QIO_CHANNEL(cioc)); tcp_chr_new_client(chr, cioc); } @@ -945,6 +956,8 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp) object_unref(OBJECT(sioc)); return -1; } +yank_register_function(s->yank_name, yank_generic_iochannel, + QIO_CHANNEL(sioc)); tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); return 0; @@ -960,6 +973,8 @@ static void 
tcp_chr_accept_server_sync(Chardev *chr) tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); sioc = qio_net_listener_wait_client(s->listener); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(s->yank_name, yank_generic_iochannel, + QIO_CHANNEL(sioc)); tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); } @@ -1070,6 +1085,10 @@ static void char_socket_finalize(Object *obj) object_unref(OBJECT(s->tls_creds)); } g_free(s->tls_authz); +if (s->yank_name) { +yank_unregister_instance(s->yank_name); +g_free(s->yank_name); +} qemu_chr_be_event(chr, CHR_EVENT_CLOSED); } @@ -1085,6 +1104,8 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque) if (qio_task_propagate_error(task, )) { tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED); +yank_unregister_function(s->yank_name, yank_generic_iochannel, + QIO_CHANNEL(sioc)); check_report_connect_error(chr, err); goto cleanup; } @@ -1118,6 +1139,8 @@ static void tcp_chr_connect_client_async(Chardev *chr) tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING); sioc = qio_channel_socket_new(); tcp_chr_set_client_ioc_name(chr, sioc); +yank_register_function(s->yank_name, yank_generic_iochannel, + QIO_CHANNEL(sioc)); /* * Normally code would use the qio_channel_socket_connect_async * method which uses a QIOTask + qio_task_set_error internally @@ -1360,6 +1383,14 @@ static void qmp_chardev_open_socket(Chardev *chr, qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS); } +s->yank_name = g_strconcat("chardev:", chr->label, NULL); +yank_register_instance(s->yank_name, errp); +if (*errp) { +g_free(s->yank_name); +s->yank_name = NULL; +return; +} + /* be isn't opened until we get a connection */ *be_opened = false; -- 2.20.1 pgpMhRYCj1F3A.pgp Description: OpenPGP digital signature
[PATCH v8 2/8] block/nbd.c: Add yank feature
Register a yank function which shuts down the socket and sets s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an error occured. Signed-off-by: Lukas Straub Acked-by: Stefan Hajnoczi Reviewed-by: Daniel P. Berrangé --- block/nbd.c | 129 1 file changed, 80 insertions(+), 49 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 7bb881fef4..8632cf5340 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -35,6 +35,7 @@ #include "qemu/option.h" #include "qemu/cutils.h" #include "qemu/main-loop.h" +#include "qemu/atomic.h" #include "qapi/qapi-visit-sockets.h" #include "qapi/qmp/qstring.h" @@ -43,6 +44,8 @@ #include "block/nbd.h" #include "block/block_int.h" +#include "qemu/yank.h" + #define EN_OPTSTR ":exportname=" #define MAX_NBD_REQUESTS16 @@ -84,6 +87,8 @@ typedef struct BDRVNBDState { NBDReply reply; BlockDriverState *bs; +char *yank_name; + /* Connection parameters */ uint32_t reconnect_delay; SocketAddress *saddr; @@ -93,10 +98,10 @@ typedef struct BDRVNBDState { char *x_dirty_bitmap; } BDRVNBDState; -static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr, - Error **errp); -static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc, -Error **errp); +static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, +Error **errp); +static int nbd_client_handshake(BlockDriverState *bs, Error **errp); +static void nbd_yank(void *opaque); static void nbd_clear_bdrvstate(BDRVNBDState *s) { @@ -109,17 +114,19 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s) s->tlscredsid = NULL; g_free(s->x_dirty_bitmap); s->x_dirty_bitmap = NULL; +g_free(s->yank_name); +s->yank_name = NULL; } static void nbd_channel_error(BDRVNBDState *s, int ret) { if (ret == -EIO) { -if (s->state == NBD_CLIENT_CONNECTED) { +if (atomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { s->state = s->reconnect_delay ? 
NBD_CLIENT_CONNECTING_WAIT : NBD_CLIENT_CONNECTING_NOWAIT; } } else { -if (s->state == NBD_CLIENT_CONNECTED) { +if (atomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); } s->state = NBD_CLIENT_QUIT; @@ -170,7 +177,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs, * s->connection_co is either yielded from nbd_receive_reply or from * nbd_co_reconnect_loop() */ -if (s->state == NBD_CLIENT_CONNECTED) { +if (atomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context); } @@ -237,20 +244,20 @@ static void nbd_teardown_connection(BlockDriverState *bs) static bool nbd_client_connecting(BDRVNBDState *s) { -return s->state == NBD_CLIENT_CONNECTING_WAIT || -s->state == NBD_CLIENT_CONNECTING_NOWAIT; +NBDClientState state = atomic_load_acquire(>state); +return state == NBD_CLIENT_CONNECTING_WAIT || +state == NBD_CLIENT_CONNECTING_NOWAIT; } static bool nbd_client_connecting_wait(BDRVNBDState *s) { -return s->state == NBD_CLIENT_CONNECTING_WAIT; +return atomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT; } static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) { int ret; Error *local_err = NULL; -QIOChannelSocket *sioc; if (!nbd_client_connecting(s)) { return; @@ -283,21 +290,21 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) /* Finalize previous connection if any */ if (s->ioc) { nbd_client_detach_aio_context(s->bs); +yank_unregister_function(s->yank_name, nbd_yank, s->bs); object_unref(OBJECT(s->sioc)); s->sioc = NULL; object_unref(OBJECT(s->ioc)); s->ioc = NULL; } -sioc = nbd_establish_connection(s->saddr, _err); -if (!sioc) { +if (nbd_establish_connection(s->bs, s->saddr, _err) < 0) { ret = -ECONNREFUSED; goto out; } bdrv_dec_in_flight(s->bs); -ret = nbd_client_handshake(s->bs, sioc, _err); +ret = nbd_client_handshake(s->bs, _err); if (s->drained) { s->wait_drained_end = true; @@ -334,7 +341,7 @@ 
static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s) nbd_reconnect_attempt(s); while (nbd_client_connecting(s)) { -if (s->state == NBD_CLIENT_CONNECTING_WAIT && +if (atomic_load_acquire(>state) == NBD_CLIENT_
[PATCH v8 1/8] Introduce yank feature
The yank feature allows recovering from a hanging qemu by "yanking"
at various parts. Other qemu systems can register themselves and
multiple yank functions. Then all yank functions for selected
instances can be called by the 'yank' out-of-band qmp command.
Available instances can be queried by a 'query-yank' oob command.

Signed-off-by: Lukas Straub
Acked-by: Stefan Hajnoczi
---
 include/qemu/yank.h |  81 +++
 qapi/misc.json      |  62 +++
 util/meson.build    |   1 +
 util/yank.c         | 187
 4 files changed, 331 insertions(+)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

diff --git a/include/qemu/yank.h b/include/qemu/yank.h
new file mode 100644
index 00..c5ab53965a
--- /dev/null
+++ b/include/qemu/yank.h
@@ -0,0 +1,81 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef YANK_H
+#define YANK_H
+
+typedef void (YankFn)(void *opaque);
+
+/**
+ * yank_register_instance: Register a new instance.
+ *
+ * This registers a new instance for yanking. Must be called before any yank
+ * function is registered for this instance.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The globally unique name of the instance.
+ * @errp: Error object.
+ */
+void yank_register_instance(const char *instance_name, Error **errp);
+
+/**
+ * yank_unregister_instance: Unregister an instance.
+ *
+ * This unregisters an instance. Must be called only after every yank function
+ * of the instance has been unregistered.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance.
+ */
+void yank_unregister_instance(const char *instance_name);
+
+/**
+ * yank_register_function: Register a yank function
+ *
+ * This registers a yank function. All limitations of qmp oob commands apply
+ * to the yank function as well. See docs/devel/qapi-code-gen.txt under
+ * "An OOB-capable command handler must satisfy the following conditions".
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance
+ * @func: The yank function
+ * @opaque: Will be passed to the yank function
+ */
+void yank_register_function(const char *instance_name,
+                            YankFn *func,
+                            void *opaque);
+
+/**
+ * yank_unregister_function: Unregister a yank function
+ *
+ * This unregisters a yank function.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance
+ * @func: func that was passed to yank_register_function
+ * @opaque: opaque that was passed to yank_register_function
+ */
+void yank_unregister_function(const char *instance_name,
+                              YankFn *func,
+                              void *opaque);
+
+/**
+ * yank_generic_iochannel: Generic yank function for iochannel
+ *
+ * This is a generic yank function which will call qio_channel_shutdown on the
+ * provided QIOChannel.
+ *
+ * @opaque: QIOChannel to shutdown
+ */
+void yank_generic_iochannel(void *opaque);
+#endif
diff --git a/qapi/misc.json b/qapi/misc.json
index 9d32820dc1..7de330416a 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1615,3 +1615,65 @@
 ##
 { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }

+##
+# @YankInstances:
+#
+# @instances: List of yank instances.
+#
+# A yank instance can be yanked with the "yank" qmp command to recover from a
+# hanging qemu.
+#
+# Yank instances are named after the following schema:
+# "blockdev:" refers to a block device. Currently only nbd block
+# devices are implemented.
+# "chardev:" refers to a chardev. Currently only socket chardevs
+# are implemented.
+# "migration" refers to the migration currently in progress.
+#
+# Currently implemented yank instances:
+#  -nbd block device:
+#   Yanking it will shutdown the connection to the nbd server without
+#   attempting to reconnect.
+#  -socket chardev:
+#   Yanking it will shutdown the connected socket.
+#  -migration:
+#   Yanking it will shutdown all migration connections.
+#
+# Since: 5.2
+##
+{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }
+
+##
+# @yank:
+#
+# Recover from hanging qemu by yanking the specified instances. See
+# "YankInstances" for more information.
+#
+# Takes @YankInstances as argument.
+#
+# Returns: nothing.
+#
+# Example:
+#
+# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
+# <- { "return": {} }
+#
+# Since: 5.2
+##
+{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
+
+##
+# @query-yank:
+#
+# Query yank instances. See "YankInstances" for more information.
+#
+# Returns: @YankInstances
+#
+# Example:
+#
+# -> { "execute"