[PATCH] MAINTAINERS: Remove and change David Gilbert maintainer entries

2023-03-30 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

I'm leaving Red Hat next week, so clean up the maintainer entries.

'virtiofs' is just the device code now, so it is pretty small, and
Stefan is still a maintainer there.

'migration' still has Juan.

For 'HMP' I'll swing that over to my personal email.

Signed-off-by: Dr. David Alan Gilbert 
---
 MAINTAINERS | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ef45b5e71e..f0f7fb3746 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2119,7 +2119,6 @@ T: git https://github.com/borntraeger/qemu.git s390-next
 L: qemu-s3...@nongnu.org
 
 virtiofs
-M: Dr. David Alan Gilbert 
 M: Stefan Hajnoczi 
 S: Supported
 F: hw/virtio/vhost-user-fs*
@@ -2863,7 +2862,7 @@ F: tests/unit/test-rcu-*.c
 F: util/rcu.c
 
 Human Monitor (HMP)
-M: Dr. David Alan Gilbert 
+M: Dr. David Alan Gilbert 
 S: Maintained
 F: monitor/monitor-internal.h
 F: monitor/misc.c
@@ -3136,7 +3135,6 @@ F: scripts/checkpatch.pl
 
 Migration
 M: Juan Quintela 
-M: Dr. David Alan Gilbert 
 S: Maintained
 F: hw/core/vmstate-if.c
 F: include/hw/vmstate-if.h
-- 
2.39.2




[PATCH] migration/rdma: Fix return-path case

2023-03-14 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The RDMA code has return-path handling, but it is only enabled when
postcopy is enabled; if only the 'return-path' migration capability
is set, the return path is not set up, yet the core migration code
still tries to use it and breaks.

Enable the RDMA return path if either postcopy or the return-path
capability is enabled.

bz: https://bugzilla.redhat.com/show_bug.cgi?id=2063615

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 288eadc2d2..9d70e9885b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3373,7 +3373,8 @@ static int qemu_rdma_accept(RDMAContext *rdma)
  * initialize the RDMAContext for return path for postcopy after first
  * connection request reached.
  */
-if (migrate_postcopy() && !rdma->is_return_path) {
+if ((migrate_postcopy() || migrate_use_return_path())
+&& !rdma->is_return_path) {
 rdma_return_path = qemu_rdma_data_init(rdma->host_port, NULL);
 if (rdma_return_path == NULL) {
 rdma_ack_cm_event(cm_event);
@@ -3455,7 +3456,8 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 }
 
 /* Accept the second connection request for return path */
-if (migrate_postcopy() && !rdma->is_return_path) {
+if ((migrate_postcopy() || migrate_use_return_path())
+&& !rdma->is_return_path) {
 qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
 NULL,
 (void *)(intptr_t)rdma->return_path);
@@ -4192,7 +4194,7 @@ void rdma_start_outgoing_migration(void *opaque,
 }
 
 /* RDMA postcopy need a separate queue pair for return path */
-if (migrate_postcopy()) {
+if (migrate_postcopy() || migrate_use_return_path()) {
 rdma_return_path = qemu_rdma_data_init(host_port, errp);
 
 if (rdma_return_path == NULL) {
-- 
2.39.2




[PATCH] tests/migration: Tweak auto converge limits check

2023-03-06 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Thomas found an autoconverge test failure where the
migration completed before the autoconverge had kicked in.

To try and avoid this again:
  a) Reduce the usleep in test_migrate_auto_converge so that the
     loop exits more quickly once auto converge kicks in
  b) Make the loop exit immediately when throttling starts, rather
     than sleeping first; otherwise the migration might converge
     during the sleep
  c) Reduce inc_pct so auto converge throttles up more slowly
  d) Reduce the max-bandwidth in migrate_ensure_non_converge to make
     non-convergence more certain

Signed-off-by: Dr. David Alan Gilbert 
---
 tests/qtest/migration-test.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d4ab3934ed..75d4f1d4a9 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -408,8 +408,8 @@ static void migrate_set_parameter_str(QTestState *who, 
const char *parameter,
 
 static void migrate_ensure_non_converge(QTestState *who)
 {
-/* Can't converge with 1ms downtime + 30 mbs bandwidth limit */
-migrate_set_parameter_int(who, "max-bandwidth", 30 * 1000 * 1000);
+/* Can't converge with 1ms downtime + 3 mbs bandwidth limit */
+migrate_set_parameter_int(who, "max-bandwidth", 3 * 1000 * 1000);
 migrate_set_parameter_int(who, "downtime-limit", 1);
 }
 
@@ -1808,7 +1808,7 @@ static void test_migrate_auto_converge(void)
  * E.g., with 1Gb/s bandwith migration may pass without throttling,
  * so we need to decrease a bandwidth.
  */
-const int64_t init_pct = 5, inc_pct = 50, max_pct = 95;
+const int64_t init_pct = 5, inc_pct = 25, max_pct = 95;
 
 if (test_migrate_start(, , uri, )) {
 return;
@@ -1835,13 +1835,16 @@ static void test_migrate_auto_converge(void)
 
 /* Wait for throttling begins */
 percentage = 0;
-while (percentage == 0) {
+do {
 percentage = read_migrate_property_int(from, 
"cpu-throttle-percentage");
-usleep(100);
+if (percentage != 0) {
+break;
+}
+usleep(20);
 g_assert_false(got_stop);
-}
-/* The first percentage of throttling should be equal to init_pct */
-g_assert_cmpint(percentage, ==, init_pct);
+} while (true);
+/* The first percentage of throttling should be at least init_pct */
+g_assert_cmpint(percentage, >=, init_pct);
 /* Now, when we tested that throttling works, let it converge */
 migrate_ensure_converge(from);
 
-- 
2.39.2




[PULL 2/4] virtiofsd: Remove build and docs glue

2023-02-16 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Remove all the virtiofsd build and docs infrastructure.

Signed-off-by: Dr. David Alan Gilbert 
Acked-by: Stefan Hajnoczi 
---
 MAINTAINERS|  2 --
 docs/conf.py   |  4 
 docs/meson.build   |  1 -
 docs/tools/index.rst   |  1 -
 meson.build|  1 -
 meson_options.txt  |  2 --
 .../ci/org.centos/stream/8/x86_64/configure|  2 --
 scripts/coverity-scan/COMPONENTS.md|  3 ---
 scripts/meson-buildoptions.sh  |  3 ---
 tools/meson.build  | 13 -
 tools/virtiofsd/50-qemu-virtiofsd.json.in  |  5 -
 tools/virtiofsd/meson.build| 18 --
 12 files changed, 55 deletions(-)
 delete mode 100644 tools/virtiofsd/50-qemu-virtiofsd.json.in
 delete mode 100644 tools/virtiofsd/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index fd54c1f140..5090ba0e49 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2096,10 +2096,8 @@ virtiofs
 M: Dr. David Alan Gilbert 
 M: Stefan Hajnoczi 
 S: Supported
-F: tools/virtiofsd/*
 F: hw/virtio/vhost-user-fs*
 F: include/hw/virtio/vhost-user-fs.h
-F: docs/tools/virtiofsd.rst
 L: virtio...@redhat.com
 
 virtio-input
diff --git a/docs/conf.py b/docs/conf.py
index 73a287a4f2..00767b0e24 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -290,10 +290,6 @@
 ('tools/virtfs-proxy-helper', 'virtfs-proxy-helper',
  'QEMU 9p virtfs proxy filesystem helper',
  ['M. Mohan Kumar'], 1),
-('tools/virtiofsd', 'virtiofsd',
- 'QEMU virtio-fs shared file system daemon',
- ['Stefan Hajnoczi ',
-  'Masayoshi Mizuma '], 1),
 ]
 man_make_section_directory = False
 
diff --git a/docs/meson.build b/docs/meson.build
index 9136fed3b7..bbcdccce68 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -48,7 +48,6 @@ if build_docs
 'qemu-storage-daemon.1': (have_tools ? 'man1' : ''),
 'qemu-trace-stap.1': (stap.found() ? 'man1' : ''),
 'virtfs-proxy-helper.1': (have_virtfs_proxy_helper ? 'man1' : ''),
-'virtiofsd.1': (have_virtiofsd ? 'man1' : ''),
 'qemu.1': 'man1',
 'qemu-block-drivers.7': 'man7',
 'qemu-cpu-models.7': 'man7'
diff --git a/docs/tools/index.rst b/docs/tools/index.rst
index 2151adcf78..8e65ce0dfc 100644
--- a/docs/tools/index.rst
+++ b/docs/tools/index.rst
@@ -16,4 +16,3 @@ command line utilities and other standalone programs.
qemu-pr-helper
qemu-trace-stap
virtfs-proxy-helper
-   virtiofsd
diff --git a/meson.build b/meson.build
index a76c855312..adfc0e28b5 100644
--- a/meson.build
+++ b/meson.build
@@ -3879,7 +3879,6 @@ if have_block
   summary_info += {'Block whitelist (ro)': 
get_option('block_drv_ro_whitelist')}
   summary_info += {'Use block whitelist in tools': 
get_option('block_drv_whitelist_in_tools')}
   summary_info += {'VirtFS support':have_virtfs}
-  summary_info += {'build virtiofs daemon': have_virtiofsd}
   summary_info += {'Live block migration': 
config_host_data.get('CONFIG_LIVE_BLOCK_MIGRATION')}
   summary_info += {'replication support': 
config_host_data.get('CONFIG_REPLICATION')}
   summary_info += {'bochs support': get_option('bochs').allowed()}
diff --git a/meson_options.txt b/meson_options.txt
index 7e5801db90..6b0900205e 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -272,8 +272,6 @@ option('vhost_user_blk_server', type: 'feature', value: 
'auto',
description: 'build vhost-user-blk server')
 option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
-option('virtiofsd', type: 'feature', value: 'auto',
-   description: 'build virtiofs daemon (virtiofsd)')
 option('libvduse', type: 'feature', value: 'auto',
description: 'build VDUSE Library')
 option('vduse_blk_export', type: 'feature', value: 'auto',
diff --git a/scripts/ci/org.centos/stream/8/x86_64/configure 
b/scripts/ci/org.centos/stream/8/x86_64/configure
index 65eacf3c56..6e8983f39c 100755
--- a/scripts/ci/org.centos/stream/8/x86_64/configure
+++ b/scripts/ci/org.centos/stream/8/x86_64/configure
@@ -138,7 +138,6 @@
 --disable-vhost-vdpa \
 --disable-virglrenderer \
 --disable-virtfs \
---disable-virtiofsd \
 --disable-vnc \
 --disable-vnc-jpeg \
 --disable-png \
@@ -191,7 +190,6 @@
 --enable-tpm \
 --enable-trace-backends=dtrace \
 --enable-usb-redir \
---enable-virtiofsd \
 --enable-vhost-kernel \
 --enable-vhost-net \
 --enable-vhost-user \
diff --git a/scripts/coverity-scan/COMPONENTS.md 
b/scripts/coverity-scan/COMPONENTS.md
index 0e6ab4936e..639dcee45a 100644
--- a/scripts/coverity-scan/COMPONENTS.md
+++ b/scripts/coverity-scan/COMPONENTS.md
@@ -132,9 +132,6 @@ util
 xen
   ~ (/qemu)?(.*/xen.*)
 
-virtiofsd
-  ~ (/qemu)?(/tools/virtiofsd/.*)
-
 (headers)
   ~ (/qemu)?(/include/.*)
 
diff --git a/scripts/meson-buildoptions.sh 

[PULL 1/4] virtiofsd: Remove test

2023-02-16 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Remove the avocado test for virtiofsd, since we're about to remove
the C implementation.

Signed-off-by: Dr. David Alan Gilbert 
Acked-by: Stefan Hajnoczi 
---
 .../org.centos/stream/8/x86_64/test-avocado   |   7 -
 tests/avocado/virtiofs_submounts.py   | 217 --
 2 files changed, 224 deletions(-)
 delete mode 100644 tests/avocado/virtiofs_submounts.py

diff --git a/scripts/ci/org.centos/stream/8/x86_64/test-avocado 
b/scripts/ci/org.centos/stream/8/x86_64/test-avocado
index 7aeecbcfb8..f403e4e7ec 100755
--- a/scripts/ci/org.centos/stream/8/x86_64/test-avocado
+++ b/scripts/ci/org.centos/stream/8/x86_64/test-avocado
@@ -14,13 +14,6 @@
 # * Require machine type "x-remote":
 #   - tests/avocado/multiprocess.py:Multiprocess.test_multiprocess_x86_64
 #
-# * Needs superuser privileges:
-#   - 
tests/avocado/virtiofs_submounts.py:VirtiofsSubmountsTest.test_pre_virtiofsd_set_up
-#   - 
tests/avocado/virtiofs_submounts.py:VirtiofsSubmountsTest.test_pre_launch_set_up
-#   - 
tests/avocado/virtiofs_submounts.py:VirtiofsSubmountsTest.test_post_launch_set_up
-#   - 
tests/avocado/virtiofs_submounts.py:VirtiofsSubmountsTest.test_post_mount_set_up
-#   - tests/avocado/virtiofs_submounts.py:VirtiofsSubmountsTest.test_two_runs
-#
 # * Requires display type "egl-headless":
 #   - tests/avocado/virtio-gpu.py:VirtioGPUx86.test_virtio_vga_virgl
 #   - tests/avocado/virtio-gpu.py:VirtioGPUx86.test_vhost_user_vga_virgl
diff --git a/tests/avocado/virtiofs_submounts.py 
b/tests/avocado/virtiofs_submounts.py
deleted file mode 100644
index e6dc32ffd4..00
--- a/tests/avocado/virtiofs_submounts.py
+++ /dev/null
@@ -1,217 +0,0 @@
-import logging
-import re
-import os
-import subprocess
-import time
-
-from avocado import skipUnless
-from avocado_qemu import LinuxTest, BUILD_DIR
-from avocado_qemu import has_cmds
-from avocado_qemu import run_cmd
-from avocado_qemu import wait_for_console_pattern
-from avocado.utils import ssh
-
-
-class VirtiofsSubmountsTest(LinuxTest):
-"""
-:avocado: tags=arch:x86_64
-:avocado: tags=accel:kvm
-"""
-
-def run(self, args, ignore_error=False):
-stdout, stderr, ret = run_cmd(args)
-
-if ret != 0:
-cmdline = ' '.join(args)
-if not ignore_error:
-self.fail(f'{cmdline}: Returned {ret}: {stderr}')
-else:
-self.log.warn(f'{cmdline}: Returned {ret}: {stderr}')
-
-return (stdout, stderr, ret)
-
-def set_up_shared_dir(self):
-self.shared_dir = os.path.join(self.workdir, 'virtiofs-shared')
-
-os.mkdir(self.shared_dir)
-
-self.run(('cp', self.get_data('guest.sh'),
- os.path.join(self.shared_dir, 'check.sh')))
-
-self.run(('cp', self.get_data('guest-cleanup.sh'),
- os.path.join(self.shared_dir, 'cleanup.sh')))
-
-def set_up_virtiofs(self):
-attmp = os.getenv('AVOCADO_TESTS_COMMON_TMPDIR')
-self.vfsdsock = os.path.join(attmp, 'vfsdsock')
-
-self.run(('sudo', '-n', 'rm', '-f', self.vfsdsock), ignore_error=True)
-
-self.virtiofsd = \
-subprocess.Popen(('sudo', '-n',
-  'tools/virtiofsd/virtiofsd',
-  f'--socket-path={self.vfsdsock}',
-  '-o', f'source={self.shared_dir}',
-  '-o', 'cache=always',
-  '-o', 'xattr',
-  '-o', 'announce_submounts',
-  '-f'),
- stdout=subprocess.DEVNULL,
- stderr=subprocess.PIPE,
- universal_newlines=True)
-
-while not os.path.exists(self.vfsdsock):
-if self.virtiofsd.poll() is not None:
-self.fail('virtiofsd exited prematurely: ' +
-  self.virtiofsd.communicate()[1])
-time.sleep(0.1)
-
-self.run(('sudo', '-n', 'chmod', 'go+rw', self.vfsdsock))
-
-self.vm.add_args('-chardev',
- f'socket,id=vfsdsock,path={self.vfsdsock}',
- '-device',
- 'vhost-user-fs-pci,queue-size=1024,chardev=vfsdsock' \
- ',tag=host',
- '-object',
- 'memory-backend-file,id=mem,size=1G,' \
- 'mem-path=/dev/shm,share=on',
- '-numa',
- 'node,memdev=mem')
-
-def set_up_nested_mounts(self):
-scratch_dir = os.path.join(self.shared_dir, 'scratch')
-try:
-os.mkdir(scratch_dir)
-except FileExistsError:
-pass
-
-args = ['bash', self.get_data('host.sh'), scratch_dir]
-if self.seed:
-args += [self.seed]
-
-out, _, _ = self.run(args)
-seed = re.search(r'^Seed: \d+', out)

[PULL 4/4] virtiofsd: Swing deprecated message to removed-features

2023-02-16 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Move the virtiofsd deprecation notice to removed-features, since the
C implementation is now gone.

Signed-off-by: Dr. David Alan Gilbert 
Acked-by: Stefan Hajnoczi 
---
 docs/about/deprecated.rst   | 18 --
 docs/about/removed-features.rst | 13 +
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 2827b0c0be..ee95bcb1a6 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -330,24 +330,6 @@ versions, aliases will point to newer CPU model versions
 depending on the machine type, so management software must
 resolve CPU model aliases before starting a virtual machine.
 
-Tools
--
-
-virtiofsd
-'
-
-There is a new Rust implementation of ``virtiofsd`` at
-``https://gitlab.com/virtio-fs/virtiofsd``;
-since this is now marked stable, new development should be done on that
-rather than the existing C version in the QEMU tree.
-The C version will still accept fixes and patches that
-are already in development for the moment, but will eventually
-be deleted from this tree.
-New deployments should use the Rust version, and existing systems
-should consider moving to it.  The command line and feature set
-is very close and moving should be simple.
-
-
 QEMU guest agent
 
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index e901637ce5..5b258b446b 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -889,3 +889,16 @@ The VXHS code did not compile since v2.12.0. It was 
removed in 5.1.
 The corresponding upstream server project is no longer maintained.
 Users are recommended to switch to an alternative distributed block
 device driver such as RBD.
+
+Tools
+-
+
+virtiofsd (removed in 8.0)
+''
+
+There is a newer Rust implementation of ``virtiofsd`` at
+``https://gitlab.com/virtio-fs/virtiofsd``; this has been
+stable for some time and is now widely used.
+The command line and feature set is very close to the removed
+C implementation.
+
-- 
2.39.2




[PULL 0/4] virtiofs queue

2023-02-16 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The following changes since commit 6dffbe36af79e26a4d23f94a9a1c1201de99c261:

  Merge tag 'migration-20230215-pull-request' of 
https://gitlab.com/juan.quintela/qemu into staging (2023-02-16 13:09:51 +)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-virtiofs-20230216b

for you to fetch changes up to a6bfdaed4a735a2cf59f265e6955fe2adcc99637:

  virtiofsd: Swing deprecated message to removed-features (2023-02-16 18:15:08 
+)


Remove C virtiofsd

We deprecated the C virtiofsd in commit 34deee7b6a1418f3d62a
in v7.0 in favour of the Rust implementation at

  https://gitlab.com/virtio-fs/virtiofsd

Since then, the Rust version has had more development and
has held up well.  It's time to say goodbye to the C version
that got us going.

Signed-off-by: Dr. David Alan Gilbert 


Dr. David Alan Gilbert (4):
  virtiofsd: Remove test
  virtiofsd: Remove build and docs glue
  virtiofsd: Remove source
  virtiofsd: Swing deprecated message to removed-features

 MAINTAINERS|2 -
 docs/about/deprecated.rst  |   18 -
 docs/about/removed-features.rst|   13 +
 docs/conf.py   |4 -
 docs/meson.build   |1 -
 docs/tools/index.rst   |1 -
 docs/tools/virtiofsd.rst   |  403 --
 meson.build|1 -
 meson_options.txt  |2 -
 scripts/ci/org.centos/stream/8/x86_64/configure|2 -
 scripts/ci/org.centos/stream/8/x86_64/test-avocado |7 -
 scripts/coverity-scan/COMPONENTS.md|3 -
 scripts/meson-buildoptions.sh  |3 -
 tests/avocado/virtiofs_submounts.py|  217 -
 tools/meson.build  |   13 -
 tools/virtiofsd/50-qemu-virtiofsd.json.in  |5 -
 tools/virtiofsd/buffer.c   |  350 --
 tools/virtiofsd/fuse_common.h  |  837 
 tools/virtiofsd/fuse_i.h   |  107 -
 tools/virtiofsd/fuse_log.c |   40 -
 tools/virtiofsd/fuse_log.h |   75 -
 tools/virtiofsd/fuse_lowlevel.c| 2732 
 tools/virtiofsd/fuse_lowlevel.h| 1988 -
 tools/virtiofsd/fuse_misc.h|   59 -
 tools/virtiofsd/fuse_opt.c |  446 --
 tools/virtiofsd/fuse_opt.h |  272 --
 tools/virtiofsd/fuse_signals.c |   93 -
 tools/virtiofsd/fuse_virtio.c  | 1081 -
 tools/virtiofsd/fuse_virtio.h  |   33 -
 tools/virtiofsd/helper.c   |  409 --
 tools/virtiofsd/meson.build|   18 -
 tools/virtiofsd/passthrough_helpers.h  |   51 -
 tools/virtiofsd/passthrough_ll.c   | 4521 
 tools/virtiofsd/passthrough_seccomp.c  |  182 -
 tools/virtiofsd/passthrough_seccomp.h  |   14 -
 35 files changed, 13 insertions(+), 13990 deletions(-)
 delete mode 100644 docs/tools/virtiofsd.rst
 delete mode 100644 tests/avocado/virtiofs_submounts.py
 delete mode 100644 tools/virtiofsd/50-qemu-virtiofsd.json.in
 delete mode 100644 tools/virtiofsd/buffer.c
 delete mode 100644 tools/virtiofsd/fuse_common.h
 delete mode 100644 tools/virtiofsd/fuse_i.h
 delete mode 100644 tools/virtiofsd/fuse_log.c
 delete mode 100644 tools/virtiofsd/fuse_log.h
 delete mode 100644 tools/virtiofsd/fuse_lowlevel.c
 delete mode 100644 tools/virtiofsd/fuse_lowlevel.h
 delete mode 100644 tools/virtiofsd/fuse_misc.h
 delete mode 100644 tools/virtiofsd/fuse_opt.c
 delete mode 100644 tools/virtiofsd/fuse_opt.h
 delete mode 100644 tools/virtiofsd/fuse_signals.c
 delete mode 100644 tools/virtiofsd/fuse_virtio.c
 delete mode 100644 tools/virtiofsd/fuse_virtio.h
 delete mode 100644 tools/virtiofsd/helper.c
 delete mode 100644 tools/virtiofsd/meson.build
 delete mode 100644 tools/virtiofsd/passthrough_helpers.h
 delete mode 100644 tools/virtiofsd/passthrough_ll.c
 delete mode 100644 tools/virtiofsd/passthrough_seccomp.c
 delete mode 100644 tools/virtiofsd/passthrough_seccomp.h




[PATCH v2 1/4] virtiofsd: Remove test

2023-02-15 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Remove the avocado test for virtiofsd, since we're about to remove
the C implementation.

Signed-off-by: Dr. David Alan Gilbert 
---
 .../org.centos/stream/8/x86_64/test-avocado   |   7 -
 tests/avocado/virtiofs_submounts.py   | 217 --
 2 files changed, 224 deletions(-)
 delete mode 100644 tests/avocado/virtiofs_submounts.py

diff --git a/scripts/ci/org.centos/stream/8/x86_64/test-avocado 
b/scripts/ci/org.centos/stream/8/x86_64/test-avocado
index 7aeecbcfb8..f403e4e7ec 100755
--- a/scripts/ci/org.centos/stream/8/x86_64/test-avocado
+++ b/scripts/ci/org.centos/stream/8/x86_64/test-avocado
@@ -14,13 +14,6 @@
 # * Require machine type "x-remote":
 #   - tests/avocado/multiprocess.py:Multiprocess.test_multiprocess_x86_64
 #
-# * Needs superuser privileges:
-#   - 
tests/avocado/virtiofs_submounts.py:VirtiofsSubmountsTest.test_pre_virtiofsd_set_up
-#   - 
tests/avocado/virtiofs_submounts.py:VirtiofsSubmountsTest.test_pre_launch_set_up
-#   - 
tests/avocado/virtiofs_submounts.py:VirtiofsSubmountsTest.test_post_launch_set_up
-#   - 
tests/avocado/virtiofs_submounts.py:VirtiofsSubmountsTest.test_post_mount_set_up
-#   - tests/avocado/virtiofs_submounts.py:VirtiofsSubmountsTest.test_two_runs
-#
 # * Requires display type "egl-headless":
 #   - tests/avocado/virtio-gpu.py:VirtioGPUx86.test_virtio_vga_virgl
 #   - tests/avocado/virtio-gpu.py:VirtioGPUx86.test_vhost_user_vga_virgl
diff --git a/tests/avocado/virtiofs_submounts.py 
b/tests/avocado/virtiofs_submounts.py
deleted file mode 100644
index e6dc32ffd4..00
--- a/tests/avocado/virtiofs_submounts.py
+++ /dev/null
@@ -1,217 +0,0 @@
-import logging
-import re
-import os
-import subprocess
-import time
-
-from avocado import skipUnless
-from avocado_qemu import LinuxTest, BUILD_DIR
-from avocado_qemu import has_cmds
-from avocado_qemu import run_cmd
-from avocado_qemu import wait_for_console_pattern
-from avocado.utils import ssh
-
-
-class VirtiofsSubmountsTest(LinuxTest):
-"""
-:avocado: tags=arch:x86_64
-:avocado: tags=accel:kvm
-"""
-
-def run(self, args, ignore_error=False):
-stdout, stderr, ret = run_cmd(args)
-
-if ret != 0:
-cmdline = ' '.join(args)
-if not ignore_error:
-self.fail(f'{cmdline}: Returned {ret}: {stderr}')
-else:
-self.log.warn(f'{cmdline}: Returned {ret}: {stderr}')
-
-return (stdout, stderr, ret)
-
-def set_up_shared_dir(self):
-self.shared_dir = os.path.join(self.workdir, 'virtiofs-shared')
-
-os.mkdir(self.shared_dir)
-
-self.run(('cp', self.get_data('guest.sh'),
- os.path.join(self.shared_dir, 'check.sh')))
-
-self.run(('cp', self.get_data('guest-cleanup.sh'),
- os.path.join(self.shared_dir, 'cleanup.sh')))
-
-def set_up_virtiofs(self):
-attmp = os.getenv('AVOCADO_TESTS_COMMON_TMPDIR')
-self.vfsdsock = os.path.join(attmp, 'vfsdsock')
-
-self.run(('sudo', '-n', 'rm', '-f', self.vfsdsock), ignore_error=True)
-
-self.virtiofsd = \
-subprocess.Popen(('sudo', '-n',
-  'tools/virtiofsd/virtiofsd',
-  f'--socket-path={self.vfsdsock}',
-  '-o', f'source={self.shared_dir}',
-  '-o', 'cache=always',
-  '-o', 'xattr',
-  '-o', 'announce_submounts',
-  '-f'),
- stdout=subprocess.DEVNULL,
- stderr=subprocess.PIPE,
- universal_newlines=True)
-
-while not os.path.exists(self.vfsdsock):
-if self.virtiofsd.poll() is not None:
-self.fail('virtiofsd exited prematurely: ' +
-  self.virtiofsd.communicate()[1])
-time.sleep(0.1)
-
-self.run(('sudo', '-n', 'chmod', 'go+rw', self.vfsdsock))
-
-self.vm.add_args('-chardev',
- f'socket,id=vfsdsock,path={self.vfsdsock}',
- '-device',
- 'vhost-user-fs-pci,queue-size=1024,chardev=vfsdsock' \
- ',tag=host',
- '-object',
- 'memory-backend-file,id=mem,size=1G,' \
- 'mem-path=/dev/shm,share=on',
- '-numa',
- 'node,memdev=mem')
-
-def set_up_nested_mounts(self):
-scratch_dir = os.path.join(self.shared_dir, 'scratch')
-try:
-os.mkdir(scratch_dir)
-except FileExistsError:
-pass
-
-args = ['bash', self.get_data('host.sh'), scratch_dir]
-if self.seed:
-args += [self.seed]
-
-out, _, _ = self.run(args)
-seed = re.search(r'^Seed: \d+', out)
-

[PATCH v2 0/4] Remove C virtiofsd

2023-02-15 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

We deprecated the C virtiofsd in commit 34deee7b6a1418f3d62a
in v7.0 in favour of the Rust implementation at
 
  https://gitlab.com/virtio-fs/virtiofsd
 
Since then, the Rust version has had more development and
has held up well.  It's time to say goodbye to the C version
that got us going.

v2:
  After comments on the v1 series, I've removed the Avocado
test.

Dr. David Alan Gilbert (4):
  virtiofsd: Remove test
  virtiofsd: Remove build and docs glue
  virtiofsd: Remove source
  virtiofsd: Swing deprecated message to removed-features

 MAINTAINERS   |2 -
 docs/about/deprecated.rst |   18 -
 docs/about/removed-features.rst   |   13 +
 docs/conf.py  |4 -
 docs/meson.build  |1 -
 docs/tools/index.rst  |1 -
 docs/tools/virtiofsd.rst  |  403 --
 meson.build   |1 -
 meson_options.txt |2 -
 .../ci/org.centos/stream/8/x86_64/configure   |2 -
 .../org.centos/stream/8/x86_64/test-avocado   |7 -
 scripts/coverity-scan/COMPONENTS.md   |3 -
 scripts/meson-buildoptions.sh |3 -
 tests/avocado/virtiofs_submounts.py   |  217 -
 tools/meson.build |   13 -
 tools/virtiofsd/50-qemu-virtiofsd.json.in |5 -
 tools/virtiofsd/buffer.c  |  350 --
 tools/virtiofsd/fuse_common.h |  837 ---
 tools/virtiofsd/fuse_i.h  |  107 -
 tools/virtiofsd/fuse_log.c|   40 -
 tools/virtiofsd/fuse_log.h|   75 -
 tools/virtiofsd/fuse_lowlevel.c   | 2732 --
 tools/virtiofsd/fuse_lowlevel.h   | 1988 
 tools/virtiofsd/fuse_misc.h   |   59 -
 tools/virtiofsd/fuse_opt.c|  446 --
 tools/virtiofsd/fuse_opt.h|  272 -
 tools/virtiofsd/fuse_signals.c|   93 -
 tools/virtiofsd/fuse_virtio.c | 1081 
 tools/virtiofsd/fuse_virtio.h |   33 -
 tools/virtiofsd/helper.c  |  409 --
 tools/virtiofsd/meson.build   |   18 -
 tools/virtiofsd/passthrough_helpers.h |   51 -
 tools/virtiofsd/passthrough_ll.c  | 4521 -
 tools/virtiofsd/passthrough_seccomp.c |  182 -
 tools/virtiofsd/passthrough_seccomp.h |   14 -
 35 files changed, 13 insertions(+), 13990 deletions(-)
 delete mode 100644 docs/tools/virtiofsd.rst
 delete mode 100644 tests/avocado/virtiofs_submounts.py
 delete mode 100644 tools/virtiofsd/50-qemu-virtiofsd.json.in
 delete mode 100644 tools/virtiofsd/buffer.c
 delete mode 100644 tools/virtiofsd/fuse_common.h
 delete mode 100644 tools/virtiofsd/fuse_i.h
 delete mode 100644 tools/virtiofsd/fuse_log.c
 delete mode 100644 tools/virtiofsd/fuse_log.h
 delete mode 100644 tools/virtiofsd/fuse_lowlevel.c
 delete mode 100644 tools/virtiofsd/fuse_lowlevel.h
 delete mode 100644 tools/virtiofsd/fuse_misc.h
 delete mode 100644 tools/virtiofsd/fuse_opt.c
 delete mode 100644 tools/virtiofsd/fuse_opt.h
 delete mode 100644 tools/virtiofsd/fuse_signals.c
 delete mode 100644 tools/virtiofsd/fuse_virtio.c
 delete mode 100644 tools/virtiofsd/fuse_virtio.h
 delete mode 100644 tools/virtiofsd/helper.c
 delete mode 100644 tools/virtiofsd/meson.build
 delete mode 100644 tools/virtiofsd/passthrough_helpers.h
 delete mode 100644 tools/virtiofsd/passthrough_ll.c
 delete mode 100644 tools/virtiofsd/passthrough_seccomp.c
 delete mode 100644 tools/virtiofsd/passthrough_seccomp.h

-- 
2.39.1




[PATCH v2 2/4] virtiofsd: Remove build and docs glue

2023-02-15 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Remove all the virtiofsd build and docs infrastructure.

Signed-off-by: Dr. David Alan Gilbert 
---
 MAINTAINERS|  2 --
 docs/conf.py   |  4 
 docs/meson.build   |  1 -
 docs/tools/index.rst   |  1 -
 meson.build|  1 -
 meson_options.txt  |  2 --
 .../ci/org.centos/stream/8/x86_64/configure|  2 --
 scripts/coverity-scan/COMPONENTS.md|  3 ---
 scripts/meson-buildoptions.sh  |  3 ---
 tools/meson.build  | 13 -
 tools/virtiofsd/50-qemu-virtiofsd.json.in  |  5 -
 tools/virtiofsd/meson.build| 18 --
 12 files changed, 55 deletions(-)
 delete mode 100644 tools/virtiofsd/50-qemu-virtiofsd.json.in
 delete mode 100644 tools/virtiofsd/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index 96e25f62ac..5c926e7396 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2097,10 +2097,8 @@ virtiofs
 M: Dr. David Alan Gilbert 
 M: Stefan Hajnoczi 
 S: Supported
-F: tools/virtiofsd/*
 F: hw/virtio/vhost-user-fs*
 F: include/hw/virtio/vhost-user-fs.h
-F: docs/tools/virtiofsd.rst
 L: virtio...@redhat.com
 
 virtio-input
diff --git a/docs/conf.py b/docs/conf.py
index 73a287a4f2..00767b0e24 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -290,10 +290,6 @@
 ('tools/virtfs-proxy-helper', 'virtfs-proxy-helper',
  'QEMU 9p virtfs proxy filesystem helper',
  ['M. Mohan Kumar'], 1),
-('tools/virtiofsd', 'virtiofsd',
- 'QEMU virtio-fs shared file system daemon',
- ['Stefan Hajnoczi ',
-  'Masayoshi Mizuma '], 1),
 ]
 man_make_section_directory = False
 
diff --git a/docs/meson.build b/docs/meson.build
index 9136fed3b7..bbcdccce68 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -48,7 +48,6 @@ if build_docs
 'qemu-storage-daemon.1': (have_tools ? 'man1' : ''),
 'qemu-trace-stap.1': (stap.found() ? 'man1' : ''),
 'virtfs-proxy-helper.1': (have_virtfs_proxy_helper ? 'man1' : ''),
-'virtiofsd.1': (have_virtiofsd ? 'man1' : ''),
 'qemu.1': 'man1',
 'qemu-block-drivers.7': 'man7',
 'qemu-cpu-models.7': 'man7'
diff --git a/docs/tools/index.rst b/docs/tools/index.rst
index 2151adcf78..8e65ce0dfc 100644
--- a/docs/tools/index.rst
+++ b/docs/tools/index.rst
@@ -16,4 +16,3 @@ command line utilities and other standalone programs.
qemu-pr-helper
qemu-trace-stap
virtfs-proxy-helper
-   virtiofsd
diff --git a/meson.build b/meson.build
index c626ccfa82..6508b10a05 100644
--- a/meson.build
+++ b/meson.build
@@ -3870,7 +3870,6 @@ if have_block
   summary_info += {'Block whitelist (ro)': 
get_option('block_drv_ro_whitelist')}
   summary_info += {'Use block whitelist in tools': 
get_option('block_drv_whitelist_in_tools')}
   summary_info += {'VirtFS support':have_virtfs}
-  summary_info += {'build virtiofs daemon': have_virtiofsd}
   summary_info += {'Live block migration': 
config_host_data.get('CONFIG_LIVE_BLOCK_MIGRATION')}
   summary_info += {'replication support': 
config_host_data.get('CONFIG_REPLICATION')}
   summary_info += {'bochs support': get_option('bochs').allowed()}
diff --git a/meson_options.txt b/meson_options.txt
index e5f199119e..954abb859b 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -270,8 +270,6 @@ option('vhost_user_blk_server', type: 'feature', value: 
'auto',
description: 'build vhost-user-blk server')
 option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
-option('virtiofsd', type: 'feature', value: 'auto',
-   description: 'build virtiofs daemon (virtiofsd)')
 option('libvduse', type: 'feature', value: 'auto',
description: 'build VDUSE Library')
 option('vduse_blk_export', type: 'feature', value: 'auto',
diff --git a/scripts/ci/org.centos/stream/8/x86_64/configure 
b/scripts/ci/org.centos/stream/8/x86_64/configure
index 65eacf3c56..6e8983f39c 100755
--- a/scripts/ci/org.centos/stream/8/x86_64/configure
+++ b/scripts/ci/org.centos/stream/8/x86_64/configure
@@ -138,7 +138,6 @@
 --disable-vhost-vdpa \
 --disable-virglrenderer \
 --disable-virtfs \
---disable-virtiofsd \
 --disable-vnc \
 --disable-vnc-jpeg \
 --disable-png \
@@ -191,7 +190,6 @@
 --enable-tpm \
 --enable-trace-backends=dtrace \
 --enable-usb-redir \
---enable-virtiofsd \
 --enable-vhost-kernel \
 --enable-vhost-net \
 --enable-vhost-user \
diff --git a/scripts/coverity-scan/COMPONENTS.md 
b/scripts/coverity-scan/COMPONENTS.md
index 0e6ab4936e..639dcee45a 100644
--- a/scripts/coverity-scan/COMPONENTS.md
+++ b/scripts/coverity-scan/COMPONENTS.md
@@ -132,9 +132,6 @@ util
 xen
   ~ (/qemu)?(.*/xen.*)
 
-virtiofsd
-  ~ (/qemu)?(/tools/virtiofsd/.*)
-
 (headers)
   ~ (/qemu)?(/include/.*)
 
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh

[PATCH v2 4/4] virtiofsd: Swing deprecated message to removed-features

2023-02-15 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Move the deprecation message to removed-features, since the C implementation is now gone.

Signed-off-by: Dr. David Alan Gilbert 
---
 docs/about/deprecated.rst   | 18 --
 docs/about/removed-features.rst | 13 +
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index da2e6fe63d..9a749b342c 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -325,24 +325,6 @@ versions, aliases will point to newer CPU model versions
 depending on the machine type, so management software must
 resolve CPU model aliases before starting a virtual machine.
 
-Tools
------
-
-virtiofsd
-'''''''''
-
-There is a new Rust implementation of ``virtiofsd`` at
-``https://gitlab.com/virtio-fs/virtiofsd``;
-since this is now marked stable, new development should be done on that
-rather than the existing C version in the QEMU tree.
-The C version will still accept fixes and patches that
-are already in development for the moment, but will eventually
-be deleted from this tree.
-New deployments should use the Rust version, and existing systems
-should consider moving to it.  The command line and feature set
-is very close and moving should be simple.
-
-
 QEMU guest agent
 ----------------
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index a17d0554d6..8b69ab1674 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -872,3 +872,16 @@ The VXHS code did not compile since v2.12.0. It was removed in 5.1.
 The corresponding upstream server project is no longer maintained.
 Users are recommended to switch to an alternative distributed block
 device driver such as RBD.
+
+Tools
+-----
+
+virtiofsd (removed in 8.0)
+''''''''''''''''''''''''''
+
+There is a newer Rust implementation of ``virtiofsd`` at
+``https://gitlab.com/virtio-fs/virtiofsd``; this has been
+stable for some time and is now widely used.
+The command line and feature set is very close to the removed
+C implementation.
+
-- 
2.39.1




[PATCH] virtio-rng-pci: fix transitional migration compat for vectors

2023-02-07 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

In bad9c5a5166fd5e3a892b7b0477cf2f4bd3a959a I fixed the virtio-rng-pci
migration compatibility, but it was discovered that we also need to fix
the other aliases of the device for the transitional cases.

Fixes: 9ea02e8f1 ('virtio-rng-pci: Allow setting nvectors, so we can use MSI-X')
bz: https://bugzilla.redhat.com/show_bug.cgi?id=2162569
Signed-off-by: Dr. David Alan Gilbert 
---
 hw/core/machine.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index b5cd42cd8c..4627b274d9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -49,6 +49,8 @@ const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
 GlobalProperty hw_compat_7_1[] = {
 { "virtio-device", "queue_reset", "false" },
 { "virtio-rng-pci", "vectors", "0" },
+{ "virtio-rng-pci-transitional", "vectors", "0" },
+{ "virtio-rng-pci-non-transitional", "vectors", "0" },
 };
 const size_t hw_compat_7_1_len = G_N_ELEMENTS(hw_compat_7_1);
 
-- 
2.39.1




[PATCH 0/3] Remove C virtiofsd

2023-01-18 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

We deprecated the C virtiofsd in commit 34deee7b6a1418f3d62a
in v7.0 in favour of the Rust implementation at

  https://gitlab.com/virtio-fs/virtiofsd

Since then, the Rust version has had more development and
has held up well.  It's time to say goodbye to the C version
that got us going.

The only thing I've not cleaned up here is
  tests/avocado/virtiofs_submounts.py

which I guess needs to figure out where the virtiofsd implementation
is and use it; suggestions welcome.

Dave


Dr. David Alan Gilbert (3):
  virtiofsd: Remove build and docs glue
  virtiofsd: Remove source
  virtiofsd: Swing deprecated message to removed-features

 MAINTAINERS   |2 -
 docs/about/deprecated.rst |   18 -
 docs/about/removed-features.rst   |   13 +
 docs/conf.py  |4 -
 docs/meson.build  |1 -
 docs/tools/index.rst  |1 -
 docs/tools/virtiofsd.rst  |  403 --
 meson.build   |1 -
 meson_options.txt |2 -
 .../ci/org.centos/stream/8/x86_64/configure   |2 -
 scripts/coverity-scan/COMPONENTS.md   |3 -
 scripts/meson-buildoptions.sh |3 -
 tools/meson.build |   13 -
 tools/virtiofsd/50-qemu-virtiofsd.json.in |5 -
 tools/virtiofsd/buffer.c  |  350 --
 tools/virtiofsd/fuse_common.h |  837 ---
 tools/virtiofsd/fuse_i.h  |  107 -
 tools/virtiofsd/fuse_log.c|   40 -
 tools/virtiofsd/fuse_log.h|   75 -
 tools/virtiofsd/fuse_lowlevel.c   | 2732 --
 tools/virtiofsd/fuse_lowlevel.h   | 1988 
 tools/virtiofsd/fuse_misc.h   |   59 -
 tools/virtiofsd/fuse_opt.c|  446 --
 tools/virtiofsd/fuse_opt.h|  272 -
 tools/virtiofsd/fuse_signals.c|   93 -
 tools/virtiofsd/fuse_virtio.c | 1081 
 tools/virtiofsd/fuse_virtio.h |   33 -
 tools/virtiofsd/helper.c  |  409 --
 tools/virtiofsd/meson.build   |   18 -
 tools/virtiofsd/passthrough_helpers.h |   51 -
 tools/virtiofsd/passthrough_ll.c  | 4521 -
 tools/virtiofsd/passthrough_seccomp.c |  182 -
 tools/virtiofsd/passthrough_seccomp.h |   14 -
 33 files changed, 13 insertions(+), 13766 deletions(-)
 delete mode 100644 docs/tools/virtiofsd.rst
 delete mode 100644 tools/virtiofsd/50-qemu-virtiofsd.json.in
 delete mode 100644 tools/virtiofsd/buffer.c
 delete mode 100644 tools/virtiofsd/fuse_common.h
 delete mode 100644 tools/virtiofsd/fuse_i.h
 delete mode 100644 tools/virtiofsd/fuse_log.c
 delete mode 100644 tools/virtiofsd/fuse_log.h
 delete mode 100644 tools/virtiofsd/fuse_lowlevel.c
 delete mode 100644 tools/virtiofsd/fuse_lowlevel.h
 delete mode 100644 tools/virtiofsd/fuse_misc.h
 delete mode 100644 tools/virtiofsd/fuse_opt.c
 delete mode 100644 tools/virtiofsd/fuse_opt.h
 delete mode 100644 tools/virtiofsd/fuse_signals.c
 delete mode 100644 tools/virtiofsd/fuse_virtio.c
 delete mode 100644 tools/virtiofsd/fuse_virtio.h
 delete mode 100644 tools/virtiofsd/helper.c
 delete mode 100644 tools/virtiofsd/meson.build
 delete mode 100644 tools/virtiofsd/passthrough_helpers.h
 delete mode 100644 tools/virtiofsd/passthrough_ll.c
 delete mode 100644 tools/virtiofsd/passthrough_seccomp.c
 delete mode 100644 tools/virtiofsd/passthrough_seccomp.h

-- 
2.39.0




[PATCH 1/3] virtiofsd: Remove build and docs glue

2023-01-18 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Remove all the virtiofsd build and docs infrastructure.

Signed-off-by: Dr. David Alan Gilbert 
---
 MAINTAINERS|  2 --
 docs/conf.py   |  4 
 docs/meson.build   |  1 -
 docs/tools/index.rst   |  1 -
 meson.build|  1 -
 meson_options.txt  |  2 --
 .../ci/org.centos/stream/8/x86_64/configure|  2 --
 scripts/coverity-scan/COMPONENTS.md|  3 ---
 scripts/meson-buildoptions.sh  |  3 ---
 tools/meson.build  | 13 -
 tools/virtiofsd/50-qemu-virtiofsd.json.in  |  5 -
 tools/virtiofsd/meson.build| 18 --
 12 files changed, 55 deletions(-)
 delete mode 100644 tools/virtiofsd/50-qemu-virtiofsd.json.in
 delete mode 100644 tools/virtiofsd/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index 0fe50d01e3..4f8ab04dba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2093,10 +2093,8 @@ virtiofs
 M: Dr. David Alan Gilbert 
 M: Stefan Hajnoczi 
 S: Supported
-F: tools/virtiofsd/*
 F: hw/virtio/vhost-user-fs*
 F: include/hw/virtio/vhost-user-fs.h
-F: docs/tools/virtiofsd.rst
 L: virtio...@redhat.com
 
 virtio-input
diff --git a/docs/conf.py b/docs/conf.py
index e33cf3d381..b2b4c166e1 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -290,10 +290,6 @@
 ('tools/virtfs-proxy-helper', 'virtfs-proxy-helper',
  'QEMU 9p virtfs proxy filesystem helper',
  ['M. Mohan Kumar'], 1),
-('tools/virtiofsd', 'virtiofsd',
- 'QEMU virtio-fs shared file system daemon',
- ['Stefan Hajnoczi ',
-  'Masayoshi Mizuma '], 1),
 ]
 man_make_section_directory = False
 
diff --git a/docs/meson.build b/docs/meson.build
index 9136fed3b7..bbcdccce68 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -48,7 +48,6 @@ if build_docs
 'qemu-storage-daemon.1': (have_tools ? 'man1' : ''),
 'qemu-trace-stap.1': (stap.found() ? 'man1' : ''),
 'virtfs-proxy-helper.1': (have_virtfs_proxy_helper ? 'man1' : ''),
-'virtiofsd.1': (have_virtiofsd ? 'man1' : ''),
 'qemu.1': 'man1',
 'qemu-block-drivers.7': 'man7',
 'qemu-cpu-models.7': 'man7'
diff --git a/docs/tools/index.rst b/docs/tools/index.rst
index 1edd5a8054..641550111c 100644
--- a/docs/tools/index.rst
+++ b/docs/tools/index.rst
@@ -14,4 +14,3 @@ command line utilities and other standalone programs.
qemu-pr-helper
qemu-trace-stap
virtfs-proxy-helper
-   virtiofsd
diff --git a/meson.build b/meson.build
index 58d8cd68a6..2f1bf88c9a 100644
--- a/meson.build
+++ b/meson.build
@@ -3860,7 +3860,6 @@ if have_block
   summary_info += {'Block whitelist (ro)': get_option('block_drv_ro_whitelist')}
   summary_info += {'Use block whitelist in tools': get_option('block_drv_whitelist_in_tools')}
   summary_info += {'VirtFS support':have_virtfs}
-  summary_info += {'build virtiofs daemon': have_virtiofsd}
   summary_info += {'Live block migration': config_host_data.get('CONFIG_LIVE_BLOCK_MIGRATION')}
   summary_info += {'replication support': config_host_data.get('CONFIG_REPLICATION')}
   summary_info += {'bochs support': get_option('bochs').allowed()}
diff --git a/meson_options.txt b/meson_options.txt
index 559a571b6b..0c9666437c 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -268,8 +268,6 @@ option('vhost_user_blk_server', type: 'feature', value: 'auto',
description: 'build vhost-user-blk server')
 option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
-option('virtiofsd', type: 'feature', value: 'auto',
-   description: 'build virtiofs daemon (virtiofsd)')
 option('libvduse', type: 'feature', value: 'auto',
description: 'build VDUSE Library')
 option('vduse_blk_export', type: 'feature', value: 'auto',
diff --git a/scripts/ci/org.centos/stream/8/x86_64/configure b/scripts/ci/org.centos/stream/8/x86_64/configure
index 75882faa9c..54e9043674 100755
--- a/scripts/ci/org.centos/stream/8/x86_64/configure
+++ b/scripts/ci/org.centos/stream/8/x86_64/configure
@@ -137,7 +137,6 @@
 --disable-vhost-vdpa \
 --disable-virglrenderer \
 --disable-virtfs \
---disable-virtiofsd \
 --disable-vnc \
 --disable-vnc-jpeg \
 --disable-png \
@@ -190,7 +189,6 @@
 --enable-tpm \
 --enable-trace-backends=dtrace \
 --enable-usb-redir \
---enable-virtiofsd \
 --enable-vhost-kernel \
 --enable-vhost-net \
 --enable-vhost-user \
diff --git a/scripts/coverity-scan/COMPONENTS.md b/scripts/coverity-scan/COMPONENTS.md
index 0e6ab4936e..639dcee45a 100644
--- a/scripts/coverity-scan/COMPONENTS.md
+++ b/scripts/coverity-scan/COMPONENTS.md
@@ -132,9 +132,6 @@ util
 xen
   ~ (/qemu)?(.*/xen.*)
 
-virtiofsd
-  ~ (/qemu)?(/tools/virtiofsd/.*)
-
 (headers)
   ~ (/qemu)?(/include/.*)
 
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh

[PATCH 3/3] virtiofsd: Swing deprecated message to removed-features

2023-01-18 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Move the deprecation message to removed-features, since the C implementation is now gone.

Signed-off-by: Dr. David Alan Gilbert 
---
 docs/about/deprecated.rst   | 18 --
 docs/about/removed-features.rst | 13 +
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 9f1bbc495d..8543fa3285 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -325,24 +325,6 @@ versions, aliases will point to newer CPU model versions
 depending on the machine type, so management software must
 resolve CPU model aliases before starting a virtual machine.
 
-Tools
------
-
-virtiofsd
-'''''''''
-
-There is a new Rust implementation of ``virtiofsd`` at
-``https://gitlab.com/virtio-fs/virtiofsd``;
-since this is now marked stable, new development should be done on that
-rather than the existing C version in the QEMU tree.
-The C version will still accept fixes and patches that
-are already in development for the moment, but will eventually
-be deleted from this tree.
-New deployments should use the Rust version, and existing systems
-should consider moving to it.  The command line and feature set
-is very close and moving should be simple.
-
-
 QEMU guest agent
 ----------------
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index 6c3aa5097f..9b0a212cfe 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -865,3 +865,16 @@ The VXHS code did not compile since v2.12.0. It was removed in 5.1.
 The corresponding upstream server project is no longer maintained.
 Users are recommended to switch to an alternative distributed block
 device driver such as RBD.
+
+Tools
+-----
+
+virtiofsd (removed in 8.0)
+''''''''''''''''''''''''''
+
+There is a newer Rust implementation of ``virtiofsd`` at
+``https://gitlab.com/virtio-fs/virtiofsd``; this has been
+stable for some time and is now widely used.
+The command line and feature set is very close to the removed
+C implementation.
+
-- 
2.39.0




[PATCH] virtio-rng-pci: fix migration compat for vectors

2023-01-09 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Fixup the migration compatibility for existing machine types
so that they do not enable msi-x.

Symptom:

(qemu) qemu: get_pci_config_device: Bad config data: i=0x34 read: 84 device: 98 cmask: ff wmask: 0 w1cmask:0
qemu: Failed to load PCIDevice:config
qemu: Failed to load virtio-rng:virtio
qemu: error while loading state for instance 0x0 of device ':00:03.0/virtio-rng'
qemu: load of migration failed: Invalid argument

Note: This fix will break migration from an unpatched 7.2 to a 7.2 with this patch applied

bz: https://bugzilla.redhat.com/show_bug.cgi?id=2155749
Fixes: 9ea02e8f1 ("virtio-rng-pci: Allow setting nvectors, so we can use MSI-X")

Signed-off-by: Dr. David Alan Gilbert 
---
 hw/core/machine.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index f589b92909..45459d1cef 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -45,6 +45,7 @@ const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
 
 GlobalProperty hw_compat_7_1[] = {
 { "virtio-device", "queue_reset", "false" },
+{ "virtio-rng-pci", "vectors", "0" },
 };
 const size_t hw_compat_7_1_len = G_N_ELEMENTS(hw_compat_7_1);
 
-- 
2.39.0




[PULL 1/3] monitor: Support specified vCPU registers

2022-09-15 Thread Dr. David Alan Gilbert (git)
From: zhenwei pi 

Originally we had to get all the vCPU registers and parse out the
specified one.  To improve the performance of this usage, allow the
user to specify a vCPU id and query only that vCPU's registers.

Running a VM with 16 vCPUs and using a bcc tool to track the latency of
'hmp_info_registers':
'info registers -a' uses about 3ms;
'info registers 12' uses about 150us.

Cc: Darren Kenny 
Reviewed-by: Markus Armbruster 
Signed-off-by: zhenwei pi 
Reviewed-by: Darren Kenny 
Message-Id: <20220802073720.1236988-2-pizhen...@bytedance.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 hmp-commands-info.hx |  8 +---
 monitor/misc.c   | 10 --
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 188d9ece3b..e012035541 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -100,9 +100,11 @@ ERST
 
 {
 .name   = "registers",
-.args_type  = "cpustate_all:-a",
-.params = "[-a]",
-.help   = "show the cpu registers (-a: all - show register info for all cpus)",
+.args_type  = "cpustate_all:-a,vcpu:i?",
+.params = "[-a|vcpu]",
+.help   = "show the cpu registers (-a: show register info for all cpus;"
+  " vcpu: specific vCPU to query; show the current CPU's registers if"
+  " no argument is specified)",
 .cmd= hmp_info_registers,
 },
 
diff --git a/monitor/misc.c b/monitor/misc.c
index 3d2312ba8d..6436a8786b 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -307,6 +307,7 @@ int monitor_get_cpu_index(Monitor *mon)
 static void hmp_info_registers(Monitor *mon, const QDict *qdict)
 {
 bool all_cpus = qdict_get_try_bool(qdict, "cpustate_all", false);
+int vcpu = qdict_get_try_int(qdict, "vcpu", -1);
 CPUState *cs;
 
 if (all_cpus) {
@@ -315,13 +316,18 @@ static void hmp_info_registers(Monitor *mon, const QDict *qdict)
 cpu_dump_state(cs, NULL, CPU_DUMP_FPU);
 }
 } else {
-cs = mon_get_cpu(mon);
+cs = vcpu >= 0 ? qemu_get_cpu(vcpu) : mon_get_cpu(mon);
 
 if (!cs) {
-monitor_printf(mon, "No CPU available\n");
+if (vcpu >= 0) {
+monitor_printf(mon, "CPU#%d not available\n", vcpu);
+} else {
+monitor_printf(mon, "No CPU available\n");
+}
 return;
 }
 
+monitor_printf(mon, "\nCPU#%d\n", cs->cpu_index);
 cpu_dump_state(cs, NULL, CPU_DUMP_FPU);
 }
 }
-- 
2.37.3




[PULL 3/3] hmp: Fix ordering of text

2022-09-15 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Fix the ordering of the help text so it's always after the commands
being defined.  A few had got out of order.  Keep 'info' at the end.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Daniel P. Berrangé 
---
 hmp-commands.hx | 46 +++---
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 182e639d14..8ab8000acd 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1742,23 +1742,6 @@ SRST
   *icount* for the reference may be observed with ``info replay`` command.
 ERST
 
-{
-.name   = "info",
-.args_type  = "item:s?",
-.params = "[subcommand]",
-.help   = "show various information about the system state",
-.cmd= hmp_info_help,
-.sub_table  = hmp_info_cmds,
-.flags  = "p",
-},
-
-SRST
-``calc_dirty_rate`` *second*
-  Start a round of dirty rate measurement with the period specified in *second*.
-  The result of the dirty rate measurement may be observed with ``info
-  dirty_rate`` command.
-ERST
-
 {
 .name   = "calc_dirty_rate",
.args_type  = "dirty_ring:-r,dirty_bitmap:-b,second:l,sample_pages_per_GB:l?",
@@ -1770,10 +1753,10 @@ ERST
 },
 
 SRST
-``set_vcpu_dirty_limit``
-  Set dirty page rate limit on virtual CPU, the information about all the
-  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
-  command.
+``calc_dirty_rate`` *second*
+  Start a round of dirty rate measurement with the period specified in *second*.
+  The result of the dirty rate measurement may be observed with ``info
+  dirty_rate`` command.
 ERST
 
 {
@@ -1786,8 +1769,8 @@ ERST
 },
 
 SRST
-``cancel_vcpu_dirty_limit``
-  Cancel dirty page rate limit on virtual CPU, the information about all the
+``set_vcpu_dirty_limit``
+  Set dirty page rate limit on virtual CPU, the information about all the
   virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
   command.
 ERST
@@ -1800,3 +1783,20 @@ ERST
   "\n\t\t\t\t\t limit on a specified virtual cpu",
 .cmd= hmp_cancel_vcpu_dirty_limit,
 },
+
+SRST
+``cancel_vcpu_dirty_limit``
+  Cancel dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "info",
+.args_type  = "item:s?",
+.params = "[subcommand]",
+.help   = "show various information about the system state",
+.cmd= hmp_info_help,
+.sub_table  = hmp_info_cmds,
+.flags  = "p",
+},
-- 
2.37.3




[PULL 2/3] monitor/hmp: print trace as option in help for log command

2022-09-15 Thread Dr. David Alan Gilbert (git)
From: Dongli Zhang 

The following is printed as help information on the qemu-system-x86_64
command line when CONFIG_TRACE_LOG is enabled:


$ qemu-system-x86_64 -d help
... ...
trace:PATTERN   enable trace events

Use "-d trace:help" to get a list of trace events.


However, the "trace:PATTERN" option is only printed by
"qemu-system-x86_64 -d help"; it is missing from the hmp "help log" command.

Fixes: c84ea00dc2 ("log: add "-d trace:PATTERN"")
Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
Message-Id: <20220831213943.8155-1-dongli.zh...@oracle.com>
Reviewed-by: Markus Armbruster 
Signed-off-by: Dr. David Alan Gilbert 
---
 monitor/hmp.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/monitor/hmp.c b/monitor/hmp.c
index 15ca04735c..a3375d0341 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -285,10 +285,15 @@ void help_cmd(Monitor *mon, const char *name)
 if (!strcmp(name, "log")) {
 const QEMULogItem *item;
 monitor_printf(mon, "Log items (comma separated):\n");
-monitor_printf(mon, "%-10s %s\n", "none", "remove all logs");
+monitor_printf(mon, "%-15s %s\n", "none", "remove all logs");
 for (item = qemu_log_items; item->mask != 0; item++) {
-monitor_printf(mon, "%-10s %s\n", item->name, item->help);
+monitor_printf(mon, "%-15s %s\n", item->name, item->help);
 }
+#ifdef CONFIG_TRACE_LOG
+monitor_printf(mon, "trace:PATTERN   enable trace events\n");
+monitor_printf(mon, "\nUse \"log trace:help\" to get a list of "
+   "trace events.\n\n");
+#endif
 return;
 }
 
-- 
2.37.3




[PULL 0/3] hmp queue

2022-09-15 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The following changes since commit 79dfa177ae348bb5ab5f97c0915359b13d6186e2:

  Merge tag 'pull-qapi-2022-09-07' of git://repo.or.cz/qemu/armbru into staging (2022-09-07 13:13:30 -0400)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-hmp-20220915a

for you to fetch changes up to 22269b0436cc8e4aaac975b4c8cb01b343d09661:

  hmp: Fix ordering of text (2022-09-15 14:13:30 +0100)


HMP pull 2022-09-15

A set of 3 small additions/fixes.

Signed-off-by: Dr. David Alan Gilbert 


Dongli Zhang (1):
  monitor/hmp: print trace as option in help for log command

Dr. David Alan Gilbert (1):
  hmp: Fix ordering of text

Zhenwei Pi (1):
  monitor: Support specified vCPU registers

 hmp-commands-info.hx |  8 +---
 hmp-commands.hx  | 46 +++---
 monitor/hmp.c|  9 +++--
 monitor/misc.c   | 10 --
 4 files changed, 43 insertions(+), 30 deletions(-)




[PATCH] hmp: Fix ordering of text

2022-09-13 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Fix the ordering of the help text so it's always after the commands
being defined.  A few had got out of order.  Keep 'info' at the end.

Signed-off-by: Dr. David Alan Gilbert 
---
 hmp-commands.hx | 46 +++---
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 182e639d14..8ab8000acd 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1742,23 +1742,6 @@ SRST
   *icount* for the reference may be observed with ``info replay`` command.
 ERST
 
-{
-.name   = "info",
-.args_type  = "item:s?",
-.params = "[subcommand]",
-.help   = "show various information about the system state",
-.cmd= hmp_info_help,
-.sub_table  = hmp_info_cmds,
-.flags  = "p",
-},
-
-SRST
-``calc_dirty_rate`` *second*
-  Start a round of dirty rate measurement with the period specified in *second*.
-  The result of the dirty rate measurement may be observed with ``info
-  dirty_rate`` command.
-ERST
-
 {
 .name   = "calc_dirty_rate",
.args_type  = "dirty_ring:-r,dirty_bitmap:-b,second:l,sample_pages_per_GB:l?",
@@ -1770,10 +1753,10 @@ ERST
 },
 
 SRST
-``set_vcpu_dirty_limit``
-  Set dirty page rate limit on virtual CPU, the information about all the
-  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
-  command.
+``calc_dirty_rate`` *second*
+  Start a round of dirty rate measurement with the period specified in *second*.
+  The result of the dirty rate measurement may be observed with ``info
+  dirty_rate`` command.
 ERST
 
 {
@@ -1786,8 +1769,8 @@ ERST
 },
 
 SRST
-``cancel_vcpu_dirty_limit``
-  Cancel dirty page rate limit on virtual CPU, the information about all the
+``set_vcpu_dirty_limit``
+  Set dirty page rate limit on virtual CPU, the information about all the
   virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
   command.
 ERST
@@ -1800,3 +1783,20 @@ ERST
   "\n\t\t\t\t\t limit on a specified virtual cpu",
 .cmd= hmp_cancel_vcpu_dirty_limit,
 },
+
+SRST
+``cancel_vcpu_dirty_limit``
+  Cancel dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "info",
+.args_type  = "item:s?",
+.params = "[subcommand]",
+.help   = "show various information about the system state",
+.cmd= hmp_info_help,
+.sub_table  = hmp_info_cmds,
+.flags  = "p",
+},
-- 
2.37.3




[PATCH] keyval: Print types on merge inconsistency

2022-09-13 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

When 'keyval_do_merge' checks consistency of types, if they mismatch
print the types so we get a hint of what's going on.

e.g.
qemu-system-x86_64: Parameter 'memory' used inconsistently (qstring/qdict)

Signed-off-by: Dr. David Alan Gilbert 
---
 util/keyval.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/util/keyval.c b/util/keyval.c
index 66a5b4740f..9757adf31f 100644
--- a/util/keyval.c
+++ b/util/keyval.c
@@ -329,8 +329,10 @@ static void keyval_do_merge(QDict *dest, const QDict *merged, GString *str, Erro
 old_value = qdict_get(dest, ent->key);
 if (old_value) {
 if (qobject_type(old_value) != qobject_type(ent->value)) {
-error_setg(errp, "Parameter '%s%s' used inconsistently",
-   str->str, ent->key);
+error_setg(errp, "Parameter '%s%s' used inconsistently (%s/%s)",
+   str->str, ent->key,
+   QType_str(qobject_type(old_value)),
+   QType_str(qobject_type(ent->value)));
 return;
 } else if (qobject_type(ent->value) == QTYPE_QDICT) {
 /* Merge sub-dictionaries.  */
-- 
2.37.3




[PULL 5/5] virtiofsd: Disable killpriv_v2 by default

2022-08-02 Thread Dr. David Alan Gilbert (git)
From: Vivek Goyal 

We are having a bunch of issues with killpriv_v2 enabled by default. First
of all, it relies on clearing suid/sgid bits as needed by dropping the
capability CAP_FSETID. This does not work for remote filesystems like
NFS (and possibly others).

Secondly, we are noticing other issues related to clearing of SGID
which leads to failures for xfstests generic/355 and generic/193.

Thirdly, there are other issues w.r.t. caching of metadata (suid/sgid)
bits in the fuse client with killpriv_v2 enabled. The guest can cache
that data for some time even if it has been cleared on the server.

The second and third issues are fixable; it might just take a little
while to get them fixed in the kernel. The first one will probably not
see any movement for a long time.

Given these issues, killpriv_v2 does not seem to be a good candidate
for enabling by default. We have already disabled it by default in
the Rust version of virtiofsd.

Hence this patch disables killpriv_v2 by default. Users can choose to
enable it by passing the option "-o killpriv_v2".

Signed-off-by: Vivek Goyal 
Message-Id: 
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/passthrough_ll.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 7a73dfcce9..371a7bead6 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -767,19 +767,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
 fuse_log(FUSE_LOG_DEBUG, "lo_init: enabling killpriv_v2\n");
 conn->want |= FUSE_CAP_HANDLE_KILLPRIV_V2;
 lo->killpriv_v2 = 1;
-} else if (lo->user_killpriv_v2 == -1 &&
-   conn->capable & FUSE_CAP_HANDLE_KILLPRIV_V2) {
-/*
- * User did not specify a value for killpriv_v2. By default enable it
- * if connection offers this capability
- */
-fuse_log(FUSE_LOG_DEBUG, "lo_init: enabling killpriv_v2\n");
-conn->want |= FUSE_CAP_HANDLE_KILLPRIV_V2;
-lo->killpriv_v2 = 1;
 } else {
 /*
- * Either user specified to disable killpriv_v2, or connection does
- * not offer this capability. Disable killpriv_v2 in both the cases
+ * Either user specified to disable killpriv_v2, or did not
+ * specify anything. Disable killpriv_v2 in both the cases.
  */
 fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling killpriv_v2\n");
 conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
-- 
2.37.1




[PULL 4/5] migration: Define BLK_MIG_BLOCK_SIZE as unsigned long long

2022-08-02 Thread Dr. David Alan Gilbert (git)
From: Peter Maydell 

When we use BLK_MIG_BLOCK_SIZE in expressions like
block_mig_state.submitted * BLK_MIG_BLOCK_SIZE, this multiplication
is done as 32 bits, because both operands are 32 bits.  Coverity
complains about possible overflows because we then accumulate that
into a 64 bit variable.

Define BLK_MIG_BLOCK_SIZE as unsigned long long using the ULL suffix.
The only two current uses of it with this problem are both in
block_save_pending(), so we could just cast to uint64_t there, but
using the ULL suffix is simpler and ensures that we don't
accidentally introduce new variants of the same issue in future.

Resolves: Coverity CID 1487136, 1487175
Signed-off-by: Peter Maydell 
Message-Id: <20220721115207.729615-3-peter.mayd...@linaro.org>
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/block.c b/migration/block.c
index 9e5aae5898..3577c815a9 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -28,7 +28,7 @@
 #include "sysemu/block-backend.h"
 #include "trace.h"
 
-#define BLK_MIG_BLOCK_SIZE   (1 << 20)
+#define BLK_MIG_BLOCK_SIZE   (1ULL << 20)
 #define BDRV_SECTORS_PER_DIRTY_CHUNK (BLK_MIG_BLOCK_SIZE >> BDRV_SECTOR_BITS)
 
 #define BLK_MIG_FLAG_DEVICE_BLOCK   0x01
-- 
2.37.1




[PULL 1/5] migration: add remaining params->has_* = true in migration_instance_init()

2022-08-02 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

Some of the params->has_* = true assignments are missing in
migration_instance_init(); this causes migrate_params_check() to skip
some tests, allowing some
unsupported scenarios.

Fix this by adding all missing params->has_* = true in
migration_instance_init().

Fixes: 69ef1f36b0 ("migration: define 'tls-creds' and 'tls-hostname' migration parameters")
Fixes: 1d58872a91 ("migration: do not wait for free thread")
Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration parameter")
Signed-off-by: Leonardo Bras 
Message-Id: <20220726010235.342927-1-leob...@redhat.com>
Reviewed-by: Peter Xu 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index e03f698a3c..82fbe0cf55 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4451,6 +4451,7 @@ static void migration_instance_init(Object *obj)
 /* Set has_* up only for parameter checks */
 params->has_compress_level = true;
 params->has_compress_threads = true;
+params->has_compress_wait_thread = true;
 params->has_decompress_threads = true;
 params->has_throttle_trigger_threshold = true;
 params->has_cpu_throttle_initial = true;
@@ -4471,6 +4472,9 @@ static void migration_instance_init(Object *obj)
 params->has_announce_max = true;
 params->has_announce_rounds = true;
 params->has_announce_step = true;
+params->has_tls_creds = true;
+params->has_tls_hostname = true;
+params->has_tls_authz = true;
 
 qemu_sem_init(&ms->postcopy_pause_sem, 0);
 qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
-- 
2.37.1




[PULL 2/5] Revert "migration: Simplify unqueue_page()"

2022-08-02 Thread Dr. David Alan Gilbert (git)
From: Thomas Huth 

This reverts commit cfd66f30fb0f735df06ff4220e5000290a43dad3.

The simplification of unqueue_page() introduced a bug that sometimes
breaks migration on s390x hosts.

The problem is not fully understood yet, but since we are already in
the freeze for QEMU 7.1 and we need something working there, let's
revert this patch for the upcoming release. The optimization can be
redone later again in a proper way if necessary.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2099934
Signed-off-by: Thomas Huth 
Message-Id: <20220802061949.331576-1-th...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/ram.c| 37 ++---
 migration/trace-events |  3 ++-
 2 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index b94669ba5d..dc1de9ddbc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1612,7 +1612,6 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset)
 {
 struct RAMSrcPageRequest *entry;
 RAMBlock *block = NULL;
-size_t page_size;
 
 if (!postcopy_has_request(rs)) {
 return NULL;
@@ -1629,13 +1628,10 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset)
 entry = QSIMPLEQ_FIRST(>src_page_requests);
 block = entry->rb;
 *offset = entry->offset;
-page_size = qemu_ram_pagesize(block);
-/* Each page request should only be multiple page size of the ramblock */
-assert((entry->len % page_size) == 0);
 
-if (entry->len > page_size) {
-entry->len -= page_size;
-entry->offset += page_size;
+if (entry->len > TARGET_PAGE_SIZE) {
+entry->len -= TARGET_PAGE_SIZE;
+entry->offset += TARGET_PAGE_SIZE;
 } else {
 memory_region_unref(block->mr);
 QSIMPLEQ_REMOVE_HEAD(>src_page_requests, next_req);
@@ -1643,9 +1639,6 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t *offset)
 migration_consume_urgent_request();
 }
 
-trace_unqueue_page(block->idstr, *offset,
-   test_bit((*offset >> TARGET_PAGE_BITS), block->bmap));
-
 return block;
 }
 
@@ -2069,8 +2062,30 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
 {
 RAMBlock  *block;
 ram_addr_t offset;
+bool dirty;
+
+do {
+block = unqueue_page(rs, );
+/*
+ * We're sending this page, and since it's postcopy nothing else
+ * will dirty it, and we must make sure it doesn't get sent again
+ * even if this queue request was received after the background
+ * search already sent it.
+ */
+if (block) {
+unsigned long page;
+
+page = offset >> TARGET_PAGE_BITS;
+dirty = test_bit(page, block->bmap);
+if (!dirty) {
+trace_get_queued_page_not_dirty(block->idstr, (uint64_t)offset,
+page);
+} else {
+trace_get_queued_page(block->idstr, (uint64_t)offset, page);
+}
+}
 
-block = unqueue_page(rs, );
+} while (block && !dirty);
 
 if (block) {
 /* See comment above postcopy_preempted_contains() */
diff --git a/migration/trace-events b/migration/trace-events
index a34afe7b85..57003edcbd 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -85,6 +85,8 @@ put_qlist_end(const char *field_name, const char *vmsd_name) 
"%s(%s)"
 qemu_file_fclose(void) ""
 
 # ram.c
+get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
+get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
 migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
 migration_bitmap_clear_dirty(char *str, uint64_t start, uint64_t size, 
unsigned long page) "rb %s start 0x%"PRIx64" size 0x%"PRIx64" page 0x%lx"
@@ -110,7 +112,6 @@ ram_save_iterate_big_wait(uint64_t milliconds, int 
iterations) "big wait: %" PRI
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" 
PRIu64
 ram_write_tracking_ramblock_start(const char *block_id, size_t page_size, void 
*addr, size_t length) "%s: page_size: %zu addr: %p length: %zu"
 ram_write_tracking_ramblock_stop(const char *block_id, size_t page_size, void 
*addr, size_t length) "%s: page_size: %zu addr: %p length: %zu"
-unqueue_page(char *block, uint64_t offset, bool dirty) "ramblock '%s' offset 0x%"PRIx64" dirty %d"
 postcopy_preempt_triggered(char *str, unsigned long page) "during sending 
ramblock %s offset 0x%lx"
 postcopy_preempt_restored(char *str, unsigned long page) "ramblock %s offset 
0x%lx"
 postcopy_preempt_hit(char *str, uint64_t offset) "ramblock %s offset 0x%"PRIx64
-- 
2.37.1




[PULL 3/5] migration: Assert that migrate_multifd_compression() returns an in-range value

2022-08-02 Thread Dr. David Alan Gilbert (git)
From: Peter Maydell 

Coverity complains that when we use the return value from
migrate_multifd_compression() as an array index:
  multifd_recv_state->ops = multifd_ops[migrate_multifd_compression()];

that this might overrun the array (which is declared to have size
MULTIFD_COMPRESSION__MAX).  This is because the function return type
is MultiFDCompression, which is an autogenerated enum.  The code
generator includes the "one greater than the maximum possible value"
MULTIFD_COMPRESSION__MAX in the enum, even though this is not
actually a valid value for the enum, and this makes Coverity think
that migrate_multifd_compression() could return that __MAX value and
index off the end of the array.

Suppress the Coverity error by asserting that the value we're going
to return is within range.

Resolves: Coverity CID 1487239, 1487254
Signed-off-by: Peter Maydell 
Message-Id: <20220721115207.729615-2-peter.mayd...@linaro.org>
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/migration.c b/migration/migration.c
index 82fbe0cf55..bb8bbddfe4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2617,6 +2617,7 @@ MultiFDCompression migrate_multifd_compression(void)
 
 s = migrate_get_current();
 
+assert(s->parameters.multifd_compression < MULTIFD_COMPRESSION__MAX);
 return s->parameters.multifd_compression;
 }
 
-- 
2.37.1




[PULL 0/5] migration queue

2022-08-02 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The following changes since commit 0399521e53336bd2cdc15482bca0ffd3493fdff6:

  Merge tag 'for-upstream' of git://repo.or.cz/qemu/kevin into staging 
(2022-08-02 06:52:05 -0700)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220802c

for you to fetch changes up to a21ba54dd5ca38cd05da9035fc65374d7af54f13:

  virtiofsd: Disable killpriv_v2 by default (2022-08-02 16:46:52 +0100)


Migration fixes pull 2022-08-02

Small migration (and virtiofsd) fixes.

Signed-off-by: Dr. David Alan Gilbert 


Leonardo Bras (1):
  migration: add remaining params->has_* = true in migration_instance_init()

Peter Maydell (2):
  migration: Assert that migrate_multifd_compression() returns an in-range 
value
  migration: Define BLK_MIG_BLOCK_SIZE as unsigned long long

Thomas Huth (1):
  Revert "migration: Simplify unqueue_page()"

Vivek Goyal (1):
  virtiofsd: Disable killpriv_v2 by default

 migration/block.c|  2 +-
 migration/migration.c|  5 +
 migration/ram.c  | 37 ++---
 migration/trace-events   |  3 ++-
 tools/virtiofsd/passthrough_ll.c | 13 ++---
 5 files changed, 36 insertions(+), 24 deletions(-)




[PULL 29/30] migration: Avoid false-positive on non-supported scenarios for zero-copy-send

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

Migration with zero-copy-send currently has its limitations, as it can't
be used with TLS or any kind of compression. In such scenarios, it should
output errors during parameter / capability setting.

But currently there are some ways of setting these unsupported scenarios
without printing the error message:

1) For the 'compression' capability, it works by enabling it together with
zero-copy-send. This happens because the validity test for zero-copy uses
the helper function migrate_use_compression(), which checks for compression
presence in s->enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS].

The point here is: the validity test happens before the capability gets
enabled. If all of them get enabled together, this test will not return an
error.

In order to fix that, replace migrate_use_compression() by directly testing
the cap_list parameter in migrate_caps_check().

2) For features enabled by parameters such as TLS & 'multifd_compression',
there was also a possibility of setting unsupported scenarios: setting
zero-copy-send first, then setting the unsupported parameter.

In order to fix that, also add a check for parameters conflicting with
zero-copy-send in migrate_params_check().

3) XBZRLE is also a compression capability, so it makes sense to also add
it to the list of capabilities which are not supported with zero-copy-send.

Fixes: 1abaec9a1b2c ("migration: Change zero_copy_send from migration parameter to migration capability")
Signed-off-by: Leonardo Bras 
Message-Id: <20220719122345.253713-1-leob...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 15ae48b209..e03f698a3c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1306,7 +1306,9 @@ static bool migrate_caps_check(bool *cap_list,
 #ifdef CONFIG_LINUX
 if (cap_list[MIGRATION_CAPABILITY_ZERO_COPY_SEND] &&
 (!cap_list[MIGRATION_CAPABILITY_MULTIFD] ||
- migrate_use_compression() ||
+ cap_list[MIGRATION_CAPABILITY_COMPRESS] ||
+ cap_list[MIGRATION_CAPABILITY_XBZRLE] ||
+ migrate_multifd_compression() ||
  migrate_use_tls())) {
 error_setg(errp,
"Zero copy only available for non-compressed non-TLS multifd migration");
@@ -1550,6 +1552,17 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
 error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: 
");
 return false;
 }
+
+#ifdef CONFIG_LINUX
+if (migrate_use_zero_copy_send() &&
+((params->has_multifd_compression && params->multifd_compression) ||
+ (params->has_tls_creds && params->tls_creds && *params->tls_creds))) {
+error_setg(errp,
+   "Zero copy only available for non-compressed non-TLS multifd migration");
+return false;
+}
+#endif
+
 return true;
 }
 
-- 
2.36.1




[PULL 27/30] migration/multifd: Report to user when zerocopy not working

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

Some errors, like the lack of Scatter-Gather support by the network
interface (NETIF_F_SG), may cause sendmsg(...,MSG_ZEROCOPY) to fail to use
zero-copy, which causes it to fall back to the default copying mechanism.

After each full dirty-bitmap scan there should be a zero-copy flush
happening, which checks for errors in each of the previous calls to
sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
increment the dirty_sync_missed_zero_copy migration stat to let the user
know about it.

Signed-off-by: Leonardo Bras 
Reviewed-by: Daniel P. Berrangé 
Acked-by: Peter Xu 
Message-Id: <2022071122.18951-4-leob...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/multifd.c | 2 ++
 migration/ram.c | 5 +
 migration/ram.h | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index 1e49594b02..586ddc9d65 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
 if (ret < 0) {
 error_report_err(err);
 return -1;
+} else if (ret == 1) {
+dirty_sync_missed_zero_copy();
 }
 }
 }
diff --git a/migration/ram.c b/migration/ram.c
index 4fbad74c6c..b94669ba5d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -434,6 +434,11 @@ static void ram_transferred_add(uint64_t bytes)
 ram_counters.transferred += bytes;
 }
 
+void dirty_sync_missed_zero_copy(void)
+{
+ram_counters.dirty_sync_missed_zero_copy++;
+}
+
 /* used by the search for pages to send */
 struct PageSearchStatus {
 /* Current block being searched */
diff --git a/migration/ram.h b/migration/ram.h
index 5d90945a6e..c7af65ac74 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -89,4 +89,6 @@ void ram_write_tracking_prepare(void);
 int ram_write_tracking_start(void);
 void ram_write_tracking_stop(void);
 
+void dirty_sync_missed_zero_copy(void);
+
 #endif
-- 
2.36.1




[PULL 28/30] multifd: Document the locking of MultiFD{Send/Recv}Params

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Juan Quintela 

Reorder the structures so we can know if the fields are:
- Read only
- Their own locking (i.e. sems)
- Protected by 'mutex'
- Only for the multifd channel

Signed-off-by: Juan Quintela 
Message-Id: <20220531104318.7494-2-quint...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Typo fixes from Chen Zhang
---
 migration/multifd.h | 66 -
 1 file changed, 41 insertions(+), 25 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 4d8d89e5e5..519f498643 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -65,7 +65,9 @@ typedef struct {
 } MultiFDPages_t;
 
 typedef struct {
-/* this fields are not changed once the thread is created */
+/* Fields are only written at creating/deletion time */
+/* No lock required for them, they are read only */
+
 /* channel number */
 uint8_t id;
 /* channel thread name */
@@ -74,39 +76,47 @@ typedef struct {
 QemuThread thread;
 /* communication channel */
 QIOChannel *c;
+/* is the yank function registered */
+bool registered_yank;
+/* packet allocated len */
+uint32_t packet_len;
+/* multifd flags for sending ram */
+int write_flags;
+
 /* sem where to wait for more work */
 QemuSemaphore sem;
+/* syncs main thread and channels */
+QemuSemaphore sem_sync;
+
 /* this mutex protects the following parameters */
 QemuMutex mutex;
 /* is this channel thread running */
 bool running;
 /* should this thread finish */
 bool quit;
-/* is the yank function registered */
-bool registered_yank;
+/* multifd flags for each packet */
+uint32_t flags;
+/* global number of generated multifd packets */
+uint64_t packet_num;
 /* thread has work to do */
 int pending_job;
-/* array of pages to sent */
+/* array of pages to sent.
+ * The owner of 'pages' depends of 'pending_job' value:
+ * pending_job == 0 -> migration_thread can use it.
+ * pending_job != 0 -> multifd_channel can use it.
+ */
 MultiFDPages_t *pages;
-/* packet allocated len */
-uint32_t packet_len;
+
+/* thread local variables. No locking required */
+
 /* pointer to the packet */
 MultiFDPacket_t *packet;
-/* multifd flags for sending ram */
-int write_flags;
-/* multifd flags for each packet */
-uint32_t flags;
 /* size of the next packet that contains pages */
 uint32_t next_packet_size;
-/* global number of generated multifd packets */
-uint64_t packet_num;
-/* thread local variables */
 /* packets sent through this channel */
 uint64_t num_packets;
 /* non zero pages sent through this channel */
 uint64_t total_normal_pages;
-/* syncs main thread and channels */
-QemuSemaphore sem_sync;
 /* buffers to send */
 struct iovec *iov;
 /* number of iovs used */
@@ -120,7 +130,9 @@ typedef struct {
 }  MultiFDSendParams;
 
 typedef struct {
-/* this fields are not changed once the thread is created */
+/* Fields are only written at creating/deletion time */
+/* No lock required for them, they are read only */
+
 /* channel number */
 uint8_t id;
 /* channel thread name */
@@ -129,31 +141,35 @@ typedef struct {
 QemuThread thread;
 /* communication channel */
 QIOChannel *c;
+/* packet allocated len */
+uint32_t packet_len;
+
+/* syncs main thread and channels */
+QemuSemaphore sem_sync;
+
 /* this mutex protects the following parameters */
 QemuMutex mutex;
 /* is this channel thread running */
 bool running;
 /* should this thread finish */
 bool quit;
-/* ramblock host address */
-uint8_t *host;
-/* packet allocated len */
-uint32_t packet_len;
-/* pointer to the packet */
-MultiFDPacket_t *packet;
 /* multifd flags for each packet */
 uint32_t flags;
 /* global number of generated multifd packets */
 uint64_t packet_num;
-/* thread local variables */
+
+/* thread local variables. No locking required */
+
+/* pointer to the packet */
+MultiFDPacket_t *packet;
 /* size of the next packet that contains pages */
 uint32_t next_packet_size;
 /* packets sent through this channel */
 uint64_t num_packets;
+/* ramblock host address */
+uint8_t *host;
 /* non zero pages recv through this channel */
 uint64_t total_normal_pages;
-/* syncs main thread and channels */
-QemuSemaphore sem_sync;
 /* buffers to recv */
 struct iovec *iov;
 /* Pages that are not zero */
-- 
2.36.1




[PULL 30/30] Revert "gitlab: disable accelerated zlib for s390x"

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

This reverts commit 309df6acb29346f89e1ee542b1986f60cab12b87.
With Ilya's 'multifd: Copy pages before compressing them with zlib'
in the latest migration series, this shouldn't be a problem any more.

Suggested-by: Peter Maydell 
Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Thomas Huth 
---
 .gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml | 12 
 .travis.yml|  6 ++
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/.gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml 
b/.gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml
index 9f1fe9e7dc..03e74c97db 100644
--- a/.gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml
+++ b/.gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml
@@ -8,8 +8,6 @@ ubuntu-20.04-s390x-all-linux-static:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
  - if: "$S390X_RUNNER_AVAILABLE"
@@ -29,8 +27,6 @@ ubuntu-20.04-s390x-all:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  timeout: 75m
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
@@ -48,8 +44,6 @@ ubuntu-20.04-s390x-alldbg:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
when: manual
@@ -71,8 +65,6 @@ ubuntu-20.04-s390x-clang:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
when: manual
@@ -93,8 +85,6 @@ ubuntu-20.04-s390x-tci:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
when: manual
@@ -114,8 +104,6 @@ ubuntu-20.04-s390x-notcg:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
when: manual
diff --git a/.travis.yml b/.travis.yml
index 4fdc9a6785..fb3baabca9 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -218,7 +218,6 @@ jobs:
 - TEST_CMD="make check check-tcg V=1"
 - CONFIG="--disable-containers 
--target-list=${MAIN_SOFTMMU_TARGETS},s390x-linux-user"
 - UNRELIABLE=true
-- DFLTCC=0
   script:
 - BUILD_RC=0 && make -j${JOBS} || BUILD_RC=$?
 - |
@@ -258,7 +257,7 @@ jobs:
   env:
 - CONFIG="--disable-containers --audio-drv-list=sdl --disable-user
   --target-list-exclude=${MAIN_SOFTMMU_TARGETS}"
-- DFLTCC=0
+
 - name: "[s390x] GCC (user)"
   arch: s390x
   dist: focal
@@ -270,7 +269,7 @@ jobs:
   - ninja-build
   env:
 - CONFIG="--disable-containers --disable-system"
-- DFLTCC=0
+
 - name: "[s390x] Clang (disable-tcg)"
   arch: s390x
   dist: focal
@@ -304,4 +303,3 @@ jobs:
 - CONFIG="--disable-containers --disable-tcg --enable-kvm
   --disable-tools --host-cc=clang --cxx=clang++"
 - UNRELIABLE=true
-- DFLTCC=0
-- 
2.36.1




[PULL 23/30] tests: Add postcopy preempt tests

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Four tests are added for preempt mode:

  - Postcopy plain
  - Postcopy recovery
  - Postcopy tls
  - Postcopy tls+recovery

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185530.27801-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Manual merge
---
 tests/qtest/migration-test.c | 59 ++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 5600b6d46a..71595a74fd 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -576,6 +576,7 @@ typedef struct {
 
 /* Postcopy specific fields */
 void *postcopy_data;
+bool postcopy_preempt;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1081,6 +1082,11 @@ static int migrate_postcopy_prepare(QTestState 
**from_ptr,
 migrate_set_capability(to, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-blocktime", true);
 
+if (args->postcopy_preempt) {
+migrate_set_capability(from, "postcopy-preempt", true);
+migrate_set_capability(to, "postcopy-preempt", true);
+}
+
 migrate_ensure_non_converge(from);
 
 /* Wait for the first serial output from the source */
@@ -1134,6 +1140,15 @@ static void test_postcopy(void)
 test_postcopy_common();
 }
 
+static void test_postcopy_preempt(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+};
+
+test_postcopy_common();
+}
+
 #ifdef CONFIG_GNUTLS
 static void test_postcopy_tls_psk(void)
 {
@@ -1144,6 +1159,17 @@ static void test_postcopy_tls_psk(void)
 
 test_postcopy_common();
 }
+
+static void test_postcopy_preempt_tls_psk(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_common();
+}
 #endif
 
 static void test_postcopy_recovery_common(MigrateCommon *args)
@@ -1227,6 +1253,29 @@ static void test_postcopy_recovery_tls_psk(void)
 }
 #endif
 
+static void test_postcopy_preempt_recovery(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+};
+
+test_postcopy_recovery_common();
+}
+
+#ifdef CONFIG_GNUTLS
+/* This contains preempt+recovery+tls test altogether */
+static void test_postcopy_preempt_all(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_recovery_common();
+}
+#endif
+
 static void test_baddest(void)
 {
 MigrateStart args = {
@@ -2427,10 +2476,12 @@ int main(int argc, char **argv)
 module_call_init(MODULE_INIT_QOM);
 
 qtest_add_func("/migration/postcopy/unix", test_postcopy);
+qtest_add_func("/migration/postcopy/plain", test_postcopy);
 qtest_add_func("/migration/postcopy/recovery/plain",
test_postcopy_recovery);
-
-qtest_add_func("/migration/postcopy/plain", test_postcopy);
+qtest_add_func("/migration/postcopy/preempt/plain", test_postcopy_preempt);
+qtest_add_func("/migration/postcopy/preempt/recovery/plain",
+test_postcopy_preempt_recovery);
 
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
@@ -2446,6 +2497,10 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
 qtest_add_func("/migration/postcopy/recovery/tls/psk",
test_postcopy_recovery_tls_psk);
+qtest_add_func("/migration/postcopy/preempt/tls/psk",
+   test_postcopy_preempt_tls_psk);
+qtest_add_func("/migration/postcopy/preempt/recovery/tls/psk",
+   test_postcopy_preempt_all);
 #ifdef CONFIG_TASN1
 qtest_add_func("/migration/precopy/unix/tls/x509/default-host",
test_precopy_unix_tls_x509_default_host);
-- 
2.36.1




[PULL 25/30] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
returns 1. This return code should be used only when Linux failed to use
MSG_ZEROCOPY on all of the previous sendmsg() calls.

Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
was attempted.

Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
Signed-off-by: Leonardo Bras 
Reviewed-by: Daniel P. Berrangé 
Acked-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Reviewed-by: Peter Xu 
Message-Id: <2022071122.18951-2-leob...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 io/channel-socket.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 4466bb1cd4..74a936cc1f 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -716,12 +716,18 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
 struct cmsghdr *cm;
 char control[CMSG_SPACE(sizeof(*serr))];
 int received;
-int ret = 1;
+int ret;
+
+if (sioc->zero_copy_queued == sioc->zero_copy_sent) {
+return 0;
+}
 
 msg.msg_control = control;
 msg.msg_controllen = sizeof(control);
 memset(control, 0, sizeof(control));
 
+ret = 1;
+
 while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
 received = recvmsg(sioc->fd, , MSG_ERRQUEUE);
 if (received < 0) {
-- 
2.36.1




[PULL 18/30] migration: Enable TLS for preempt channel

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

This patch is based on the async preempt channel creation.  It continues
wiring up the new channel with a TLS handshake to the destination when enabled.

Note that only the src QEMU needs such an operation; the dest QEMU does not
need any change for TLS support because all channels are established
synchronously there, so all the TLS magic is already properly handled by
migration_tls_channel_process_incoming().

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
Message-Id: <20220707185518.27529-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/postcopy-ram.c | 57 ++--
 migration/trace-events   |  1 +
 2 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 70b21e9d51..b9a37ef255 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -36,6 +36,7 @@
 #include "socket.h"
 #include "qemu-file.h"
 #include "yank_functions.h"
+#include "tls.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -1552,15 +1553,15 @@ bool 
postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
 return true;
 }
 
+/*
+ * Setup the postcopy preempt channel with the IOC.  If ERROR is specified,
+ * setup the error instead.  This helper will free the ERROR if specified.
+ */
 static void
-postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
+postcopy_preempt_send_channel_done(MigrationState *s,
+   QIOChannel *ioc, Error *local_err)
 {
-MigrationState *s = opaque;
-QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
-Error *local_err = NULL;
-
-if (qio_task_propagate_error(task, _err)) {
-/* Something wrong happened.. */
+if (local_err) {
 migrate_set_error(s, local_err);
 error_free(local_err);
 } else {
@@ -1574,7 +1575,47 @@ postcopy_preempt_send_channel_new(QIOTask *task, 
gpointer opaque)
  * postcopy_qemufile_src to know whether it failed or not.
  */
 qemu_sem_post(>postcopy_qemufile_src_sem);
-object_unref(OBJECT(ioc));
+}
+
+static void
+postcopy_preempt_tls_handshake(QIOTask *task, gpointer opaque)
+{
+g_autoptr(QIOChannel) ioc = QIO_CHANNEL(qio_task_get_source(task));
+MigrationState *s = opaque;
+Error *local_err = NULL;
+
+qio_task_propagate_error(task, _err);
+postcopy_preempt_send_channel_done(s, ioc, local_err);
+}
+
+static void
+postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
+{
+g_autoptr(QIOChannel) ioc = QIO_CHANNEL(qio_task_get_source(task));
+MigrationState *s = opaque;
+QIOChannelTLS *tioc;
+Error *local_err = NULL;
+
+if (qio_task_propagate_error(task, _err)) {
+goto out;
+}
+
+if (migrate_channel_requires_tls_upgrade(ioc)) {
+tioc = migration_tls_client_create(s, ioc, s->hostname, _err);
+if (!tioc) {
+goto out;
+}
+trace_postcopy_preempt_tls_handshake();
+qio_channel_set_name(QIO_CHANNEL(tioc), "migration-tls-preempt");
+qio_channel_tls_handshake(tioc, postcopy_preempt_tls_handshake,
+  s, NULL, NULL);
+/* Setup the channel until TLS handshake finished */
+return;
+}
+
+out:
+/* This handles both good and error cases */
+postcopy_preempt_send_channel_done(s, ioc, local_err);
 }
 
 /* Returns 0 if channel established, -1 for error. */
diff --git a/migration/trace-events b/migration/trace-events
index 0e385c3a07..a34afe7b85 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -287,6 +287,7 @@ postcopy_request_shared_page(const char *sharer, const char 
*rb, uint64_t rb_off
 postcopy_request_shared_page_present(const char *sharer, const char *rb, 
uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
 postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in 
%s"
 postcopy_page_req_del(void *addr, int count) "resolved page req %p total %d"
+postcopy_preempt_tls_handshake(void) ""
 postcopy_preempt_new_channel(void) ""
 postcopy_preempt_thread_entry(void) ""
 postcopy_preempt_thread_exit(void) ""
-- 
2.36.1




[PULL 17/30] migration: Export tls-[creds|hostname|authz] params to cmdline too

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

It's useful for specifying TLS credentials all on the cmdline (along with
the -object tls-creds-*), especially for debugging purposes.

The trick here is that we must remember not to free these fields again in the
finalize() function of the migration object, otherwise it'll cause a double-free.

The thing is that when destroying an object, we first destroy the properties
bound to the object, then the object itself.  To be explicit, when destroying
the object in object_finalize() we have this sequence of operations:

object_property_del_all(obj);
object_deinit(obj, ti);

So after this change the two fields are properly released in
object_property_del_all(), even before the finalize() function is reached,
hence we must not free them again in finalize() or it becomes a double-free.

This also fixes a trivial memory leak for tls-authz as we forgot to free it
before this patch.

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
Message-Id: <20220707185515.27475-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index cc41787079..7c7e529ca7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4366,6 +4366,9 @@ static Property migration_properties[] = {
   DEFAULT_MIGRATE_ANNOUNCE_STEP),
 DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
   postcopy_preempt_break_huge, true),
+DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds),
DEFINE_PROP_STRING("tls-hostname", MigrationState, parameters.tls_hostname),
+DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -4403,12 +4406,9 @@ static void migration_class_init(ObjectClass *klass, void *data)
 static void migration_instance_finalize(Object *obj)
 {
 MigrationState *ms = MIGRATION_OBJ(obj);
-MigrationParameters *params = >parameters;
 
 qemu_mutex_destroy(>error_mutex);
 qemu_mutex_destroy(>qemu_file_lock);
-g_free(params->tls_hostname);
-g_free(params->tls_creds);
 qemu_sem_destroy(>wait_unplug_sem);
 qemu_sem_destroy(>rate_limit_sem);
 qemu_sem_destroy(>pause_sem);
-- 
2.36.1




[PULL 21/30] tests: Add postcopy tls migration test

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

We just added TLS tests for precopy but not postcopy.  Add the
corresponding test for vanilla postcopy.

Rename the vanilla postcopy to "postcopy/plain" because all postcopy tests
will only use unix sockets as the channel.

Signed-off-by: Peter Xu 
Message-Id: <20220707185525.27692-1-pet...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Manual merge
---
 tests/qtest/migration-test.c | 61 ++--
 1 file changed, 51 insertions(+), 10 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index f3931e0a92..b2020ef6c5 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -573,6 +573,9 @@ typedef struct {
 
 /* Optional: set number of migration passes to wait for */
 unsigned int iterations;
+
+/* Postcopy specific fields */
+void *postcopy_data;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1061,15 +1064,19 @@ test_migrate_tls_x509_finish(QTestState *from,
 
 static int migrate_postcopy_prepare(QTestState **from_ptr,
 QTestState **to_ptr,
-MigrateStart *args)
+MigrateCommon *args)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
 QTestState *from, *to;
 
-if (test_migrate_start(&from, &to, uri, args)) {
+if (test_migrate_start(&from, &to, uri, &args->start)) {
 return -1;
 }
 
+if (args->start_hook) {
+args->postcopy_data = args->start_hook(from, to);
+}
+
 migrate_set_capability(from, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-blocktime", true);
@@ -1089,7 +1096,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 return 0;
 }
 
-static void migrate_postcopy_complete(QTestState *from, QTestState *to)
+static void migrate_postcopy_complete(QTestState *from, QTestState *to,
+  MigrateCommon *args)
 {
 wait_for_migration_complete(from);
 
@@ -1100,25 +1108,50 @@ static void migrate_postcopy_complete(QTestState *from, QTestState *to)
 read_blocktime(to);
 }
 
+if (args->finish_hook) {
+args->finish_hook(from, to, args->postcopy_data);
+args->postcopy_data = NULL;
+}
+
 test_migrate_end(from, to, true);
 }
 
-static void test_postcopy(void)
+static void test_postcopy_common(MigrateCommon *args)
 {
-MigrateStart args = {};
 QTestState *from, *to;
 
-if (migrate_postcopy_prepare(&from, &to, &args)) {
+if (migrate_postcopy_prepare(&from, &to, args)) {
 return;
 }
 migrate_postcopy_start(from, to);
-migrate_postcopy_complete(from, to);
+migrate_postcopy_complete(from, to, args);
 }
 
+static void test_postcopy(void)
+{
+MigrateCommon args = { };
+
+test_postcopy_common(&args);
+}
+
+#ifdef CONFIG_GNUTLS
+static void test_postcopy_tls_psk(void)
+{
+MigrateCommon args = {
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_common(&args);
+}
+#endif
+
 static void test_postcopy_recovery(void)
 {
-MigrateStart args = {
-.hide_stderr = true,
+MigrateCommon args = {
+.start = {
+.hide_stderr = true,
+},
 };
 QTestState *from, *to;
 g_autofree char *uri = NULL;
@@ -1174,7 +1207,7 @@ static void test_postcopy_recovery(void)
 /* Restore the postcopy bandwidth to unlimited */
 migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
 
-migrate_postcopy_complete(from, to);
+migrate_postcopy_complete(from, to, &args);
 }
 
 static void test_baddest(void)
@@ -2378,12 +2411,20 @@ int main(int argc, char **argv)
 
 qtest_add_func("/migration/postcopy/unix", test_postcopy);
 qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
+qtest_add_func("/migration/postcopy/plain", test_postcopy);
+
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
 qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
 #ifdef CONFIG_GNUTLS
 qtest_add_func("/migration/precopy/unix/tls/psk",
test_precopy_unix_tls_psk);
+/*
+ * NOTE: psk test is enough for postcopy, as other types of TLS
+ * channels are tested under precopy.  Here what we want to test is the
+ * general postcopy path that has TLS channel enabled.
+ */
+qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
 #ifdef CONFIG_TASN1
 qtest_add_func("/migration/precopy/unix/tls/x509/default-host",
test_precopy_unix_tls_x509_default_host);
-- 
2.36.1




[PULL 20/30] tests: Move MigrateCommon upper

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Move it up so that it can soon be used by the postcopy tests too.

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
Message-Id: <20220707185522.27638-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 tests/qtest/migration-test.c | 144 +--
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index db4dcc5b31..f3931e0a92 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -503,6 +503,78 @@ typedef struct {
 const char *opts_target;
 } MigrateStart;
 
+/*
+ * A hook that runs after the src and dst QEMUs have been
+ * created, but before the migration is started. This can
+ * be used to set migration parameters and capabilities.
+ *
+ * Returns: NULL, or a pointer to opaque state to be
+ *  later passed to the TestMigrateFinishHook
+ */
+typedef void * (*TestMigrateStartHook)(QTestState *from,
+   QTestState *to);
+
+/*
+ * A hook that runs after the migration has finished,
+ * regardless of whether it succeeded or failed, but
+ * before QEMU has terminated (unless it self-terminated
+ * due to migration error)
+ *
+ * @opaque is a pointer to state previously returned
+ * by the TestMigrateStartHook if any, or NULL.
+ */
+typedef void (*TestMigrateFinishHook)(QTestState *from,
+  QTestState *to,
+  void *opaque);
+
+typedef struct {
+/* Optional: fine tune start parameters */
+MigrateStart start;
+
+/* Required: the URI for the dst QEMU to listen on */
+const char *listen_uri;
+
+/*
+ * Optional: the URI for the src QEMU to connect to
+ * If NULL, then it will query the dst QEMU for its actual
+ * listening address and use that as the connect address.
+ * This allows for dynamically picking a free TCP port.
+ */
+const char *connect_uri;
+
+/* Optional: callback to run at start to set migration parameters */
+TestMigrateStartHook start_hook;
+/* Optional: callback to run at finish to cleanup */
+TestMigrateFinishHook finish_hook;
+
+/*
+ * Optional: normally we expect the migration process to complete.
+ *
+ * There can be a variety of reasons and stages in which failure
+ * can happen during tests.
+ *
+ * If a failure is expected to happen at time of establishing
+ * the connection, then MIG_TEST_FAIL will indicate that the dst
+ * QEMU is expected to stay running and accept future migration
+ * connections.
+ *
+ * If a failure is expected to happen while processing the
+ * migration stream, then MIG_TEST_FAIL_DEST_QUIT_ERR will indicate
+ * that the dst QEMU is expected to quit with non-zero exit status
+ */
+enum {
+/* This test should succeed, the default */
+MIG_TEST_SUCCEED = 0,
+/* This test should fail, dest qemu should keep alive */
+MIG_TEST_FAIL,
+/* This test should fail, dest qemu should fail with abnormal status */
+MIG_TEST_FAIL_DEST_QUIT_ERR,
+} result;
+
+/* Optional: set number of migration passes to wait for */
+unsigned int iterations;
+} MigrateCommon;
+
 static int test_migrate_start(QTestState **from, QTestState **to,
   const char *uri, MigrateStart *args)
 {
@@ -1120,78 +1192,6 @@ static void test_baddest(void)
 test_migrate_end(from, to, false);
 }
 
-/*
- * A hook that runs after the src and dst QEMUs have been
- * created, but before the migration is started. This can
- * be used to set migration parameters and capabilities.
- *
- * Returns: NULL, or a pointer to opaque state to be
- *  later passed to the TestMigrateFinishHook
- */
-typedef void * (*TestMigrateStartHook)(QTestState *from,
-   QTestState *to);
-
-/*
- * A hook that runs after the migration has finished,
- * regardless of whether it succeeded or failed, but
- * before QEMU has terminated (unless it self-terminated
- * due to migration error)
- *
- * @opaque is a pointer to state previously returned
- * by the TestMigrateStartHook if any, or NULL.
- */
-typedef void (*TestMigrateFinishHook)(QTestState *from,
-  QTestState *to,
-  void *opaque);
-
-typedef struct {
-/* Optional: fine tune start parameters */
-MigrateStart start;
-
-/* Required: the URI for the dst QEMU to listen on */
-const char *listen_uri;
-
-/*
- * Optional: the URI for the src QEMU to connect to
- * If NULL, then it will query the dst QEMU for its actual
- * listening address and use that as the connect address.
- * This allows for dynamically picking a free TCP port.
- */
-const char *connect_uri;
-
-/* Optional: callback to run at start to set migration parameters */
-TestMigrateStartHook 

[PULL 24/30] migration: remove unreachable code after reading data

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

The code calls qio_channel_read() in a loop when it reports
QIO_CHANNEL_ERR_BLOCK. This error is reported when errno == EAGAIN.

As such, the later block of code will always hit the 'errno != EAGAIN'
condition, making the final 'else' unreachable.

Fixes: Coverity CID 1490203
Signed-off-by: Daniel P. Berrangé 
Message-Id: <20220627135318.156121-1-berra...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/qemu-file.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2f266b25cd..4f400c2e52 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -411,10 +411,8 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
 f->total_transferred += len;
 } else if (len == 0) {
 qemu_file_set_error_obj(f, -EIO, local_error);
-} else if (len != -EAGAIN) {
-qemu_file_set_error_obj(f, len, local_error);
 } else {
-error_free(local_error);
+qemu_file_set_error_obj(f, len, local_error);
 }
 
 return len;
-- 
2.36.1




[PULL 16/30] migration: Add helpers to detect TLS capability

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Add migrate_channel_requires_tls() to detect whether the specific channel
requires TLS, leveraging the recently introduced migrate_use_tls().  No
functional change intended.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185513.27421-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/channel.c   | 9 ++---
 migration/migration.c | 1 +
 migration/multifd.c   | 4 +---
 migration/tls.c   | 9 +
 migration/tls.h   | 4 
 5 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index 90087d8986..1b0815039f 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -38,9 +38,7 @@ void migration_channel_process_incoming(QIOChannel *ioc)
 trace_migration_set_incoming_channel(
 ioc, object_get_typename(OBJECT(ioc)));
 
-if (migrate_use_tls() &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 migration_tls_channel_process_incoming(s, ioc, &local_err);
 } else {
 migration_ioc_register_yank(ioc);
@@ -70,10 +68,7 @@ void migration_channel_connect(MigrationState *s,
 ioc, object_get_typename(OBJECT(ioc)), hostname, error);
 
 if (!error) {
-if (s->parameters.tls_creds &&
-*s->parameters.tls_creds &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 migration_tls_channel_connect(s, ioc, hostname, &error);
 
 if (!error) {
diff --git a/migration/migration.c b/migration/migration.c
index 864164ad96..cc41787079 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -48,6 +48,7 @@
 #include "trace.h"
 #include "exec/target_page.h"
 #include "io/channel-buffer.h"
+#include "io/channel-tls.h"
 #include "migration/colo.h"
 #include "hw/boards.h"
 #include "hw/qdev-properties.h"
diff --git a/migration/multifd.c b/migration/multifd.c
index 684c014c86..1e49594b02 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -831,9 +831,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 migrate_get_current()->hostname, error);
 
 if (!error) {
-if (migrate_use_tls() &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 multifd_tls_channel_connect(p, ioc, &error);
 if (!error) {
 /*
diff --git a/migration/tls.c b/migration/tls.c
index 32c384a8b6..73e8c9d3c2 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -166,3 +166,12 @@ void migration_tls_channel_connect(MigrationState *s,
   NULL,
   NULL);
 }
+
+bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
+{
+if (!migrate_use_tls()) {
+return false;
+}
+
+return !object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS);
+}
diff --git a/migration/tls.h b/migration/tls.h
index de4fe2cafd..98e23c9b0e 100644
--- a/migration/tls.h
+++ b/migration/tls.h
@@ -37,4 +37,8 @@ void migration_tls_channel_connect(MigrationState *s,
QIOChannel *ioc,
const char *hostname,
Error **errp);
+
+/* Whether the QIO channel requires further TLS handshake? */
+bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
+
 #endif
-- 
2.36.1




[PULL 22/30] tests: Add postcopy tls recovery migration test

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

It's easy to build this upon the postcopy tls test.  Rename the old
postcopy recovery test to postcopy/recovery/plain.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185527.27747-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Manual merge
---
 tests/qtest/migration-test.c | 39 +++-
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b2020ef6c5..5600b6d46a 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1146,17 +1146,15 @@ static void test_postcopy_tls_psk(void)
 }
 #endif
 
-static void test_postcopy_recovery(void)
+static void test_postcopy_recovery_common(MigrateCommon *args)
 {
-MigrateCommon args = {
-.start = {
-.hide_stderr = true,
-},
-};
 QTestState *from, *to;
 g_autofree char *uri = NULL;
 
-if (migrate_postcopy_prepare(&from, &to, &args)) {
+/* Always hide errors for postcopy recover tests since they're expected */
+args->start.hide_stderr = true;
+
+if (migrate_postcopy_prepare(&from, &to, args)) {
 return;
 }
 
@@ -1207,9 +1205,28 @@ static void test_postcopy_recovery(void)
 /* Restore the postcopy bandwidth to unlimited */
 migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
 
-migrate_postcopy_complete(from, to, &args);
+migrate_postcopy_complete(from, to, args);
+}
+
+static void test_postcopy_recovery(void)
+{
+MigrateCommon args = { };
+
+test_postcopy_recovery_common(&args);
 }
 
+#ifdef CONFIG_GNUTLS
+static void test_postcopy_recovery_tls_psk(void)
+{
+MigrateCommon args = {
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_recovery_common(&args);
+}
+#endif
+
 static void test_baddest(void)
 {
 MigrateStart args = {
@@ -2410,7 +2427,9 @@ int main(int argc, char **argv)
 module_call_init(MODULE_INIT_QOM);
 
 qtest_add_func("/migration/postcopy/unix", test_postcopy);
-qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
+qtest_add_func("/migration/postcopy/recovery/plain",
+   test_postcopy_recovery);
+
 qtest_add_func("/migration/postcopy/plain", test_postcopy);
 
 qtest_add_func("/migration/bad_dest", test_baddest);
@@ -2425,6 +2444,8 @@ int main(int argc, char **argv)
  * general postcopy path that has TLS channel enabled.
  */
 qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
+qtest_add_func("/migration/postcopy/recovery/tls/psk",
+   test_postcopy_recovery_tls_psk);
 #ifdef CONFIG_TASN1
 qtest_add_func("/migration/precopy/unix/tls/x509/default-host",
test_precopy_unix_tls_x509_default_host);
-- 
2.36.1




[PULL 09/30] multifd: Copy pages before compressing them with zlib

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Ilya Leoshkevich 

zlib_send_prepare() compresses pages of a running VM. zlib does not
make any thread-safety guarantees with respect to changing deflate()
input concurrently with deflate() [1].

One can observe problems due to this with the IBM zEnterprise Data
Compression accelerator capable zlib [2]. When the hardware
acceleration is enabled, migration/multifd/tcp/plain/zlib test fails
intermittently [3] due to sliding window corruption. The accelerator's
architecture explicitly discourages concurrent accesses [4]:

Page 26-57, "Other Conditions":

As observed by this CPU, other CPUs, and channel
programs, references to the parameter block, first,
second, and third operands may be multiple-access
references, accesses to these storage locations are
not necessarily block-concurrent, and the sequence
of these accesses or references is undefined.

Mark Adler pointed out that vanilla zlib performs double fetches under
certain circumstances as well [5], therefore we need to copy data
before passing it to deflate().

[1] https://zlib.net/manual.html
[2] https://github.com/madler/zlib/pull/410
[3] https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html
[4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
[5] https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00889.html

Signed-off-by: Ilya Leoshkevich 
Message-Id: <20220705203559.2960949-1-...@linux.ibm.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/multifd-zlib.c | 38 ++
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 3a7ae44485..18213a9513 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -27,6 +27,8 @@ struct zlib_data {
 uint8_t *zbuff;
 /* size of compressed buffer */
 uint32_t zbuff_len;
+/* uncompressed buffer of size qemu_target_page_size() */
+uint8_t *buf;
 };
 
 /* Multifd zlib compression */
@@ -45,26 +47,38 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
 {
 struct zlib_data *z = g_new0(struct zlib_data, 1);
 z_stream *zs = &z->zs;
+const char *err_msg;
 
 zs->zalloc = Z_NULL;
 zs->zfree = Z_NULL;
 zs->opaque = Z_NULL;
 if (deflateInit(zs, migrate_multifd_zlib_level()) != Z_OK) {
-g_free(z);
-error_setg(errp, "multifd %u: deflate init failed", p->id);
-return -1;
+err_msg = "deflate init failed";
+goto err_free_z;
 }
 /* This is the maxium size of the compressed buffer */
 z->zbuff_len = compressBound(MULTIFD_PACKET_SIZE);
 z->zbuff = g_try_malloc(z->zbuff_len);
 if (!z->zbuff) {
-deflateEnd(&z->zs);
-g_free(z);
-error_setg(errp, "multifd %u: out of memory for zbuff", p->id);
-return -1;
+err_msg = "out of memory for zbuff";
+goto err_deflate_end;
+}
+z->buf = g_try_malloc(qemu_target_page_size());
+if (!z->buf) {
+err_msg = "out of memory for buf";
+goto err_free_zbuff;
 }
 p->data = z;
 return 0;
+
+err_free_zbuff:
+g_free(z->zbuff);
+err_deflate_end:
+deflateEnd(&z->zs);
+err_free_z:
+g_free(z);
+error_setg(errp, "multifd %u: %s", p->id, err_msg);
+return -1;
 }
 
 /**
@@ -82,6 +96,8 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
 deflateEnd(&z->zs);
 g_free(z->zbuff);
 z->zbuff = NULL;
+g_free(z->buf);
+z->buf = NULL;
 g_free(p->data);
 p->data = NULL;
 }
@@ -114,8 +130,14 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 flush = Z_SYNC_FLUSH;
 }
 
+/*
+ * Since the VM might be running, the page may be changing concurrently
+ * with compression. zlib does not guarantee that this is safe,
+ * therefore copy the page before calling deflate().
+ */
+memcpy(z->buf, p->pages->block->host + p->normal[i], page_size);
 zs->avail_in = page_size;
-zs->next_in = p->pages->block->host + p->normal[i];
+zs->next_in = z->buf;
 
 zs->avail_out = available;
 zs->next_out = z->zbuff + out_size;
-- 
2.36.1




[PULL 11/30] migration: Postcopy preemption preparation on channel creation

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Create a new socket for postcopy to be prepared to send postcopy requested
pages via this specific channel, so as to not get blocked by precopy pages.

A new thread is also created on dest qemu to receive data from this new channel
based on the ram_load_postcopy() routine.

The ram_load_postcopy(POSTCOPY) branch and the thread have not started to
function yet; that will be done in follow-up patches.

Clean up the new sockets on both the src and dst QEMUs, and look after the
new thread too to make sure it'll be recycled properly.

Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
Message-Id: <20220707185502.27149-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: With Peter's fix to quieten compiler warning on
   start_migration
---
 migration/migration.c| 63 +++
 migration/migration.h|  8 
 migration/postcopy-ram.c | 92 ++--
 migration/postcopy-ram.h | 10 +
 migration/ram.c  | 25 ---
 migration/ram.h  |  4 +-
 migration/savevm.c   | 20 -
 migration/socket.c   | 22 +-
 migration/socket.h   |  1 +
 migration/trace-events   |  5 ++-
 10 files changed, 219 insertions(+), 31 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ce7bb68cdc..c965cae1d4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -321,6 +321,12 @@ void migration_incoming_state_destroy(void)
 mis->page_requested = NULL;
 }
 
+if (mis->postcopy_qemufile_dst) {
+migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
+qemu_fclose(mis->postcopy_qemufile_dst);
+mis->postcopy_qemufile_dst = NULL;
+}
+
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
@@ -714,15 +720,21 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
 migration_incoming_process();
 }
 
+static bool migration_needs_multiple_sockets(void)
+{
+return migrate_use_multifd() || migrate_postcopy_preempt();
+}
+
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
 Error *local_err = NULL;
 bool start_migration;
+QEMUFile *f;
 
 if (!mis->from_src_file) {
 /* The first connection (multifd may have multiple) */
-QEMUFile *f = qemu_file_new_input(ioc);
+f = qemu_file_new_input(ioc);
 
 if (!migration_incoming_setup(f, errp)) {
 return;
@@ -730,13 +742,19 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 
 /*
  * Common migration only needs one channel, so we can start
- * right now.  Multifd needs more than one channel, we wait.
+ * right now.  Some features need more than one channel, we wait.
  */
-start_migration = !migrate_use_multifd();
+start_migration = !migration_needs_multiple_sockets();
 } else {
 /* Multiple connections */
-assert(migrate_use_multifd());
-start_migration = multifd_recv_new_channel(ioc, &local_err);
+assert(migration_needs_multiple_sockets());
+if (migrate_use_multifd()) {
+start_migration = multifd_recv_new_channel(ioc, &local_err);
+} else {
+assert(migrate_postcopy_preempt());
+f = qemu_file_new_input(ioc);
+start_migration = postcopy_preempt_new_channel(mis, f);
+}
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -761,11 +779,20 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 bool migration_has_all_channels(void)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
-bool all_channels;
 
-all_channels = multifd_recv_all_channels_created();
+if (!mis->from_src_file) {
+return false;
+}
+
+if (migrate_use_multifd()) {
+return multifd_recv_all_channels_created();
+}
+
+if (migrate_postcopy_preempt()) {
+return mis->postcopy_qemufile_dst != NULL;
+}
 
-return all_channels && mis->from_src_file != NULL;
+return true;
 }
 
 /*
@@ -1874,6 +1901,12 @@ static void migrate_fd_cleanup(MigrationState *s)
 qemu_fclose(tmp);
 }
 
+if (s->postcopy_qemufile_src) {
+migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+qemu_fclose(s->postcopy_qemufile_src);
+s->postcopy_qemufile_src = NULL;
+}
+
 assert(!migration_is_active(s));
 
 if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -3269,6 +3302,11 @@ static void migration_completion(MigrationState *s)
 qemu_savevm_state_complete_postcopy(s->to_dst_file);
 qemu_mutex_unlock_iothread();
 
+/* Shutdown the postcopy fast path thread */
+if (migrate_postcopy_preempt()) {
+postcopy_preempt_shutdown_file(s);
+}
+
 

[PULL 19/30] migration: Respect postcopy request order in preemption mode

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

With preemption mode on, when we see a postcopy request that was requesting
for exactly the page that we have preempted before (so we've partially sent
the page already via PRECOPY channel and it got preempted by another
postcopy request), currently we drop the request so that after all the
other postcopy requests are serviced then we'll go back to precopy stream
and start to handle that.

We dropped the request because we can't send it via postcopy channel since
the precopy channel already contains partial of the data, and we can only
send a huge page via one channel as a whole.  We can't split a huge page
into two channels.

That's a rare corner case and it works, but it changes the order in which
we handle postcopy requests, since we postpone this (unlucky) postcopy
request until after the other queued postcopy requests.  The
problem is there's a possibility that when the guest was very busy, the
postcopy queue can be always non-empty, it means this dropped request will
never be handled until the end of postcopy migration. So, there's a chance
that there's one dest QEMU vcpu thread waiting for a page fault for an
extremely long time just because it's unluckily accessing the specific page
that was preempted before.

The worst case time it needs can be as long as the whole postcopy migration
procedure.  It's extremely unlikely to happen, but when it happens it's not
good.

The root cause of this problem is because we treat pss->postcopy_requested
variable as with two meanings bound together, as the variable shows:

  1. Whether this page request is urgent, and,
  2. Which channel we should use for this page request.

With the old code, when we set postcopy_requested it means either both (1)
and (2) are true, or both (1) and (2) are false.  We can never have (1)
and (2) to have different values.

However it doesn't necessarily need to be like that.  It's very legal that
there's one request that has (1) very high urgency, but (2) we'd like to
use the precopy channel.  Just like the corner case we were discussing
above.

To differentiate the two meanings better, introduce a new field called
postcopy_target_channel, showing which channel we should use for this page
request, so as to cover the old meaning (2) only.  Then we leave the
postcopy_requested variable to stand only for meaning (1), which is the
urgency of this page request.

With this change, we can easily boost priority of a preempted precopy page
as long as we know that page is also requested as a postcopy page.  So with
the new approach in get_queued_page() instead of dropping that request, we
send it right away with the precopy channel so we get back the ordering of
the page faults just like how they're requested on dest.

Reported-by: Manish Mishra 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
Message-Id: <20220707185520.27583-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 65 +++--
 1 file changed, 52 insertions(+), 13 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7cbe9c310d..4fbad74c6c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -442,8 +442,28 @@ struct PageSearchStatus {
 unsigned long page;
 /* Set once we wrap around */
 bool complete_round;
-/* Whether current page is explicitly requested by postcopy */
+/*
+ * [POSTCOPY-ONLY] Whether current page is explicitly requested by
+ * postcopy.  When set, the request is "urgent" because the dest QEMU
+ * threads are waiting for us.
+ */
 bool postcopy_requested;
+/*
+ * [POSTCOPY-ONLY] The target channel to use to send current page.
+ *
+ * Note: This may _not_ match with the value in postcopy_requested
+ * above. Let's imagine the case where the postcopy request is exactly
+ * the page that we're sending in progress during precopy. In this case
+ * we'll have postcopy_requested set to true but the target channel
+ * will be the precopy channel (so that we don't split brain on that
+ * specific page since the precopy channel already contains partial of
+ * that page data).
+ *
+ * Besides that specific use case, postcopy_target_channel should
+ * always be equal to postcopy_requested, because by default we send
+ * postcopy pages via postcopy preempt channel.
+ */
+bool postcopy_target_channel;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
@@ -495,6 +515,9 @@ static QemuCond decomp_done_cond;
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
  ram_addr_t offset, uint8_t *source_buf);
 
+static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
+ bool postcopy_requested);
+
 static void *do_data_compress(void *opaque)
 {
 CompressParam *param = opaque;

[PULL 14/30] migration: Create the postcopy preempt channel asynchronously

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

This patch allows the postcopy preempt channel to be created
asynchronously.  The benefit is that when the connection is slow, we won't
take the BQL (and potentially block all things like QMP) for a long time
without releasing it.

A function postcopy_preempt_wait_channel() is introduced, allowing the
migration thread to be able to wait on the channel creation.  The channel
is always created by the main thread, in which we'll kick a new semaphore
to tell the migration thread that the channel has created.

We'll need to wait for the new channel in two places: (1) when there's a
new postcopy migration that is starting, or (2) when there's a postcopy
migration to resume.

For the start of migration, we don't need to wait for this channel until
when we want to start postcopy, aka, postcopy_start().  We'll fail the
migration if we found that the channel creation failed (which should
probably not happen at all in 99% of the cases, because the main channel is
using the same network topology).

For a postcopy recovery, we'll need to wait in postcopy_pause().  In that
case if the channel creation failed, we can't fail the migration or we'll
crash the VM, instead we keep in PAUSED state, waiting for yet another
recovery.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
Message-Id: <20220707185509.27311-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c| 16 
 migration/migration.h|  7 +
 migration/postcopy-ram.c | 56 +++-
 migration/postcopy-ram.h |  1 +
 4 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3119bd2e4b..427d4de185 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3053,6 +3053,12 @@ static int postcopy_start(MigrationState *ms)
 int64_t bandwidth = migrate_max_postcopy_bandwidth();
 bool restart_block = false;
 int cur_state = MIGRATION_STATUS_ACTIVE;
+
+if (postcopy_preempt_wait_channel(ms)) {
+migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+return -1;
+}
+
 if (!migrate_pause_before_switchover()) {
 migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -3534,6 +3540,14 @@ static MigThrError postcopy_pause(MigrationState *s)
 if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
 /* Woken up by a recover procedure. Give it a shot */
 
+if (postcopy_preempt_wait_channel(s)) {
+/*
+ * Preempt enabled, and new channel create failed; loop
+ * back to wait for another recovery.
+ */
+continue;
+}
+
 /*
  * Firstly, let's wake up the return path now, with a new
  * return path channel.
@@ -4398,6 +4412,7 @@ static void migration_instance_finalize(Object *obj)
 qemu_sem_destroy(&ms->postcopy_pause_sem);
 qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
 qemu_sem_destroy(&ms->rp_state.rp_sem);
+qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
 error_free(ms->error);
 }
 
@@ -4444,6 +4459,7 @@ static void migration_instance_init(Object *obj)
 qemu_sem_init(&ms->rp_state.rp_sem, 0);
 qemu_sem_init(&ms->rate_limit_sem, 0);
 qemu_sem_init(&ms->wait_unplug_sem, 0);
+qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
 qemu_mutex_init(&ms->qemu_file_lock);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index 9220cec6bd..ae4ffd3454 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -219,6 +219,13 @@ struct MigrationState {
 QEMUFile *to_dst_file;
 /* Postcopy specific transfer channel */
 QEMUFile *postcopy_qemufile_src;
+/*
+ * It is posted when the preempt channel is established.  Note: this is
+ * used for both the start and the recovery of a postcopy migration.  We'll
+ * post to this sem every time a new preempt channel is created in the
+ * main thread, and we keep post() and wait() in pairs.
+ */
+QemuSemaphore postcopy_qemufile_src_sem;
 QIOChannelBuffer *bioc;
 /*
  * Protects to_dst_file/from_dst_file pointers.  We need to make sure we
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 84f7b1526e..70b21e9d51 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1552,10 +1552,50 @@ bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
 return true;
 }
 
-int postcopy_preempt_setup(MigrationState *s, Error **errp)
+static void
+postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
 {
-QIOChannel *ioc;
+MigrationState *s = opaque;
+QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
+Error *local_err = NULL;
+
+if (qio_task_propagate_error(task, &local_err)) {
+/* Something wrong happened.. */
+

[PULL 15/30] migration: Add property x-postcopy-preempt-break-huge

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Add a property field that can conditionally disable the "break sending huge
page" behavior in postcopy preemption.  By default it's enabled.

It should only be used for debugging purposes, and we should never remove
the "x-" prefix.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
Message-Id: <20220707185511.27366-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 2 ++
 migration/migration.h | 7 +++
 migration/ram.c   | 7 +++
 3 files changed, 16 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 427d4de185..864164ad96 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4363,6 +4363,8 @@ static Property migration_properties[] = {
 DEFINE_PROP_SIZE("announce-step", MigrationState,
   parameters.announce_step,
   DEFAULT_MIGRATE_ANNOUNCE_STEP),
+DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
+  postcopy_preempt_break_huge, true),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
diff --git a/migration/migration.h b/migration/migration.h
index ae4ffd3454..cdad8aceaa 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -340,6 +340,13 @@ struct MigrationState {
 bool send_configuration;
 /* Whether we send section footer during migration */
 bool send_section_footer;
+/*
+ * Whether we allow breaking the sending of huge pages when postcopy
+ * preempt is enabled.  When disabled, we won't interrupt precopy while
+ * sending a host huge page, which is the old behavior of vanilla postcopy.
+ * NOTE: this parameter is ignored if postcopy preempt is not enabled.
+ */
+bool postcopy_preempt_break_huge;
 
 /* Needed by postcopy-pause state */
 QemuSemaphore postcopy_pause_sem;
diff --git a/migration/ram.c b/migration/ram.c
index 65b08c4edb..7cbe9c310d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2266,11 +2266,18 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
 
 static bool postcopy_needs_preempt(RAMState *rs, PageSearchStatus *pss)
 {
+MigrationState *ms = migrate_get_current();
+
 /* Not enabled eager preempt?  Then never do that. */
 if (!migrate_postcopy_preempt()) {
 return false;
 }
 
+/* If the user explicitly disabled breaking of huge page, skip */
+if (!ms->postcopy_preempt_break_huge) {
+return false;
+}
+
 /* If the ramblock we're sending is a small page?  Never bother. */
 if (qemu_ram_pagesize(pss->block) == TARGET_PAGE_SIZE) {
 return false;
-- 
2.36.1




[PULL 07/30] softmmu/dirtylimit: Implement dirty page rate limit

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Implement periodic dirty page rate calculation based on the
dirty ring, and throttle the virtual CPU until it reaches the quota
dirty page rate given by the user.

Introduce qmp commands "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit", "query-vcpu-dirty-limit"
to enable, disable, query dirty page limit for virtual CPU.

Meanwhile, introduce corresponding hmp commands
"set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit",
"info vcpu_dirty_limit" so the feature can be more usable.

"query-vcpu-dirty-limit" success depends on enabling dirty
page rate limit, so just add it to the list of skipped
command to ensure qmp-cmd-test run successfully.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Markus Armbruster 
Reviewed-by: Peter Xu 
Message-Id: 
<4143f26706d413dd29db0b672fe58b3d3fbe34bc.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 hmp-commands-info.hx   |  13 +++
 hmp-commands.hx|  32 ++
 include/monitor/hmp.h  |   3 +
 qapi/migration.json|  80 +++
 softmmu/dirtylimit.c   | 194 +
 tests/qtest/qmp-cmd-test.c |   2 +
 6 files changed, 324 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 3ffa24bd67..188d9ece3b 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -865,6 +865,19 @@ SRST
 Display the vcpu dirty rate information.
 ERST
 
+{
+.name   = "vcpu_dirty_limit",
+.args_type  = "",
+.params = "",
+.help   = "show dirty page limit information of all vCPU",
+.cmd= hmp_info_vcpu_dirty_limit,
+},
+
+SRST
+  ``info vcpu_dirty_limit``
+Display the vcpu dirty page limit information.
+ERST
+
 #if defined(TARGET_I386)
 {
 .name   = "sgx",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index c9d465735a..182e639d14 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1768,3 +1768,35 @@ ERST
   "\n\t\t\t -b to specify dirty bitmap as method of 
calculation)",
 .cmd= hmp_calc_dirty_rate,
 },
+
+SRST
+``set_vcpu_dirty_limit``
+  Set the dirty page rate limit on a virtual CPU.  The dirty limit status of
+  all virtual CPUs can be observed with the ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "set_vcpu_dirty_limit",
+.args_type  = "dirty_rate:l,cpu_index:l?",
+.params = "dirty_rate [cpu_index]",
+.help   = "set dirty page rate limit, use cpu_index to set limit"
+  "\n\t\t\t\t\t on a specified virtual cpu",
+.cmd= hmp_set_vcpu_dirty_limit,
+},
+
+SRST
+``cancel_vcpu_dirty_limit``
+  Cancel the dirty page rate limit on a virtual CPU.  The dirty limit status
+  of all virtual CPUs can be observed with the ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "cancel_vcpu_dirty_limit",
+.args_type  = "cpu_index:l?",
+.params = "[cpu_index]",
+.help   = "cancel dirty page rate limit, use cpu_index to cancel"
+  "\n\t\t\t\t\t limit on a specified virtual cpu",
+.cmd= hmp_cancel_vcpu_dirty_limit,
+},
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 2e89a97bd6..a618eb1e4e 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -131,6 +131,9 @@ void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
 void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
 void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_human_readable_text_helper(Monitor *mon,
 HumanReadableText *(*qmp_handler)(Error **));
 void hmp_info_stats(Monitor *mon, const QDict *qdict);
diff --git a/qapi/migration.json b/qapi/migration.json
index 7102e474a6..e552ee4f43 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1868,6 +1868,86 @@
 ##
 { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
 
+##
+# @DirtyLimitInfo:
+#
+# Dirty page rate limit information of a virtual CPU.
+#
+# @cpu-index: index of a virtual CPU.
+#
+# @limit-rate: upper limit of dirty page rate (MB/s) for a virtual
+#  CPU, 0 means unlimited.
+#
+# @current-rate: current dirty page rate (MB/s) for a virtual CPU.
+#
+# Since: 7.1
+#
+##
+{ 'struct': 'DirtyLimitInfo',
+  'data': { 'cpu-index': 'int',
+'limit-rate': 'uint64',
+'current-rate': 'uint64' } }
+
+##
+# @set-vcpu-dirty-limit:
+#
+# Set the upper limit of dirty page rate for virtual CPUs.
+#
+# Requires KVM with accelerator property "dirty-ring-size" set.
+# A virtual CPU's dirty page rate is a measure of its memory load.
+# To observe 

[PULL 06/30] softmmu/dirtylimit: Implement virtual CPU throttle

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Set up a negative feedback system for the vCPU thread when it
handles a KVM_EXIT_DIRTY_RING_FULL exit, by introducing a
throttle_us_per_full field in struct CPUState.  The vCPU sleeps
throttle_us_per_full microseconds to throttle itself
if dirtylimit is in service.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 
<977e808e03a1cef5151cae75984658b6821be618.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 accel/kvm/kvm-all.c |  20 ++-
 include/hw/core/cpu.h   |   6 +
 include/sysemu/dirtylimit.h |  15 ++
 softmmu/dirtylimit.c| 291 
 softmmu/trace-events|   7 +
 5 files changed, 338 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 184aecab5c..3187656570 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -45,6 +45,7 @@
 #include "qemu/guest-random.h"
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
+#include "sysemu/dirtylimit.h"
 
 #include "hw/boards.h"
 #include "monitor/stats.h"
@@ -477,6 +478,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 cpu->kvm_state = s;
 cpu->vcpu_dirty = true;
 cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
 
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
@@ -1470,6 +1472,11 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
  */
 sleep(1);
 
+/* keep sleeping so that dirtylimit is not interfered with by the reaper */
+if (dirtylimit_in_service()) {
+continue;
+}
+
 trace_kvm_dirty_ring_reaper("wakeup");
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
@@ -2975,8 +2982,19 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state, NULL);
+/*
+ * We throttle the vCPU by making it sleep once it exits the kernel
+ * due to a full dirty ring. In the dirtylimit scenario, reaping
+ * all vCPUs after a single vCPU's ring gets full would make that
+ * vCPU miss its sleep, so just reap the vCPU whose ring filled up.
+ */
+if (dirtylimit_in_service()) {
+kvm_dirty_ring_reap(kvm_state, cpu);
+} else {
+kvm_dirty_ring_reap(kvm_state, NULL);
+}
 qemu_mutex_unlock_iothread();
+dirtylimit_vcpu_execute(cpu);
 ret = 0;
 break;
 case KVM_EXIT_SYSTEM_EVENT:
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 996f94059f..500503da13 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -418,6 +418,12 @@ struct CPUState {
  */
 bool throttle_thread_scheduled;
 
+/*
+ * Sleep throttle_us_per_full microseconds once dirty ring is full
+ * if dirty page rate limit is enabled.
+ */
+int64_t throttle_us_per_full;
+
 bool ignore_memory_transaction_failures;
 
 /* Used for user-only emulation of prctl(PR_SET_UNALIGN). */
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
index da459f03d6..8d2c1f3a6b 100644
--- a/include/sysemu/dirtylimit.h
+++ b/include/sysemu/dirtylimit.h
@@ -19,4 +19,19 @@ void vcpu_dirty_rate_stat_start(void);
 void vcpu_dirty_rate_stat_stop(void);
 void vcpu_dirty_rate_stat_initialize(void);
 void vcpu_dirty_rate_stat_finalize(void);
+
+void dirtylimit_state_lock(void);
+void dirtylimit_state_unlock(void);
+void dirtylimit_state_initialize(void);
+void dirtylimit_state_finalize(void);
+bool dirtylimit_in_service(void);
+bool dirtylimit_vcpu_index_valid(int cpu_index);
+void dirtylimit_process(void);
+void dirtylimit_change(bool start);
+void dirtylimit_set_vcpu(int cpu_index,
+ uint64_t quota,
+ bool enable);
+void dirtylimit_set_all(uint64_t quota,
+bool enable);
+void dirtylimit_vcpu_execute(CPUState *cpu);
 #endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index ebdc064c9d..e5a4f970bd 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -18,6 +18,26 @@
 #include "sysemu/dirtylimit.h"
 #include "exec/memory.h"
 #include "hw/boards.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+/*
+ * Dirtylimit stops working if the dirty page rate error
+ * value is less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+/*
+ * Adjust the vcpu sleep time linearly if the dirty
+ * page rate error percentage is over
+ * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
+ * Otherwise, adjust it by a fixed amount.
+ */
+#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
+/*
+ * Max vcpu sleep time percentage during a cycle
+ * composed of dirty ring full and sleep time.
+ */
+#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
 
 struct {
 VcpuStat stat;
@@ -25,6 +45,30 @@ struct {
 QemuThread thread;
 } 

[PULL 26/30] Add dirty-sync-missed-zero-copy migration stat

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

Signed-off-by: Leonardo Bras 
Acked-by: Markus Armbruster 
Acked-by: Peter Xu 
Reviewed-by: Daniel P. Berrangé 
Message-Id: <2022071122.18951-3-leob...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 2 ++
 monitor/hmp-cmds.c| 5 +
 qapi/migration.json   | 7 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 7c7e529ca7..15ae48b209 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1057,6 +1057,8 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 info->ram->normal_bytes = ram_counters.normal * page_size;
 info->ram->mbps = s->mbps;
 info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
+info->ram->dirty_sync_missed_zero_copy =
+ram_counters.dirty_sync_missed_zero_copy;
 info->ram->postcopy_requests = ram_counters.postcopy_requests;
 info->ram->page_size = page_size;
 info->ram->multifd_bytes = ram_counters.multifd_bytes;
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ca98df0495..a6dc79e0d5 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -307,6 +307,11 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "postcopy ram: %" PRIu64 " kbytes\n",
info->ram->postcopy_bytes >> 10);
 }
+if (info->ram->dirty_sync_missed_zero_copy) {
+monitor_printf(mon,
+   "Zero-copy-send fallbacks happened: %" PRIu64 " 
times\n",
+   info->ram->dirty_sync_missed_zero_copy);
+}
 }
 
 if (info->has_disk) {
diff --git a/qapi/migration.json b/qapi/migration.json
index 7586df3dea..81185d4311 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -55,6 +55,10 @@
 # @postcopy-bytes: The number of bytes sent during the post-copy phase
 #  (since 7.0).
 #
+# @dirty-sync-missed-zero-copy: Number of times dirty RAM synchronization could
+#   not avoid copying dirty pages. This is between
+#   0 and @dirty-sync-count * @multifd-channels.
+#   (since 7.1)
 # Since: 0.14
 ##
 { 'struct': 'MigrationStats',
@@ -65,7 +69,8 @@
'postcopy-requests' : 'int', 'page-size' : 'int',
'multifd-bytes' : 'uint64', 'pages-per-second' : 'uint64',
'precopy-bytes' : 'uint64', 'downtime-bytes' : 'uint64',
-   'postcopy-bytes' : 'uint64' } }
+   'postcopy-bytes' : 'uint64',
+   'dirty-sync-missed-zero-copy' : 'uint64' } }
 
 ##
 # @XBZRLECacheStats:
-- 
2.36.1




[PULL 12/30] migration: Postcopy preemption enablement

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

This patch enables the postcopy-preempt feature.

It contains two major changes to the migration logic:

(1) Postcopy requests are now sent via a different socket from precopy
background migration stream, so as to be isolated from very high page
request delays.

(2) For huge page enabled hosts: when there's postcopy requests, they can now
intercept a partial sending of huge host pages on src QEMU.

After this patch, we'll live migrate a VM with two channels for postcopy: (1)
PRECOPY channel, which is the default channel that transfers background pages;
and (2) POSTCOPY channel, which only transfers requested pages.

There's no strict rule of which channel to use, e.g., if a requested page is
already being transferred on precopy channel, then we will keep using the same
precopy channel to transfer the page even if it's explicitly requested.  In 99%
of the cases we'll prioritize the channels so that we send the requested page
via the postcopy channel as long as possible.

On the source QEMU, when we found a postcopy request, we'll interrupt the
PRECOPY channel sending process and quickly switch to the POSTCOPY channel.
After we serviced all the high priority postcopy pages, we'll switch back to
PRECOPY channel so that we'll continue to send the interrupted huge page again.
There's no new thread introduced on src QEMU.

On the destination QEMU, one new thread is introduced to receive page data from
the postcopy specific socket (done in the preparation patch).

This patch has a side effect: after sending postcopy pages, previously we'll
assume the guest will access follow up pages so we'll keep sending from there.
Now it's changed.  Instead of going on with a postcopy requested page, we'll go
back and continue sending the precopy huge page (which can be intercepted by a
postcopy request so the huge page can be sent partially before).

Whether that's a problem is debatable, because the assumption that "the guest
will continue to access the following pages" may not really hold when huge
pages are used, especially if the huge page is large (e.g. 1GB pages).  So
that locality hint is largely meaningless if huge pages are used.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185504.27203-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c  |   2 +
 migration/migration.h  |   2 +-
 migration/ram.c| 251 +++--
 migration/trace-events |   7 ++
 4 files changed, 253 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c965cae1d4..c5f0fdf8f8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3190,6 +3190,8 @@ static int postcopy_start(MigrationState *ms)
   MIGRATION_STATUS_FAILED);
 }
 
+trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
+
 return ret;
 
 fail_closefb:
diff --git a/migration/migration.h b/migration/migration.h
index 941c61e543..ff714c235f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -68,7 +68,7 @@ typedef struct {
 struct MigrationIncomingState {
 QEMUFile *from_src_file;
 /* Previously received RAM's RAMBlock pointer */
-RAMBlock *last_recv_block;
+RAMBlock *last_recv_block[RAM_CHANNEL_MAX];
 /* A hook to allow cleanup at the end of incoming migration */
 void *transport_data;
 void (*transport_cleanup)(void *data);
diff --git a/migration/ram.c b/migration/ram.c
index e4364c0bff..65b08c4edb 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -296,6 +296,20 @@ struct RAMSrcPageRequest {
 QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
 
+typedef struct {
+/*
+ * Cached ramblock/offset values if preempted.  They're only meaningful if
+ * preempted==true below.
+ */
+RAMBlock *ram_block;
+unsigned long ram_page;
+/*
+ * Whether a postcopy preemption just happened.  Will be reset after
+ * precopy recovered to background migration.
+ */
+bool preempted;
+} PostcopyPreemptState;
+
 /* State of RAM for migration */
 struct RAMState {
 /* QEMUFile used for this migration */
@@ -350,6 +364,14 @@ struct RAMState {
 /* Queue of outstanding page requests from the destination */
 QemuMutex src_page_req_mutex;
 QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
+
+/* Postcopy preemption information */
+PostcopyPreemptState postcopy_preempt_state;
+/*
+ * Current channel we're using on src VM.  Only valid if postcopy-preempt
+ * is enabled.
+ */
+unsigned int postcopy_channel;
 };
 typedef struct RAMState RAMState;
 
@@ -357,6 +379,11 @@ static RAMState *ram_state;
 
 static NotifierWithReturnList precopy_notifier_list;
 
+static void postcopy_preempt_reset(RAMState *rs)
+{
+memset(&rs->postcopy_preempt_state, 0, sizeof(PostcopyPreemptState));
+}
+
 /* Whether postcopy has queued requests? */
 static bool postcopy_has_request(RAMState *rs)
 {
@@ -1947,6 

[PULL 02/30] cpus: Introduce cpu_list_generation_id

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Introduce cpu_list_generation_id to track cpu list generation so
that cpu hotplug/unplug can be detected during measurement of
dirty page rate.

cpu_list_generation_id can be used to detect changes of the cpu
list, in preparation for dirty page rate measurement.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 
<06e1f1362b2501a471dce796abb065b04f320fa5.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 cpus-common.c | 8 
 include/exec/cpu-common.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index db459b41ce..793364dc0e 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -73,6 +73,12 @@ static int cpu_get_free_index(void)
 }
 
 CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
+static unsigned int cpu_list_generation_id;
+
+unsigned int cpu_list_generation_id_get(void)
+{
+return cpu_list_generation_id;
+}
 
 void cpu_list_add(CPUState *cpu)
 {
@@ -84,6 +90,7 @@ void cpu_list_add(CPUState *cpu)
 assert(!cpu_index_auto_assigned);
 }
 QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
+cpu_list_generation_id++;
 }
 
 void cpu_list_remove(CPUState *cpu)
@@ -96,6 +103,7 @@ void cpu_list_remove(CPUState *cpu)
 
 QTAILQ_REMOVE_RCU(&cpus, cpu, node);
 cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+cpu_list_generation_id++;
 }
 
 CPUState *qemu_get_cpu(int index)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 5968551a05..2281be4e10 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -35,6 +35,7 @@ extern intptr_t qemu_host_page_mask;
 void qemu_init_cpu_list(void);
 void cpu_list_lock(void);
 void cpu_list_unlock(void);
+unsigned int cpu_list_generation_id_get(void);
 
 void tcg_flush_softmmu_tlb(CPUState *cs);
 
-- 
2.36.1




[PULL 10/30] migration: Add postcopy-preempt capability

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Firstly, postcopy already preempts precopy due to the fact that we do
unqueue_page() first before looking into dirty bits.

However that's not enough: e.g., with host huge pages enabled, while a precopy
huge page is being sent, a postcopy request needs to wait until the whole huge
page finishes sending.  That could introduce quite some delay; the bigger the
huge page is, the larger the delay it'll bring.

This patch adds a new capability to allow postcopy requests to preempt existing
precopy page during sending a huge page, so that postcopy requests can be
serviced even faster.

Meanwhile to send it even faster, bypass the precopy stream by providing a
standalone postcopy socket for sending requested pages.

Since the new behavior will not be compatible with the old behavior, this will
not be the default, it's enabled only when the new capability is set on both
src/dst QEMUs.

This patch only adds the capability itself, the logic will be added in follow
up patches.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
Message-Id: <20220707185342.26794-2-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 18 ++
 migration/migration.h |  1 +
 qapi/migration.json   |  7 ++-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 78f5057373..ce7bb68cdc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1297,6 +1297,13 @@ static bool migrate_caps_check(bool *cap_list,
 return false;
 }
 
+if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT]) {
+if (!cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+error_setg(errp, "Postcopy preempt requires postcopy-ram");
+return false;
+}
+}
+
 return true;
 }
 
@@ -2663,6 +2670,15 @@ bool migrate_background_snapshot(void)
 return s->enabled_capabilities[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT];
 }
 
+bool migrate_postcopy_preempt(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT];
+}
+
 /* migration thread support */
 /*
  * Something bad happened to the RP stream, mark an error
@@ -4274,6 +4290,8 @@ static Property migration_properties[] = {
 DEFINE_PROP_MIG_CAP("x-compress", MIGRATION_CAPABILITY_COMPRESS),
 DEFINE_PROP_MIG_CAP("x-events", MIGRATION_CAPABILITY_EVENTS),
 DEFINE_PROP_MIG_CAP("x-postcopy-ram", MIGRATION_CAPABILITY_POSTCOPY_RAM),
+DEFINE_PROP_MIG_CAP("x-postcopy-preempt",
+MIGRATION_CAPABILITY_POSTCOPY_PREEMPT),
 DEFINE_PROP_MIG_CAP("x-colo", MIGRATION_CAPABILITY_X_COLO),
 DEFINE_PROP_MIG_CAP("x-release-ram", MIGRATION_CAPABILITY_RELEASE_RAM),
 DEFINE_PROP_MIG_CAP("x-block", MIGRATION_CAPABILITY_BLOCK),
diff --git a/migration/migration.h b/migration/migration.h
index 485d58b95f..d2269c826c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -400,6 +400,7 @@ int migrate_decompress_threads(void);
 bool migrate_use_events(void);
 bool migrate_postcopy_blocktime(void);
 bool migrate_background_snapshot(void);
+bool migrate_postcopy_preempt(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_shut(MigrationIncomingState *mis,
diff --git a/qapi/migration.json b/qapi/migration.json
index e552ee4f43..7586df3dea 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -467,6 +467,11 @@
 #  Requires that QEMU be permitted to use locked memory
 #  for guest RAM pages.
 #  (since 7.1)
+# @postcopy-preempt: If enabled, the migration process will allow postcopy
+#requests to preempt precopy stream, so postcopy requests
+#will be handled faster.  This is a performance feature and
+#should not affect the correctness of postcopy migration.
+#(since 7.1)
 #
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -482,7 +487,7 @@
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
'validate-uuid', 'background-snapshot',
-   'zero-copy-send'] }
+   'zero-copy-send', 'postcopy-preempt'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.36.1




[PULL 05/30] accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Introduce the kvm_dirty_ring_size util function to help calculate
the dirty ring full time.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
Message-Id: 

Signed-off-by: Dr. David Alan Gilbert 
---
 accel/kvm/kvm-all.c| 5 +
 accel/stubs/kvm-stub.c | 5 +
 include/sysemu/kvm.h   | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ce989a68ff..184aecab5c 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2318,6 +2318,11 @@ static void query_stats_cb(StatsResultList **result, StatsTarget target,
strList *names, strList *targets, Error **errp);
 static void query_stats_schemas_cb(StatsSchemaList **result, Error **errp);
 
+uint32_t kvm_dirty_ring_size(void)
+{
+return kvm_state->kvm_dirty_ring_size;
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 3345882d85..2ac5f9c036 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -148,3 +148,8 @@ bool kvm_dirty_ring_enabled(void)
 {
 return false;
 }
+
+uint32_t kvm_dirty_ring_size(void)
+{
+return 0;
+}
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a783c78868..efd6dee818 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -582,4 +582,6 @@ bool kvm_cpu_check_are_resettable(void);
 bool kvm_arch_cpu_check_are_resettable(void);
 
 bool kvm_dirty_ring_enabled(void);
+
+uint32_t kvm_dirty_ring_size(void);
 #endif
-- 
2.36.1




[PULL 13/30] migration: Postcopy recover with preempt enabled

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar handling on fault tolerance.  When ram_load_postcopy() fails,
instead of stopping the thread it halts with a semaphore, preparing to be
kicked again when recovery is detected.

A mutex is introduced to make sure there's no concurrent operation upon the
socket.  To make it simple, the fast ram load thread will take the mutex during
its whole procedure, and only release it if it's paused.  The fast-path socket
will be properly released by the main loading thread safely when there's
network failures during postcopy with that mutex held.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185506.27257-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c| 27 +++
 migration/migration.h| 19 +++
 migration/postcopy-ram.c | 25 +++--
 migration/qemu-file.c| 27 +++
 migration/qemu-file.h|  1 +
 migration/savevm.c   | 26 --
 migration/trace-events   |  2 ++
 7 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c5f0fdf8f8..3119bd2e4b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -215,9 +215,11 @@ void migration_object_init(void)
 current_incoming->postcopy_remote_fds =
 g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD));
 qemu_mutex_init(&current_incoming->rp_mutex);
+qemu_mutex_init(&current_incoming->postcopy_prio_thread_mutex);
 qemu_event_init(&current_incoming->main_thread_load_event, false);
 qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
 qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
+qemu_sem_init(&current_incoming->postcopy_pause_sem_fast_load, 0);
 qemu_mutex_init(&current_incoming->page_request_mutex);
 current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
@@ -697,9 +699,9 @@ static bool postcopy_try_recover(void)
 
 /*
  * Here, we only wake up the main loading thread (while the
- * fault thread will still be waiting), so that we can receive
+ * rest threads will still be waiting), so that we can receive
  * commands from source now, and answer it if needed. The
- * fault thread will be woken up afterwards until we are sure
+ * rest threads will be woken up afterwards until we are sure
  * that source is ready to reply to page requests.
  */
 qemu_sem_post(&mis->postcopy_pause_sem_dst);
@@ -3503,6 +3505,18 @@ static MigThrError postcopy_pause(MigrationState *s)
 qemu_file_shutdown(file);
 qemu_fclose(file);
 
+/*
+ * Do the same to postcopy fast path socket too if there is.  No
+ * locking needed because no racer as long as we do this before setting
+ * status to paused.
+ */
+if (s->postcopy_qemufile_src) {
+migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+qemu_file_shutdown(s->postcopy_qemufile_src);
+qemu_fclose(s->postcopy_qemufile_src);
+s->postcopy_qemufile_src = NULL;
+}
+
 migrate_set_state(&s->state, s->state,
   MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -3558,8 +3572,13 @@ static MigThrError migration_detect_error(MigrationState *s)
 return MIG_THR_ERR_FATAL;
 }
 
-/* Try to detect any file errors */
-ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
+/*
+ * Try to detect any file errors.  Note that postcopy_qemufile_src will
+ * be NULL when postcopy preempt is not enabled.
+ */
+ret = qemu_file_get_error_obj_any(s->to_dst_file,
+  s->postcopy_qemufile_src,
+  &local_error);
 if (!ret) {
 /* Everything is fine */
 assert(!local_error);
diff --git a/migration/migration.h b/migration/migration.h
index ff714c235f..9220cec6bd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -118,6 +118,18 @@ struct MigrationIncomingState {
 /* Postcopy priority thread is used to receive postcopy requested pages */
 QemuThread postcopy_prio_thread;
 bool postcopy_prio_thread_created;
+/*
+ * Used to sync between the ram load main thread and the fast ram load
+ * thread.  It protects postcopy_qemufile_dst, which is the postcopy
+ * fast channel.
+ *
+ * The ram fast load thread will take it mostly for the whole lifecycle
+ * because it needs to continuously read data from the channel, and
+ * it'll only release this mutex if postcopy is interrupted, so that
+ * the ram load main thread will take this mutex over and properly
+ * release the broken channel.
+ */
+QemuMutex postcopy_prio_thread_mutex;
 /*
  * An array of temp host huge pages to be used, 

[PULL 03/30] migration/dirtyrate: Refactor dirty page rate calculation

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Abstract the dirty log change logic out into the function
global_dirty_log_change.

Abstract the dirty-ring-based dirty page rate calculation logic
out into the function vcpu_calculate_dirtyrate.

Abstract the mathematical dirty page rate calculation out into
do_calculate_dirtyrate, decoupling it from DirtyStat.

Rename set_sample_page_period to dirty_stat_wait, which is easier
to understand and will be reused in dirtylimit.

Handle the CPU hotplug/unplug scenario during measurement of the
dirty page rate.

Export the utility functions outside of migration.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: <7b6f6f4748d5b3d017b31a0429e630229ae97538.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 include/sysemu/dirtyrate.h |  28 +
 migration/dirtyrate.c  | 227 +++--
 migration/dirtyrate.h  |   7 +-
 3 files changed, 174 insertions(+), 88 deletions(-)
 create mode 100644 include/sysemu/dirtyrate.h

diff --git a/include/sysemu/dirtyrate.h b/include/sysemu/dirtyrate.h
new file mode 100644
index 00..4d3b9a4902
--- /dev/null
+++ b/include/sysemu/dirtyrate.h
@@ -0,0 +1,28 @@
+/*
+ * dirty page rate helper functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_DIRTYRATE_H
+#define QEMU_DIRTYRATE_H
+
+typedef struct VcpuStat {
+int nvcpu; /* number of vcpu */
+DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
+} VcpuStat;
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot);
+
+void global_dirty_log_change(unsigned int flag,
+ bool start);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index aace12a787..795fab5c37 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -46,7 +46,7 @@ static struct DirtyRateStat DirtyStat;
 static DirtyRateMeasureMode dirtyrate_mode =
 DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
-static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
+static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
 {
 int64_t current_time;
 
@@ -60,6 +60,132 @@ static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
 return msec;
 }
 
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+ CPUState *cpu, bool start)
+{
+if (start) {
+dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+} else {
+dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+}
+}
+
+static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
+  int64_t calc_time_ms)
+{
+uint64_t memory_size_MB;
+uint64_t increased_dirty_pages =
+dirty_pages.end_pages - dirty_pages.start_pages;
+
+memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+
+return memory_size_MB * 1000 / calc_time_ms;
+}
+
+void global_dirty_log_change(unsigned int flag, bool start)
+{
+qemu_mutex_lock_iothread();
+if (start) {
+memory_global_dirty_log_start(flag);
+} else {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+/*
+ * global_dirty_log_sync
+ * 1. sync dirty log from kvm
+ * 2. stop dirty tracking if needed.
+ */
+static void global_dirty_log_sync(unsigned int flag, bool one_shot)
+{
+qemu_mutex_lock_iothread();
+memory_global_dirty_log_sync();
+if (one_shot) {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+static DirtyPageRecord *vcpu_dirty_stat_alloc(VcpuStat *stat)
+{
+CPUState *cpu;
+DirtyPageRecord *records;
+int nvcpu = 0;
+
+CPU_FOREACH(cpu) {
+nvcpu++;
+}
+
+stat->nvcpu = nvcpu;
+stat->rates = g_malloc0(sizeof(DirtyRateVcpu) * nvcpu);
+
+records = g_malloc0(sizeof(DirtyPageRecord) * nvcpu);
+
+return records;
+}
+
+static void vcpu_dirty_stat_collect(VcpuStat *stat,
+DirtyPageRecord *records,
+bool start)
+{
+CPUState *cpu;
+
+CPU_FOREACH(cpu) {
+record_dirtypages(records, cpu, start);
+}
+}
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot)
+{
+DirtyPageRecord *records;
+int64_t init_time_ms;
+int64_t duration;
+int64_t dirtyrate;
+int i = 0;
+unsigned int gen_id;
+
+retry:
+init_time_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+cpu_list_lock();
+gen_id = 

[PULL 08/30] tests: Add dirty page rate limit test

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Add a dirty page rate limit test if the kernel supports the dirty ring.

The following QMP commands are covered by this test case:
"calc-dirty-rate", "query-dirty-rate", "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit".

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
Message-Id: 

Signed-off-by: Dr. David Alan Gilbert 
---
 tests/qtest/migration-helpers.c |  22 +++
 tests/qtest/migration-helpers.h |   2 +
 tests/qtest/migration-test.c| 256 
 3 files changed, 280 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index e81e831c85..c6fbeb3974 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -83,6 +83,28 @@ QDict *wait_command(QTestState *who, const char *command, ...)
 return ret;
 }
 
+/*
+ * Execute the qmp command only
+ */
+QDict *qmp_command(QTestState *who, const char *command, ...)
+{
+va_list ap;
+QDict *resp, *ret;
+
+va_start(ap, command);
+resp = qtest_vqmp(who, command, ap);
+va_end(ap);
+
+g_assert(!qdict_haskey(resp, "error"));
+g_assert(qdict_haskey(resp, "return"));
+
+ret = qdict_get_qdict(resp, "return");
+qobject_ref(ret);
+qobject_unref(resp);
+
+return ret;
+}
+
 /*
  * Send QMP command "migrate".
  * Arguments are built from @fmt... (formatted like
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 78587c2b82..59561898d0 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -23,6 +23,8 @@ QDict *wait_command_fd(QTestState *who, int fd, const char *command, ...);
 G_GNUC_PRINTF(2, 3)
 QDict *wait_command(QTestState *who, const char *command, ...);
 
+QDict *qmp_command(QTestState *who, const char *command, ...);
+
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
 
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 9e64125f02..db4dcc5b31 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -24,6 +24,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qobject-output-visitor.h"
 #include "crypto/tlscredspsk.h"
+#include "qapi/qmp/qlist.h"
 
 #include "migration-helpers.h"
 #include "tests/migration/migration-test.h"
@@ -46,6 +47,12 @@ unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
 
+/*
+ * Dirtylimit stops working if the dirty page rate error
+ * value is less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+
 #if defined(__linux__)
 #include 
 #include 
@@ -2059,6 +2066,253 @@ static void test_multifd_tcp_cancel(void)
 test_migrate_end(from, to2, true);
 }
 
+static void calc_dirty_rate(QTestState *who, uint64_t calc_time)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'calc-dirty-rate',"
+  "'arguments': { "
+  "'calc-time': %ld,"
+  "'mode': 'dirty-ring' }}",
+  calc_time));
+}
+
+static QDict *query_dirty_rate(QTestState *who)
+{
+return qmp_command(who, "{ 'execute': 'query-dirty-rate' }");
+}
+
+static void dirtylimit_set_all(QTestState *who, uint64_t dirtyrate)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'set-vcpu-dirty-limit',"
+  "'arguments': { "
+  "'dirty-rate': %ld } }",
+  dirtyrate));
+}
+
+static void cancel_vcpu_dirty_limit(QTestState *who)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'cancel-vcpu-dirty-limit' }"));
+}
+
+static QDict *query_vcpu_dirty_limit(QTestState *who)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who, "{ 'execute': 'query-vcpu-dirty-limit' }");
+g_assert(!qdict_haskey(rsp, "error"));
+g_assert(qdict_haskey(rsp, "return"));
+
+return rsp;
+}
+
+static bool calc_dirtyrate_ready(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = g_strdup(qdict_get_str(rsp_return, "status"));
+g_assert(status);
+
+return g_strcmp0(status, "measuring");
+}
+
+static void wait_for_calc_dirtyrate_complete(QTestState *who,
+ int64_t time_s)
+{
+int max_try_count = 10000;
+usleep(time_s * 1000000);
+
+while (!calc_dirtyrate_ready(who) && max_try_count--) {
+usleep(1000);
+}
+
+/*
+ * Set the timeout to 10 s (max_try_count * 1000 us); if the
+ * dirtyrate measurement has not completed by then, fail the test.
+ */
+g_assert_cmpint(max_try_count, !=, 0);
+}
+
+static int64_t get_dirty_rate(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+QList *rates;
+const QListEntry *entry;
+QDict *rate;
+int64_t dirtyrate;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = 

[PULL 00/30] migration queue

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The following changes since commit 68e26e1e812c8b09313d7929271f6cbd47ef4c07:

  Merge tag 'pull-la-20220719' of https://gitlab.com/rth7680/qemu into staging (2022-07-19 22:54:43 +0100)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220720c

for you to fetch changes up to db727a14108b5f7ee1273f94e8ccce428a646140:

  Revert "gitlab: disable accelerated zlib for s390x" (2022-07-20 12:15:09 
+0100)


Migration pull 2022-07-20

This replaces yesterday's pull and:
  a) Fixes some test build errors without TLS
  b) Re-enables the zlib acceleration on s390x
 now that we have Ilya's fix

  Hyman's dirty page rate limit set
  Ilya's fix for zlib vs migration
  Peter's postcopy-preempt
  Cleanup from Dan
  zero-copy tidy ups from Leo
  multifd doc fix from Juan
  Revert disable of zlib acceleration on s390x

Signed-off-by: Dr. David Alan Gilbert 


Daniel P. Berrangé (1):
  migration: remove unreachable code after reading data

Dr. David Alan Gilbert (1):
  Revert "gitlab: disable accelerated zlib for s390x"

Hyman Huang (8):
  accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping
  cpus: Introduce cpu_list_generation_id
  migration/dirtyrate: Refactor dirty page rate calculation
  softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically
  accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function
  softmmu/dirtylimit: Implement virtual CPU throttle
  softmmu/dirtylimit: Implement dirty page rate limit
  tests: Add dirty page rate limit test

Ilya Leoshkevich (1):
  multifd: Copy pages before compressing them with zlib

Juan Quintela (1):
  multifd: Document the locking of MultiFD{Send/Recv}Params

Leonardo Bras (4):
  QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  Add dirty-sync-missed-zero-copy migration stat
  migration/multifd: Report to user when zerocopy not working
  migration: Avoid false-positive on non-supported scenarios for zero-copy-send

Peter Xu (14):
  migration: Add postcopy-preempt capability
  migration: Postcopy preemption preparation on channel creation
  migration: Postcopy preemption enablement
  migration: Postcopy recover with preempt enabled
  migration: Create the postcopy preempt channel asynchronously
  migration: Add property x-postcopy-preempt-break-huge
  migration: Add helpers to detect TLS capability
  migration: Export tls-[creds|hostname|authz] params to cmdline too
  migration: Enable TLS for preempt channel
  migration: Respect postcopy request order in preemption mode
  tests: Move MigrateCommon upper
  tests: Add postcopy tls migration test
  tests: Add postcopy tls recovery migration test
  tests: Add postcopy preempt tests

 .gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml |  12 -
 .travis.yml|   6 +-
 accel/kvm/kvm-all.c|  46 +-
 accel/stubs/kvm-stub.c |   5 +
 cpus-common.c  |   8 +
 hmp-commands-info.hx   |  13 +
 hmp-commands.hx|  32 ++
 include/exec/cpu-common.h  |   1 +
 include/exec/memory.h  |   5 +-
 include/hw/core/cpu.h  |   6 +
 include/monitor/hmp.h  |   3 +
 include/sysemu/dirtylimit.h|  37 ++
 include/sysemu/dirtyrate.h |  28 +
 include/sysemu/kvm.h   |   2 +
 io/channel-socket.c|   8 +-
 migration/channel.c|   9 +-
 migration/dirtyrate.c  | 227 +---
 migration/dirtyrate.h  |   7 +-
 migration/migration.c  | 152 +-
 migration/migration.h  |  44 +-
 migration/multifd-zlib.c   |  38 +-
 migration/multifd.c|   6 +-
 migration/multifd.h|  66 ++-
 migration/postcopy-ram.c   | 186 ++-
 migration/postcopy-ram.h   |  11 +
 migration/qemu-file.c  |  31 +-
 migration/qemu-file.h  |   1 +
 migration/ram.c| 331 +++-
 migration/ram.h|   6 +-
 migration/savevm.c |  46 +-
 migration/socket.c |  22 +-
 migration/socket.h |   1 +
 migration/tls.c

[PULL 01/30] accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Add an optional 'CPUState' argument to kvm_dirty_ring_reap so
that it can cover the single-vcpu dirty-ring-reaping scenario.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 

Signed-off-by: Dr. David Alan Gilbert 
---
 accel/kvm/kvm-all.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ed8b6b896e..ce989a68ff 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -757,17 +757,20 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
 }
 
 /* Must be with slots_lock held */
-static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
+static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState* cpu)
 {
 int ret;
-CPUState *cpu;
 uint64_t total = 0;
 int64_t stamp;
 
 stamp = get_clock();
 
-CPU_FOREACH(cpu) {
-total += kvm_dirty_ring_reap_one(s, cpu);
+if (cpu) {
+total = kvm_dirty_ring_reap_one(s, cpu);
+} else {
+CPU_FOREACH(cpu) {
+total += kvm_dirty_ring_reap_one(s, cpu);
+}
 }
 
 if (total) {
@@ -788,7 +791,7 @@ static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
  * Currently for simplicity, we must hold BQL before calling this.  We can
  * consider to drop the BQL if we're clear with all the race conditions.
  */
-static uint64_t kvm_dirty_ring_reap(KVMState *s)
+static uint64_t kvm_dirty_ring_reap(KVMState *s, CPUState *cpu)
 {
 uint64_t total;
 
@@ -808,7 +811,7 @@ static uint64_t kvm_dirty_ring_reap(KVMState *s)
  * reset below.
  */
 kvm_slots_lock();
-total = kvm_dirty_ring_reap_locked(s);
+total = kvm_dirty_ring_reap_locked(s, cpu);
 kvm_slots_unlock();
 
 return total;
@@ -855,7 +858,7 @@ static void kvm_dirty_ring_flush(void)
  * vcpus out in a synchronous way.
  */
 kvm_cpu_synchronize_kick_all();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 trace_kvm_dirty_ring_flush(1);
 }
 
@@ -1399,7 +1402,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
  * Not easy.  Let's cross the fingers until it's fixed.
  */
 if (kvm_state->kvm_dirty_ring_size) {
-kvm_dirty_ring_reap_locked(kvm_state);
+kvm_dirty_ring_reap_locked(kvm_state, NULL);
 } else {
 kvm_slot_get_dirty_log(kvm_state, mem);
 }
@@ -1471,7 +1474,7 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(s);
+kvm_dirty_ring_reap(s, NULL);
 qemu_mutex_unlock_iothread();
 
 r->reaper_iteration++;
@@ -2967,7 +2970,7 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 qemu_mutex_unlock_iothread();
 ret = 0;
 break;
-- 
2.36.1




[PULL 04/30] softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Introduce GLOBAL_DIRTY_LIMIT, a third dirty tracking method, to
calculate the dirty page rate periodically for the dirty page
rate limit.

Add dirtylimit.c to implement the periodic dirty page rate
calculation, which will be used for the dirty page rate limit.

Add dirtylimit.h to export utility functions for the dirty page
rate limit implementation.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: <5d0d641bffcb9b1c4cc3e323b6dfecb36050d948.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 include/exec/memory.h   |   5 +-
 include/sysemu/dirtylimit.h |  22 +++
 softmmu/dirtylimit.c| 116 
 softmmu/meson.build |   1 +
 4 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 include/sysemu/dirtylimit.h
 create mode 100644 softmmu/dirtylimit.c

diff --git a/include/exec/memory.h b/include/exec/memory.h
index a6a0f4d8ad..bfb1de8eea 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 /* Dirty tracking enabled because measuring dirty rate */
 #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
 
-#define GLOBAL_DIRTY_MASK  (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT  (1U << 2)
+
+#define GLOBAL_DIRTY_MASK  (0x7)
 
 extern unsigned int global_dirty_tracking;
 
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 00..da459f03d6
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,22 @@
+/*
+ * Dirty page rate limit common functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYRLIMIT_H
+#define QEMU_DIRTYRLIMIT_H
+
+#define DIRTYLIMIT_CALC_TIME_MS 1000    /* 1000ms */
+
+int64_t vcpu_dirty_rate_get(int cpu_index);
+void vcpu_dirty_rate_stat_start(void);
+void vcpu_dirty_rate_stat_stop(void);
+void vcpu_dirty_rate_stat_initialize(void);
+void vcpu_dirty_rate_stat_finalize(void);
+#endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
new file mode 100644
index 00..ebdc064c9d
--- /dev/null
+++ b/softmmu/dirtylimit.c
@@ -0,0 +1,116 @@
+/*
+ * Dirty page rate limit implementation code
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "qapi/qapi-commands-migration.h"
+#include "sysemu/dirtyrate.h"
+#include "sysemu/dirtylimit.h"
+#include "exec/memory.h"
+#include "hw/boards.h"
+
+struct {
+VcpuStat stat;
+bool running;
+QemuThread thread;
+} *vcpu_dirty_rate_stat;
+
+static void vcpu_dirty_rate_stat_collect(void)
+{
+VcpuStat stat;
+int i = 0;
+
+/* calculate vcpu dirtyrate */
+vcpu_calculate_dirtyrate(DIRTYLIMIT_CALC_TIME_MS,
+ &stat,
+ GLOBAL_DIRTY_LIMIT,
+ false);
+
+for (i = 0; i < stat.nvcpu; i++) {
+vcpu_dirty_rate_stat->stat.rates[i].id = i;
+vcpu_dirty_rate_stat->stat.rates[i].dirty_rate =
+stat.rates[i].dirty_rate;
+}
+
+free(stat.rates);
+}
+
+static void *vcpu_dirty_rate_stat_thread(void *opaque)
+{
+rcu_register_thread();
+
+/* start log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, true);
+
+while (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+vcpu_dirty_rate_stat_collect();
+}
+
+/* stop log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, false);
+
+rcu_unregister_thread();
+return NULL;
+}
+
+int64_t vcpu_dirty_rate_get(int cpu_index)
+{
+DirtyRateVcpu *rates = vcpu_dirty_rate_stat->stat.rates;
+return qatomic_read_i64(&rates[cpu_index].dirty_rate);
+}
+
+void vcpu_dirty_rate_stat_start(void)
+{
+if (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+return;
+}
+
+qatomic_set(&vcpu_dirty_rate_stat->running, 1);
+qemu_thread_create(&vcpu_dirty_rate_stat->thread,
+   "dirtyrate-stat",
+   vcpu_dirty_rate_stat_thread,
+   NULL,
+   QEMU_THREAD_JOINABLE);
+}
+
+void vcpu_dirty_rate_stat_stop(void)
+{
+qatomic_set(&vcpu_dirty_rate_stat->running, 0);
+qemu_mutex_unlock_iothread();
+qemu_thread_join(&vcpu_dirty_rate_stat->thread);
+qemu_mutex_lock_iothread();
+}
+
+void vcpu_dirty_rate_stat_initialize(void)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+int max_cpus = ms->smp.max_cpus;
+
+vcpu_dirty_rate_stat =
+g_malloc0(sizeof(*vcpu_dirty_rate_stat));
+
+vcpu_dirty_rate_stat->stat.nvcpu = max_cpus;
+

[PATCH] Revert "gitlab: disable accelerated zlib for s390x"

2022-07-20 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

This reverts commit 309df6acb29346f89e1ee542b1986f60cab12b87.
With Ilya's 'multifd: Copy pages before compressing them with zlib'
in the latest migration series, this shouldn't be a problem any more.

Suggested-by: Peter Maydell 
Signed-off-by: Dr. David Alan Gilbert 
---
 .gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml | 12 
 .travis.yml|  6 ++
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/.gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml b/.gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml
index 9f1fe9e7dc..03e74c97db 100644
--- a/.gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml
+++ b/.gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml
@@ -8,8 +8,6 @@ ubuntu-20.04-s390x-all-linux-static:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  rules:
 - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
  - if: "$S390X_RUNNER_AVAILABLE"
@@ -29,8 +27,6 @@ ubuntu-20.04-s390x-all:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  timeout: 75m
  rules:
 - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
@@ -48,8 +44,6 @@ ubuntu-20.04-s390x-alldbg:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  rules:
 - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
when: manual
@@ -71,8 +65,6 @@ ubuntu-20.04-s390x-clang:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  rules:
 - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
when: manual
@@ -93,8 +85,6 @@ ubuntu-20.04-s390x-tci:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  rules:
 - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
when: manual
@@ -114,8 +104,6 @@ ubuntu-20.04-s390x-notcg:
  tags:
  - ubuntu_20.04
  - s390x
- variables:
-DFLTCC: 0
  rules:
 - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
when: manual
diff --git a/.travis.yml b/.travis.yml
index 4fdc9a6785..fb3baabca9 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -218,7 +218,6 @@ jobs:
 - TEST_CMD="make check check-tcg V=1"
 - CONFIG="--disable-containers 
--target-list=${MAIN_SOFTMMU_TARGETS},s390x-linux-user"
 - UNRELIABLE=true
-- DFLTCC=0
   script:
 - BUILD_RC=0 && make -j${JOBS} || BUILD_RC=$?
 - |
@@ -258,7 +257,7 @@ jobs:
   env:
 - CONFIG="--disable-containers --audio-drv-list=sdl --disable-user
   --target-list-exclude=${MAIN_SOFTMMU_TARGETS}"
-- DFLTCC=0
+
 - name: "[s390x] GCC (user)"
   arch: s390x
   dist: focal
@@ -270,7 +269,7 @@ jobs:
   - ninja-build
   env:
 - CONFIG="--disable-containers --disable-system"
-- DFLTCC=0
+
 - name: "[s390x] Clang (disable-tcg)"
   arch: s390x
   dist: focal
@@ -304,4 +303,3 @@ jobs:
 - CONFIG="--disable-containers --disable-tcg --enable-kvm
   --disable-tools --host-cc=clang --cxx=clang++"
 - UNRELIABLE=true
-- DFLTCC=0
-- 
2.36.1




[PULL 28/29] multifd: Document the locking of MultiFD{Send/Recv}Params

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Juan Quintela 

Reorder the structures so we can know if the fields are:
- Read only
- Their own locking (i.e. sems)
- Protected by 'mutex'
- Only for the multifd channel

Signed-off-by: Juan Quintela 
Message-Id: <20220531104318.7494-2-quint...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Typo fixes from Chen Zhang
---
 migration/multifd.h | 66 -
 1 file changed, 41 insertions(+), 25 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 4d8d89e5e5..519f498643 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -65,7 +65,9 @@ typedef struct {
 } MultiFDPages_t;
 
 typedef struct {
-/* this fields are not changed once the thread is created */
+/* Fields are only written at creating/deletion time */
+/* No lock required for them, they are read only */
+
 /* channel number */
 uint8_t id;
 /* channel thread name */
@@ -74,39 +76,47 @@ typedef struct {
 QemuThread thread;
 /* communication channel */
 QIOChannel *c;
+/* is the yank function registered */
+bool registered_yank;
+/* packet allocated len */
+uint32_t packet_len;
+/* multifd flags for sending ram */
+int write_flags;
+
 /* sem where to wait for more work */
 QemuSemaphore sem;
+/* syncs main thread and channels */
+QemuSemaphore sem_sync;
+
 /* this mutex protects the following parameters */
 QemuMutex mutex;
 /* is this channel thread running */
 bool running;
 /* should this thread finish */
 bool quit;
-/* is the yank function registered */
-bool registered_yank;
+/* multifd flags for each packet */
+uint32_t flags;
+/* global number of generated multifd packets */
+uint64_t packet_num;
 /* thread has work to do */
 int pending_job;
-/* array of pages to sent */
+/* array of pages to send.
+ * The owner of 'pages' depends on the 'pending_job' value:
+ * pending_job == 0 -> migration_thread can use it.
+ * pending_job != 0 -> multifd_channel can use it.
+ */
 MultiFDPages_t *pages;
-/* packet allocated len */
-uint32_t packet_len;
+
+/* thread local variables. No locking required */
+
 /* pointer to the packet */
 MultiFDPacket_t *packet;
-/* multifd flags for sending ram */
-int write_flags;
-/* multifd flags for each packet */
-uint32_t flags;
 /* size of the next packet that contains pages */
 uint32_t next_packet_size;
-/* global number of generated multifd packets */
-uint64_t packet_num;
-/* thread local variables */
 /* packets sent through this channel */
 uint64_t num_packets;
 /* non zero pages sent through this channel */
 uint64_t total_normal_pages;
-/* syncs main thread and channels */
-QemuSemaphore sem_sync;
 /* buffers to send */
 struct iovec *iov;
 /* number of iovs used */
@@ -120,7 +130,9 @@ typedef struct {
 }  MultiFDSendParams;
 
 typedef struct {
-/* this fields are not changed once the thread is created */
+/* Fields are only written at creating/deletion time */
+/* No lock required for them, they are read only */
+
 /* channel number */
 uint8_t id;
 /* channel thread name */
@@ -129,31 +141,35 @@ typedef struct {
 QemuThread thread;
 /* communication channel */
 QIOChannel *c;
+/* packet allocated len */
+uint32_t packet_len;
+
+/* syncs main thread and channels */
+QemuSemaphore sem_sync;
+
 /* this mutex protects the following parameters */
 QemuMutex mutex;
 /* is this channel thread running */
 bool running;
 /* should this thread finish */
 bool quit;
-/* ramblock host address */
-uint8_t *host;
-/* packet allocated len */
-uint32_t packet_len;
-/* pointer to the packet */
-MultiFDPacket_t *packet;
 /* multifd flags for each packet */
 uint32_t flags;
 /* global number of generated multifd packets */
 uint64_t packet_num;
-/* thread local variables */
+
+/* thread local variables. No locking required */
+
+/* pointer to the packet */
+MultiFDPacket_t *packet;
 /* size of the next packet that contains pages */
 uint32_t next_packet_size;
 /* packets sent through this channel */
 uint64_t num_packets;
+/* ramblock host address */
+uint8_t *host;
 /* non zero pages recv through this channel */
 uint64_t total_normal_pages;
-/* syncs main thread and channels */
-QemuSemaphore sem_sync;
 /* buffers to recv */
 struct iovec *iov;
 /* Pages that are not zero */
-- 
2.36.1




[PULL 21/29] tests: Add postcopy tls migration test

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

We just added TLS tests for precopy but not postcopy.  Add the
corresponding test for vanilla postcopy.

Rename the vanilla postcopy to "postcopy/plain" because all postcopy tests
will only use unix sockets as channel.

Signed-off-by: Peter Xu 
Message-Id: <20220707185525.27692-1-pet...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Manual merge
---
 tests/qtest/migration-test.c | 61 ++--
 1 file changed, 51 insertions(+), 10 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index f3931e0a92..b2020ef6c5 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -573,6 +573,9 @@ typedef struct {
 
 /* Optional: set number of migration passes to wait for */
 unsigned int iterations;
+
+/* Postcopy specific fields */
+void *postcopy_data;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1061,15 +1064,19 @@ test_migrate_tls_x509_finish(QTestState *from,
 
 static int migrate_postcopy_prepare(QTestState **from_ptr,
 QTestState **to_ptr,
-MigrateStart *args)
+MigrateCommon *args)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
 QTestState *from, *to;
 
-if (test_migrate_start(&from, &to, uri, args)) {
+if (test_migrate_start(&from, &to, uri, &args->start)) {
 return -1;
 }
 
+if (args->start_hook) {
+args->postcopy_data = args->start_hook(from, to);
+}
+
 migrate_set_capability(from, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-blocktime", true);
@@ -1089,7 +1096,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 return 0;
 }
 
-static void migrate_postcopy_complete(QTestState *from, QTestState *to)
+static void migrate_postcopy_complete(QTestState *from, QTestState *to,
+  MigrateCommon *args)
 {
 wait_for_migration_complete(from);
 
@@ -1100,25 +1108,50 @@ static void migrate_postcopy_complete(QTestState *from, QTestState *to)
 read_blocktime(to);
 }
 
+if (args->finish_hook) {
+args->finish_hook(from, to, args->postcopy_data);
+args->postcopy_data = NULL;
+}
+
 test_migrate_end(from, to, true);
 }
 
-static void test_postcopy(void)
+static void test_postcopy_common(MigrateCommon *args)
 {
-MigrateStart args = {};
 QTestState *from, *to;
 
-if (migrate_postcopy_prepare(&from, &to, &args)) {
+if (migrate_postcopy_prepare(&from, &to, args)) {
 return;
 }
 migrate_postcopy_start(from, to);
-migrate_postcopy_complete(from, to);
+migrate_postcopy_complete(from, to, args);
 }
 
+static void test_postcopy(void)
+{
+MigrateCommon args = { };
+
+test_postcopy_common(&args);
+}
+
+#ifdef CONFIG_GNUTLS
+static void test_postcopy_tls_psk(void)
+{
+MigrateCommon args = {
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_common(&args);
+}
+#endif
+
 static void test_postcopy_recovery(void)
 {
-MigrateStart args = {
-.hide_stderr = true,
+MigrateCommon args = {
+.start = {
+.hide_stderr = true,
+},
 };
 QTestState *from, *to;
 g_autofree char *uri = NULL;
@@ -1174,7 +1207,7 @@ static void test_postcopy_recovery(void)
 /* Restore the postcopy bandwidth to unlimited */
 migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
 
-migrate_postcopy_complete(from, to);
+migrate_postcopy_complete(from, to, &args);
 }
 
 static void test_baddest(void)
@@ -2378,12 +2411,20 @@ int main(int argc, char **argv)
 
 qtest_add_func("/migration/postcopy/unix", test_postcopy);
 qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
+qtest_add_func("/migration/postcopy/plain", test_postcopy);
+
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
 qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
 #ifdef CONFIG_GNUTLS
 qtest_add_func("/migration/precopy/unix/tls/psk",
test_precopy_unix_tls_psk);
+/*
+ * NOTE: psk test is enough for postcopy, as other types of TLS
+ * channels are tested under precopy.  Here what we want to test is the
+ * general postcopy path that has TLS channel enabled.
+ */
+qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
 #ifdef CONFIG_TASN1
 qtest_add_func("/migration/precopy/unix/tls/x509/default-host",
test_precopy_unix_tls_x509_default_host);
-- 
2.36.1




[PULL 27/29] migration/multifd: Report to user when zerocopy not working

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

Some errors, like the lack of Scatter-Gather support by the network
interface (NETIF_F_SG), may cause sendmsg(...,MSG_ZEROCOPY) to fail to use
zero-copy, which causes it to fall back to the default copying mechanism.

After each full dirty-bitmap scan there should be a zero-copy flush
happening, which checks for errors in each of the previous calls to
sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy,
increment the dirty_sync_missed_zero_copy migration stat to let the user
know about it.

Signed-off-by: Leonardo Bras 
Reviewed-by: Daniel P. Berrangé 
Acked-by: Peter Xu 
Message-Id: <2022071122.18951-4-leob...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/multifd.c | 2 ++
 migration/ram.c | 5 +
 migration/ram.h | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index 1e49594b02..586ddc9d65 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
 if (ret < 0) {
 error_report_err(err);
 return -1;
+} else if (ret == 1) {
+dirty_sync_missed_zero_copy();
 }
 }
 }
diff --git a/migration/ram.c b/migration/ram.c
index 4fbad74c6c..b94669ba5d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -434,6 +434,11 @@ static void ram_transferred_add(uint64_t bytes)
 ram_counters.transferred += bytes;
 }
 
+void dirty_sync_missed_zero_copy(void)
+{
+ram_counters.dirty_sync_missed_zero_copy++;
+}
+
 /* used by the search for pages to send */
 struct PageSearchStatus {
 /* Current block being searched */
diff --git a/migration/ram.h b/migration/ram.h
index 5d90945a6e..c7af65ac74 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -89,4 +89,6 @@ void ram_write_tracking_prepare(void);
 int ram_write_tracking_start(void);
 void ram_write_tracking_stop(void);
 
+void dirty_sync_missed_zero_copy(void);
+
 #endif
-- 
2.36.1




[PULL 24/29] migration: remove unreachable code after reading data

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

The code calls qio_channel_read() in a loop when it reports
QIO_CHANNEL_ERR_BLOCK. This error is reported when errno==EAGAIN.

As such the later block of code will always hit the 'errno != EAGAIN'
condition, making the final 'else' unreachable.

Fixes: Coverity CID 1490203
Signed-off-by: Daniel P. Berrangé 
Message-Id: <20220627135318.156121-1-berra...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/qemu-file.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2f266b25cd..4f400c2e52 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -411,10 +411,8 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
 f->total_transferred += len;
 } else if (len == 0) {
 qemu_file_set_error_obj(f, -EIO, local_error);
-} else if (len != -EAGAIN) {
-qemu_file_set_error_obj(f, len, local_error);
 } else {
-error_free(local_error);
+qemu_file_set_error_obj(f, len, local_error);
 }
 
 return len;
-- 
2.36.1




[PULL 23/29] tests: Add postcopy preempt tests

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Four tests are added for preempt mode:

  - Postcopy plain
  - Postcopy recovery
  - Postcopy tls
  - Postcopy tls+recovery

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185530.27801-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Manual merge
---
 tests/qtest/migration-test.c | 57 ++--
 1 file changed, 55 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e9350ea8c6..02f2ef9f49 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -576,6 +576,7 @@ typedef struct {
 
 /* Postcopy specific fields */
 void *postcopy_data;
+bool postcopy_preempt;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1081,6 +1082,11 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 migrate_set_capability(to, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-blocktime", true);
 
+if (args->postcopy_preempt) {
+migrate_set_capability(from, "postcopy-preempt", true);
+migrate_set_capability(to, "postcopy-preempt", true);
+}
+
 migrate_ensure_non_converge(from);
 
 /* Wait for the first serial output from the source */
@@ -1146,6 +1152,26 @@ static void test_postcopy_tls_psk(void)
 }
 #endif
 
+static void test_postcopy_preempt(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+};
+
+test_postcopy_common(&args);
+}
+
+static void test_postcopy_preempt_tls_psk(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_common(&args);
+}
+
 static void test_postcopy_recovery_common(MigrateCommon *args)
 {
 QTestState *from, *to;
@@ -1225,6 +1251,27 @@ static void test_postcopy_recovery_tls_psk(void)
 test_postcopy_recovery_common(&args);
 }
 
+static void test_postcopy_preempt_recovery(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+};
+
+test_postcopy_recovery_common(&args);
+}
+
+/* This contains preempt+recovery+tls test altogether */
+static void test_postcopy_preempt_all(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_recovery_common(&args);
+}
+
 static void test_baddest(void)
 {
 MigrateStart args = {
@@ -2425,10 +2472,12 @@ int main(int argc, char **argv)
 module_call_init(MODULE_INIT_QOM);
 
 qtest_add_func("/migration/postcopy/unix", test_postcopy);
+qtest_add_func("/migration/postcopy/plain", test_postcopy);
 qtest_add_func("/migration/postcopy/recovery/plain",
test_postcopy_recovery);
-
-qtest_add_func("/migration/postcopy/plain", test_postcopy);
+qtest_add_func("/migration/postcopy/preempt/plain", test_postcopy_preempt);
+qtest_add_func("/migration/postcopy/preempt/recovery/plain",
+test_postcopy_preempt_recovery);
 
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
@@ -2444,6 +2493,10 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
 qtest_add_func("/migration/postcopy/recovery/tls/psk",
test_postcopy_recovery_tls_psk);
+qtest_add_func("/migration/postcopy/preempt/tls/psk",
+   test_postcopy_preempt_tls_psk);
+qtest_add_func("/migration/postcopy/preempt/recovery/tls/psk",
+   test_postcopy_preempt_all);
 #ifdef CONFIG_TASN1
 qtest_add_func("/migration/precopy/unix/tls/x509/default-host",
test_precopy_unix_tls_x509_default_host);
-- 
2.36.1




[PULL 05/29] accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Introduce the kvm_dirty_ring_size util function to help calculate
the dirty ring full time.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
Message-Id: 

Signed-off-by: Dr. David Alan Gilbert 
---
 accel/kvm/kvm-all.c| 5 +
 accel/stubs/kvm-stub.c | 5 +
 include/sysemu/kvm.h   | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ce989a68ff..184aecab5c 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2318,6 +2318,11 @@ static void query_stats_cb(StatsResultList **result, StatsTarget target,
strList *names, strList *targets, Error **errp);
 static void query_stats_schemas_cb(StatsSchemaList **result, Error **errp);
 
+uint32_t kvm_dirty_ring_size(void)
+{
+return kvm_state->kvm_dirty_ring_size;
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 3345882d85..2ac5f9c036 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -148,3 +148,8 @@ bool kvm_dirty_ring_enabled(void)
 {
 return false;
 }
+
+uint32_t kvm_dirty_ring_size(void)
+{
+return 0;
+}
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a783c78868..efd6dee818 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -582,4 +582,6 @@ bool kvm_cpu_check_are_resettable(void);
 bool kvm_arch_cpu_check_are_resettable(void);
 
 bool kvm_dirty_ring_enabled(void);
+
+uint32_t kvm_dirty_ring_size(void);
 #endif
-- 
2.36.1




[PULL 29/29] migration: Avoid false-positive on non-supported scenarios for zero-copy-send

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

Migration with zero-copy-send currently has its limitations, as it can't
be used with TLS or any kind of compression. In such scenarios, it should
output errors during parameter / capability setting.

But currently there are some ways of setting these unsupported scenarios
without printing the error message:

1) For the 'compression' capability, it works by enabling it together with
zero-copy-send. This happens because the validity test for zero-copy uses
the helper function migrate_use_compression(), which checks for compression
presence in s->enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS].

The point here is: the validity test happens before the capability gets
enabled. If all of them get enabled together, this test will not return
error.

In order to fix that, replace migrate_use_compression() with a direct test
of the cap_list parameter in migrate_caps_check().

2) For features enabled by parameters such as TLS & 'multifd_compression',
there was also a possibility of setting non-supported scenarios: setting
zero-copy-send first, then setting the unsupported parameter.

In order to fix that, also add a check for parameters conflicting with
zero-copy-send on migrate_params_check().

3) XBZRLE is also a compression capability, so it makes sense to also add
it to the list of capabilities which are not supported with zero-copy-send.

Fixes: 1abaec9a1b2c ("migration: Change zero_copy_send from migration parameter 
to migration capability")
Signed-off-by: Leonardo Bras 
Message-Id: <20220719122345.253713-1-leob...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 15ae48b209..e03f698a3c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1306,7 +1306,9 @@ static bool migrate_caps_check(bool *cap_list,
 #ifdef CONFIG_LINUX
 if (cap_list[MIGRATION_CAPABILITY_ZERO_COPY_SEND] &&
 (!cap_list[MIGRATION_CAPABILITY_MULTIFD] ||
- migrate_use_compression() ||
+ cap_list[MIGRATION_CAPABILITY_COMPRESS] ||
+ cap_list[MIGRATION_CAPABILITY_XBZRLE] ||
+ migrate_multifd_compression() ||
  migrate_use_tls())) {
 error_setg(errp,
   "Zero copy only available for non-compressed non-TLS multifd migration");
@@ -1550,6 +1552,17 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
 error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
 return false;
 }
+
+#ifdef CONFIG_LINUX
+if (migrate_use_zero_copy_send() &&
+((params->has_multifd_compression && params->multifd_compression) ||
+ (params->has_tls_creds && params->tls_creds && *params->tls_creds))) {
+error_setg(errp,
+   "Zero copy only available for non-compressed non-TLS multifd migration");
+return false;
+}
+#endif
+
 return true;
 }
 
-- 
2.36.1




[PULL 07/29] softmmu/dirtylimit: Implement dirty page rate limit

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Implement dirty rate calculation periodically based on the
dirty ring, and throttle the virtual CPU until it reaches the quota
dirty page rate given by the user.

Introduce qmp commands "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit", "query-vcpu-dirty-limit"
to enable, disable, query dirty page limit for virtual CPU.

Meanwhile, introduce corresponding hmp commands
"set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit",
"info vcpu_dirty_limit" so the feature can be more usable.

"query-vcpu-dirty-limit" success depends on enabling the dirty
page rate limit, so just add it to the list of skipped
commands to ensure qmp-cmd-test runs successfully.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Markus Armbruster 
Reviewed-by: Peter Xu 
Message-Id: 
<4143f26706d413dd29db0b672fe58b3d3fbe34bc.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 hmp-commands-info.hx   |  13 +++
 hmp-commands.hx|  32 ++
 include/monitor/hmp.h  |   3 +
 qapi/migration.json|  80 +++
 softmmu/dirtylimit.c   | 194 +
 tests/qtest/qmp-cmd-test.c |   2 +
 6 files changed, 324 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 3ffa24bd67..188d9ece3b 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -865,6 +865,19 @@ SRST
 Display the vcpu dirty rate information.
 ERST
 
+{
+.name   = "vcpu_dirty_limit",
+.args_type  = "",
+.params = "",
+.help   = "show dirty page limit information of all vCPU",
+.cmd= hmp_info_vcpu_dirty_limit,
+},
+
+SRST
+  ``info vcpu_dirty_limit``
+Display the vcpu dirty page limit information.
+ERST
+
 #if defined(TARGET_I386)
 {
 .name   = "sgx",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index c9d465735a..182e639d14 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1768,3 +1768,35 @@ ERST
  "\n\t\t\t -b to specify dirty bitmap as method of calculation)",
 .cmd= hmp_calc_dirty_rate,
 },
+
+SRST
+``set_vcpu_dirty_limit``
+  Set dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "set_vcpu_dirty_limit",
+.args_type  = "dirty_rate:l,cpu_index:l?",
+.params = "dirty_rate [cpu_index]",
+.help   = "set dirty page rate limit, use cpu_index to set limit"
+  "\n\t\t\t\t\t on a specified virtual cpu",
+.cmd= hmp_set_vcpu_dirty_limit,
+},
+
+SRST
+``cancel_vcpu_dirty_limit``
+  Cancel dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "cancel_vcpu_dirty_limit",
+.args_type  = "cpu_index:l?",
+.params = "[cpu_index]",
+.help   = "cancel dirty page rate limit, use cpu_index to cancel"
+  "\n\t\t\t\t\t limit on a specified virtual cpu",
+.cmd= hmp_cancel_vcpu_dirty_limit,
+},
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 2e89a97bd6..a618eb1e4e 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
 void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
 void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_human_readable_text_helper(Monitor *mon,
                                    HumanReadableText *(*qmp_handler)(Error **));
 void hmp_info_stats(Monitor *mon, const QDict *qdict);
diff --git a/qapi/migration.json b/qapi/migration.json
index 7102e474a6..e552ee4f43 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1868,6 +1868,86 @@
 ##
 { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
 
+##
+# @DirtyLimitInfo:
+#
+# Dirty page rate limit information of a virtual CPU.
+#
+# @cpu-index: index of a virtual CPU.
+#
+# @limit-rate: upper limit of dirty page rate (MB/s) for a virtual
+#  CPU, 0 means unlimited.
+#
+# @current-rate: current dirty page rate (MB/s) for a virtual CPU.
+#
+# Since: 7.1
+#
+##
+{ 'struct': 'DirtyLimitInfo',
+  'data': { 'cpu-index': 'int',
+'limit-rate': 'uint64',
+'current-rate': 'uint64' } }
+
+##
+# @set-vcpu-dirty-limit:
+#
+# Set the upper limit of dirty page rate for virtual CPUs.
+#
+# Requires KVM with accelerator property "dirty-ring-size" set.
+# A virtual CPU's dirty page rate is a measure of its memory load.
+# To observe 

[PULL 26/29] Add dirty-sync-missed-zero-copy migration stat

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

Signed-off-by: Leonardo Bras 
Acked-by: Markus Armbruster 
Acked-by: Peter Xu 
Reviewed-by: Daniel P. Berrangé 
Message-Id: <2022071122.18951-3-leob...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 2 ++
 monitor/hmp-cmds.c| 5 +
 qapi/migration.json   | 7 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 7c7e529ca7..15ae48b209 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1057,6 +1057,8 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 info->ram->normal_bytes = ram_counters.normal * page_size;
 info->ram->mbps = s->mbps;
 info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
+info->ram->dirty_sync_missed_zero_copy =
+ram_counters.dirty_sync_missed_zero_copy;
 info->ram->postcopy_requests = ram_counters.postcopy_requests;
 info->ram->page_size = page_size;
 info->ram->multifd_bytes = ram_counters.multifd_bytes;
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ca98df0495..a6dc79e0d5 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -307,6 +307,11 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "postcopy ram: %" PRIu64 " kbytes\n",
info->ram->postcopy_bytes >> 10);
 }
+if (info->ram->dirty_sync_missed_zero_copy) {
+monitor_printf(mon,
   "Zero-copy-send fallbacks happened: %" PRIu64 " times\n",
+   info->ram->dirty_sync_missed_zero_copy);
+}
 }
 
 if (info->has_disk) {
diff --git a/qapi/migration.json b/qapi/migration.json
index 7586df3dea..81185d4311 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -55,6 +55,10 @@
 # @postcopy-bytes: The number of bytes sent during the post-copy phase
 #  (since 7.0).
 #
+# @dirty-sync-missed-zero-copy: Number of times dirty RAM synchronization could
+#   not avoid copying dirty pages. This is between
+#   0 and @dirty-sync-count * @multifd-channels.
+#   (since 7.1)
 # Since: 0.14
 ##
 { 'struct': 'MigrationStats',
@@ -65,7 +69,8 @@
'postcopy-requests' : 'int', 'page-size' : 'int',
'multifd-bytes' : 'uint64', 'pages-per-second' : 'uint64',
'precopy-bytes' : 'uint64', 'downtime-bytes' : 'uint64',
-   'postcopy-bytes' : 'uint64' } }
+   'postcopy-bytes' : 'uint64',
+   'dirty-sync-missed-zero-copy' : 'uint64' } }
 
 ##
 # @XBZRLECacheStats:
-- 
2.36.1




[PULL 06/29] softmmu/dirtylimit: Implement virtual CPU throttle

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Set up a negative feedback system for when the vCPU thread
handles a KVM_EXIT_DIRTY_RING_FULL exit, by introducing a
throttle_us_per_full field in struct CPUState. Sleep
throttle_us_per_full microseconds to throttle the vCPU
if dirtylimit is in service.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 
<977e808e03a1cef5151cae75984658b6821be618.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 accel/kvm/kvm-all.c |  20 ++-
 include/hw/core/cpu.h   |   6 +
 include/sysemu/dirtylimit.h |  15 ++
 softmmu/dirtylimit.c| 291 
 softmmu/trace-events|   7 +
 5 files changed, 338 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 184aecab5c..3187656570 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -45,6 +45,7 @@
 #include "qemu/guest-random.h"
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
+#include "sysemu/dirtylimit.h"
 
 #include "hw/boards.h"
 #include "monitor/stats.h"
@@ -477,6 +478,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 cpu->kvm_state = s;
 cpu->vcpu_dirty = true;
 cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
 
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
@@ -1470,6 +1472,11 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
  */
 sleep(1);
 
+/* keep sleeping so that dirtylimit not be interfered by reaper */
+if (dirtylimit_in_service()) {
+continue;
+}
+
 trace_kvm_dirty_ring_reaper("wakeup");
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
@@ -2975,8 +2982,19 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state, NULL);
+/*
+ * We throttle vCPU by making it sleep once it exit from kernel
+ * due to dirty ring full. In the dirtylimit scenario, reaping
+ * all vCPUs after a single vCPU dirty ring get full result in
+ * the miss of sleep, so just reap the ring-fulled vCPU.
+ */
+if (dirtylimit_in_service()) {
+kvm_dirty_ring_reap(kvm_state, cpu);
+} else {
+kvm_dirty_ring_reap(kvm_state, NULL);
+}
 qemu_mutex_unlock_iothread();
+dirtylimit_vcpu_execute(cpu);
 ret = 0;
 break;
 case KVM_EXIT_SYSTEM_EVENT:
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 996f94059f..500503da13 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -418,6 +418,12 @@ struct CPUState {
  */
 bool throttle_thread_scheduled;
 
+/*
+ * Sleep throttle_us_per_full microseconds once dirty ring is full
+ * if dirty page rate limit is enabled.
+ */
+int64_t throttle_us_per_full;
+
 bool ignore_memory_transaction_failures;
 
 /* Used for user-only emulation of prctl(PR_SET_UNALIGN). */
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
index da459f03d6..8d2c1f3a6b 100644
--- a/include/sysemu/dirtylimit.h
+++ b/include/sysemu/dirtylimit.h
@@ -19,4 +19,19 @@ void vcpu_dirty_rate_stat_start(void);
 void vcpu_dirty_rate_stat_stop(void);
 void vcpu_dirty_rate_stat_initialize(void);
 void vcpu_dirty_rate_stat_finalize(void);
+
+void dirtylimit_state_lock(void);
+void dirtylimit_state_unlock(void);
+void dirtylimit_state_initialize(void);
+void dirtylimit_state_finalize(void);
+bool dirtylimit_in_service(void);
+bool dirtylimit_vcpu_index_valid(int cpu_index);
+void dirtylimit_process(void);
+void dirtylimit_change(bool start);
+void dirtylimit_set_vcpu(int cpu_index,
+ uint64_t quota,
+ bool enable);
+void dirtylimit_set_all(uint64_t quota,
+bool enable);
+void dirtylimit_vcpu_execute(CPUState *cpu);
 #endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index ebdc064c9d..e5a4f970bd 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -18,6 +18,26 @@
 #include "sysemu/dirtylimit.h"
 #include "exec/memory.h"
 #include "hw/boards.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+/*
+ * Dirtylimit stop working if dirty page rate error
+ * value less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+/*
+ * Plus or minus vcpu sleep time linearly if dirty
+ * page rate error value percentage over
+ * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
+ * Otherwise, plus or minus a fixed vcpu sleep time.
+ */
+#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
+/*
+ * Max vcpu sleep time percentage during a cycle
+ * composed of dirty ring full and sleep time.
+ */
+#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
 
 struct {
 VcpuStat stat;
@@ -25,6 +45,30 @@ struct {
 QemuThread thread;
 } 

[PULL 22/29] tests: Add postcopy tls recovery migration test

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

It's easy to build this upon the postcopy tls test.  Rename the old
postcopy recovery test to postcopy/recovery/plain.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185527.27747-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Manual merge
---
 tests/qtest/migration-test.c | 37 +++-
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b2020ef6c5..e9350ea8c6 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1146,17 +1146,15 @@ static void test_postcopy_tls_psk(void)
 }
 #endif
 
-static void test_postcopy_recovery(void)
+static void test_postcopy_recovery_common(MigrateCommon *args)
 {
-MigrateCommon args = {
-.start = {
-.hide_stderr = true,
-},
-};
 QTestState *from, *to;
 g_autofree char *uri = NULL;
 
-if (migrate_postcopy_prepare(&from, &to, &args)) {
+/* Always hide errors for postcopy recover tests since they're expected */
+args->start.hide_stderr = true;
+
+if (migrate_postcopy_prepare(&from, &to, args)) {
 return;
 }
 
@@ -1207,7 +1205,24 @@ static void test_postcopy_recovery(void)
 /* Restore the postcopy bandwidth to unlimited */
 migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
 
-migrate_postcopy_complete(from, to, &args);
+migrate_postcopy_complete(from, to, args);
+}
+
+static void test_postcopy_recovery(void)
+{
+MigrateCommon args = { };
+
+test_postcopy_recovery_common(&args);
+}
+
+static void test_postcopy_recovery_tls_psk(void)
+{
+MigrateCommon args = {
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_recovery_common(&args);
 }
 
 static void test_baddest(void)
@@ -2410,7 +2425,9 @@ int main(int argc, char **argv)
 module_call_init(MODULE_INIT_QOM);
 
 qtest_add_func("/migration/postcopy/unix", test_postcopy);
-qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
+qtest_add_func("/migration/postcopy/recovery/plain",
+   test_postcopy_recovery);
+
 qtest_add_func("/migration/postcopy/plain", test_postcopy);
 
 qtest_add_func("/migration/bad_dest", test_baddest);
@@ -2425,6 +2442,8 @@ int main(int argc, char **argv)
  * general postcopy path that has TLS channel enabled.
  */
 qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
+qtest_add_func("/migration/postcopy/recovery/tls/psk",
+   test_postcopy_recovery_tls_psk);
 #ifdef CONFIG_TASN1
 qtest_add_func("/migration/precopy/unix/tls/x509/default-host",
test_precopy_unix_tls_x509_default_host);
-- 
2.36.1




[PULL 08/29] tests: Add dirty page rate limit test

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Add a dirty page rate limit test if the kernel supports dirty ring.
The following qmp commands are covered by this test case:
"calc-dirty-rate", "query-dirty-rate", "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit".

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
Message-Id: 

Signed-off-by: Dr. David Alan Gilbert 
---
 tests/qtest/migration-helpers.c |  22 +++
 tests/qtest/migration-helpers.h |   2 +
 tests/qtest/migration-test.c| 256 
 3 files changed, 280 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index e81e831c85..c6fbeb3974 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -83,6 +83,28 @@ QDict *wait_command(QTestState *who, const char *command, ...)
 return ret;
 }
 
+/*
+ * Execute the qmp command only
+ */
+QDict *qmp_command(QTestState *who, const char *command, ...)
+{
+va_list ap;
+QDict *resp, *ret;
+
+va_start(ap, command);
+resp = qtest_vqmp(who, command, ap);
+va_end(ap);
+
+g_assert(!qdict_haskey(resp, "error"));
+g_assert(qdict_haskey(resp, "return"));
+
+ret = qdict_get_qdict(resp, "return");
+qobject_ref(ret);
+qobject_unref(resp);
+
+return ret;
+}
+
 /*
  * Send QMP command "migrate".
  * Arguments are built from @fmt... (formatted like
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 78587c2b82..59561898d0 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -23,6 +23,8 @@ QDict *wait_command_fd(QTestState *who, int fd, const char *command, ...);
 G_GNUC_PRINTF(2, 3)
 QDict *wait_command(QTestState *who, const char *command, ...);
 
+QDict *qmp_command(QTestState *who, const char *command, ...);
+
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
 
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 9e64125f02..db4dcc5b31 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -24,6 +24,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qobject-output-visitor.h"
 #include "crypto/tlscredspsk.h"
+#include "qapi/qmp/qlist.h"
 
 #include "migration-helpers.h"
 #include "tests/migration/migration-test.h"
@@ -46,6 +47,12 @@ unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
 
+/*
+ * Dirtylimit stop working if dirty page rate error
+ * value less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+
 #if defined(__linux__)
 #include 
 #include 
@@ -2059,6 +2066,253 @@ static void test_multifd_tcp_cancel(void)
 test_migrate_end(from, to2, true);
 }
 
+static void calc_dirty_rate(QTestState *who, uint64_t calc_time)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'calc-dirty-rate',"
+  "'arguments': { "
+  "'calc-time': %ld,"
+  "'mode': 'dirty-ring' }}",
+  calc_time));
+}
+
+static QDict *query_dirty_rate(QTestState *who)
+{
+return qmp_command(who, "{ 'execute': 'query-dirty-rate' }");
+}
+
+static void dirtylimit_set_all(QTestState *who, uint64_t dirtyrate)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'set-vcpu-dirty-limit',"
+  "'arguments': { "
+  "'dirty-rate': %ld } }",
+  dirtyrate));
+}
+
+static void cancel_vcpu_dirty_limit(QTestState *who)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'cancel-vcpu-dirty-limit' }"));
+}
+
+static QDict *query_vcpu_dirty_limit(QTestState *who)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who, "{ 'execute': 'query-vcpu-dirty-limit' }");
+g_assert(!qdict_haskey(rsp, "error"));
+g_assert(qdict_haskey(rsp, "return"));
+
+return rsp;
+}
+
+static bool calc_dirtyrate_ready(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = g_strdup(qdict_get_str(rsp_return, "status"));
+g_assert(status);
+
+return g_strcmp0(status, "measuring");
+}
+
+static void wait_for_calc_dirtyrate_complete(QTestState *who,
+ int64_t time_s)
+{
+int max_try_count = 10000;
+usleep(time_s * 1000000);
+
+while (!calc_dirtyrate_ready(who) && max_try_count--) {
+usleep(1000);
+}
+
+/*
+ * Set the timeout with 10 s(max_try_count * 1000us),
+ * if dirtyrate measurement not complete, fail test.
+ */
+g_assert_cmpint(max_try_count, !=, 0);
+}
+
+static int64_t get_dirty_rate(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+QList *rates;
+const QListEntry *entry;
+QDict *rate;
+int64_t dirtyrate;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = 

[PULL 20/29] tests: Move MigrateCommon upper

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

So that it can soon be used in the postcopy tests too.

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
Message-Id: <20220707185522.27638-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 tests/qtest/migration-test.c | 144 +--
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index db4dcc5b31..f3931e0a92 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -503,6 +503,78 @@ typedef struct {
 const char *opts_target;
 } MigrateStart;
 
+/*
+ * A hook that runs after the src and dst QEMUs have been
+ * created, but before the migration is started. This can
+ * be used to set migration parameters and capabilities.
+ *
+ * Returns: NULL, or a pointer to opaque state to be
+ *  later passed to the TestMigrateFinishHook
+ */
+typedef void * (*TestMigrateStartHook)(QTestState *from,
+   QTestState *to);
+
+/*
+ * A hook that runs after the migration has finished,
+ * regardless of whether it succeeded or failed, but
+ * before QEMU has terminated (unless it self-terminated
+ * due to migration error)
+ *
+ * @opaque is a pointer to state previously returned
+ * by the TestMigrateStartHook if any, or NULL.
+ */
+typedef void (*TestMigrateFinishHook)(QTestState *from,
+  QTestState *to,
+  void *opaque);
+
+typedef struct {
+/* Optional: fine tune start parameters */
+MigrateStart start;
+
+/* Required: the URI for the dst QEMU to listen on */
+const char *listen_uri;
+
+/*
+ * Optional: the URI for the src QEMU to connect to
+ * If NULL, then it will query the dst QEMU for its actual
+ * listening address and use that as the connect address.
+ * This allows for dynamically picking a free TCP port.
+ */
+const char *connect_uri;
+
+/* Optional: callback to run at start to set migration parameters */
+TestMigrateStartHook start_hook;
+/* Optional: callback to run at finish to cleanup */
+TestMigrateFinishHook finish_hook;
+
+/*
+ * Optional: normally we expect the migration process to complete.
+ *
+ * There can be a variety of reasons and stages in which failure
+ * can happen during tests.
+ *
+ * If a failure is expected to happen at time of establishing
+ * the connection, then MIG_TEST_FAIL will indicate that the dst
+ * QEMU is expected to stay running and accept future migration
+ * connections.
+ *
+ * If a failure is expected to happen while processing the
+ * migration stream, then MIG_TEST_FAIL_DEST_QUIT_ERR will indicate
+ * that the dst QEMU is expected to quit with non-zero exit status
+ */
+enum {
+/* This test should succeed, the default */
+MIG_TEST_SUCCEED = 0,
+/* This test should fail, dest qemu should keep alive */
+MIG_TEST_FAIL,
+/* This test should fail, dest qemu should fail with abnormal status */
+MIG_TEST_FAIL_DEST_QUIT_ERR,
+} result;
+
+/* Optional: set number of migration passes to wait for */
+unsigned int iterations;
+} MigrateCommon;
+
 static int test_migrate_start(QTestState **from, QTestState **to,
   const char *uri, MigrateStart *args)
 {
@@ -1120,78 +1192,6 @@ static void test_baddest(void)
 test_migrate_end(from, to, false);
 }
 
-/*
- * A hook that runs after the src and dst QEMUs have been
- * created, but before the migration is started. This can
- * be used to set migration parameters and capabilities.
- *
- * Returns: NULL, or a pointer to opaque state to be
- *  later passed to the TestMigrateFinishHook
- */
-typedef void * (*TestMigrateStartHook)(QTestState *from,
-   QTestState *to);
-
-/*
- * A hook that runs after the migration has finished,
- * regardless of whether it succeeded or failed, but
- * before QEMU has terminated (unless it self-terminated
- * due to migration error)
- *
- * @opaque is a pointer to state previously returned
- * by the TestMigrateStartHook if any, or NULL.
- */
-typedef void (*TestMigrateFinishHook)(QTestState *from,
-  QTestState *to,
-  void *opaque);
-
-typedef struct {
-/* Optional: fine tune start parameters */
-MigrateStart start;
-
-/* Required: the URI for the dst QEMU to listen on */
-const char *listen_uri;
-
-/*
- * Optional: the URI for the src QEMU to connect to
- * If NULL, then it will query the dst QEMU for its actual
- * listening address and use that as the connect address.
- * This allows for dynamically picking a free TCP port.
- */
-const char *connect_uri;
-
-/* Optional: callback to run at start to set migration parameters */
-TestMigrateStartHook 

[PULL 03/29] migration/dirtyrate: Refactor dirty page rate calculation

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Abstract the dirty log change logic out into global_dirty_log_change().

Abstract the dirty-ring based dirty page rate calculation out into
vcpu_calculate_dirtyrate().

Abstract the mathematical dirty page rate calculation out into
do_calculate_dirtyrate(), decoupling it from DirtyStat.

Rename set_sample_page_period() to dirty_stat_wait(), which is easier
to understand and will be reused in dirtylimit.

Handle the cpu hotplug/unplug scenario during measurement of the dirty
page rate.

Export the utility functions outside of migration.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 
<7b6f6f4748d5b3d017b31a0429e630229ae97538.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
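
The arithmetic factored out into do_calculate_dirtyrate() can be sketched in
isolation. The block below is an illustrative stand-alone model, not QEMU
code: `PAGE_SIZE` and the function name are invented for the example (QEMU
uses TARGET_PAGE_SIZE and the DirtyPageRecord struct).

```c
#include <stdint.h>
#include <assert.h>

#define PAGE_SIZE 4096  /* assumed target page size, for illustration only */

/*
 * Pages dirtied during the measurement window -> MiB -> MiB per second,
 * mirroring the math in do_calculate_dirtyrate().
 */
static int64_t calc_dirtyrate_mb_s(uint64_t start_pages, uint64_t end_pages,
                                   int64_t calc_time_ms)
{
    uint64_t increased_dirty_pages = end_pages - start_pages;
    uint64_t memory_size_mb = (increased_dirty_pages * PAGE_SIZE) >> 20;

    return memory_size_mb * 1000 / calc_time_ms;
}
```

For example, 262144 pages (1 GiB at 4 KiB pages) dirtied over a 1000 ms
window yields a rate of 1024 MiB/s; halving the window doubles the rate.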
---
 include/sysemu/dirtyrate.h |  28 +
 migration/dirtyrate.c  | 227 +++--
 migration/dirtyrate.h  |   7 +-
 3 files changed, 174 insertions(+), 88 deletions(-)
 create mode 100644 include/sysemu/dirtyrate.h

diff --git a/include/sysemu/dirtyrate.h b/include/sysemu/dirtyrate.h
new file mode 100644
index 00..4d3b9a4902
--- /dev/null
+++ b/include/sysemu/dirtyrate.h
@@ -0,0 +1,28 @@
+/*
+ * dirty page rate helper functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_DIRTYRATE_H
+#define QEMU_DIRTYRATE_H
+
+typedef struct VcpuStat {
+int nvcpu; /* number of vcpu */
+DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
+} VcpuStat;
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot);
+
+void global_dirty_log_change(unsigned int flag,
+ bool start);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index aace12a787..795fab5c37 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -46,7 +46,7 @@ static struct DirtyRateStat DirtyStat;
 static DirtyRateMeasureMode dirtyrate_mode =
 DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
-static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
+static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
 {
 int64_t current_time;
 
@@ -60,6 +60,132 @@ static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
 return msec;
 }
 
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+ CPUState *cpu, bool start)
+{
+if (start) {
+dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+} else {
+dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+}
+}
+
+static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
+  int64_t calc_time_ms)
+{
+uint64_t memory_size_MB;
+uint64_t increased_dirty_pages =
+dirty_pages.end_pages - dirty_pages.start_pages;
+
+memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+
+return memory_size_MB * 1000 / calc_time_ms;
+}
+
+void global_dirty_log_change(unsigned int flag, bool start)
+{
+qemu_mutex_lock_iothread();
+if (start) {
+memory_global_dirty_log_start(flag);
+} else {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+/*
+ * global_dirty_log_sync
+ * 1. sync dirty log from kvm
+ * 2. stop dirty tracking if needed.
+ */
+static void global_dirty_log_sync(unsigned int flag, bool one_shot)
+{
+qemu_mutex_lock_iothread();
+memory_global_dirty_log_sync();
+if (one_shot) {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+static DirtyPageRecord *vcpu_dirty_stat_alloc(VcpuStat *stat)
+{
+CPUState *cpu;
+DirtyPageRecord *records;
+int nvcpu = 0;
+
+CPU_FOREACH(cpu) {
+nvcpu++;
+}
+
+stat->nvcpu = nvcpu;
+stat->rates = g_malloc0(sizeof(DirtyRateVcpu) * nvcpu);
+
+records = g_malloc0(sizeof(DirtyPageRecord) * nvcpu);
+
+return records;
+}
+
+static void vcpu_dirty_stat_collect(VcpuStat *stat,
+DirtyPageRecord *records,
+bool start)
+{
+CPUState *cpu;
+
+CPU_FOREACH(cpu) {
+record_dirtypages(records, cpu, start);
+}
+}
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot)
+{
+DirtyPageRecord *records;
+int64_t init_time_ms;
+int64_t duration;
+int64_t dirtyrate;
+int i = 0;
+unsigned int gen_id;
+
+retry:
+init_time_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+cpu_list_lock();
+gen_id = 

[PULL 25/29] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
returns 1. This return code should be used only when Linux fails to use
MSG_ZEROCOPY on a number of sendmsg() calls.

Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
was attempted.

Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
Signed-off-by: Leonardo Bras 
Reviewed-by: Daniel P. Berrangé 
Acked-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Reviewed-by: Peter Xu 
Message-Id: <2022071122.18951-2-leob...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 io/channel-socket.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 4466bb1cd4..74a936cc1f 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -716,12 +716,18 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
 struct cmsghdr *cm;
 char control[CMSG_SPACE(sizeof(*serr))];
 int received;
-int ret = 1;
+int ret;
+
+if (sioc->zero_copy_queued == sioc->zero_copy_sent) {
+return 0;
+}
 
 msg.msg_control = control;
 msg.msg_controllen = sizeof(control);
 memset(control, 0, sizeof(control));
 
+ret = 1;
+
 while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
 received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
 if (received < 0) {
-- 
2.36.1
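
The control-flow change above can be modeled without sockets. The sketch
below is a hypothetical stand-in (no real QIOChannelSocket, sendmsg() or
recvmsg()); it only demonstrates the return-code contract after the fix:
0 when nothing was queued or zero copy genuinely worked, 1 only when the
kernel fell back to copying.

```c
#include <assert.h>

/* Hypothetical stand-in for QIOChannelSocket's zero-copy counters. */
struct zc_state {
    int zero_copy_queued; /* sendmsg() calls issued with MSG_ZEROCOPY */
    int zero_copy_sent;   /* completions drained from the error queue */
};

/*
 * Mirror of the fixed control flow: return 0 immediately when there is
 * nothing to flush; otherwise start from ret = 1 and clear it once a
 * completion confirms the kernel really used zero copy.
 */
static int zc_flush(struct zc_state *s, int kernel_copied)
{
    int ret;

    if (s->zero_copy_queued == s->zero_copy_sent) {
        return 0;               /* nothing queued: never report "copied" */
    }

    ret = 1;                    /* pessimistic: assume the kernel copied */

    while (s->zero_copy_sent < s->zero_copy_queued) {
        s->zero_copy_sent++;    /* one completion drained */
        if (!kernel_copied) {
            ret = 0;            /* true zero copy confirmed */
        }
    }
    return ret;
}
```

With the old code, the first case (nothing queued) would have fallen through
and returned 1, wrongly signalling a zero-copy fallback.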




[PULL 14/29] migration: Create the postcopy preempt channel asynchronously

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

This patch allows the postcopy preempt channel to be created
asynchronously.  The benefit is that when the connection is slow, we won't
take the BQL (and potentially block all things like QMP) for a long time
without releasing it.

A function postcopy_preempt_wait_channel() is introduced, allowing the
migration thread to be able to wait on the channel creation.  The channel
is always created by the main thread, in which we'll kick a new semaphore
to tell the migration thread that the channel has created.

We'll need to wait for the new channel in two places: (1) when there's a
new postcopy migration that is starting, or (2) when there's a postcopy
migration to resume.

For the start of migration, we don't need to wait for this channel until
when we want to start postcopy, aka, postcopy_start().  We'll fail the
migration if we found that the channel creation failed (which should
probably not happen at all in 99% of the cases, because the main channel is
using the same network topology).

For a postcopy recovery, we'll need to wait in postcopy_pause().  In that
case if the channel creation failed, we can't fail the migration or we'll
crash the VM, instead we keep in PAUSED state, waiting for yet another
recovery.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
Message-Id: <20220707185509.27311-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
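
The post()/wait() pairing described above can be sketched with plain
pthreads. This is an illustrative model, not QEMU code: `qsem` stands in
for QemuSemaphore, and the struct and function names are invented for the
example.

```c
#include <pthread.h>
#include <assert.h>

/* Minimal counting semaphore (mutex + condvar), standing in for QemuSemaphore. */
struct qsem {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int count;
};

static void qsem_post(struct qsem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

static void qsem_wait(struct qsem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0) {
        pthread_cond_wait(&s->cond, &s->lock);
    }
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Hypothetical model of the preempt-channel handshake. */
struct preempt_state {
    struct qsem channel_sem;   /* models postcopy_qemufile_src_sem */
    int channel_ok;            /* models postcopy_qemufile_src != NULL */
};

/* "Main thread": create the channel, then kick the semaphore. */
static void *channel_create_thread(void *opaque)
{
    struct preempt_state *s = opaque;
    s->channel_ok = 1;             /* pretend the async connect succeeded */
    qsem_post(&s->channel_sem);    /* post() and wait() are kept in pair */
    return NULL;
}

/* "Migration thread": block until the channel exists, then check it,
 * as postcopy_preempt_wait_channel() does. */
static int preempt_wait_channel(struct preempt_state *s)
{
    qsem_wait(&s->channel_sem);
    return s->channel_ok ? 0 : -1;
}
```

The key design point mirrored here is that the waiter learns only that
channel creation *finished*; whether it succeeded is a separate flag, so a
failed create can still wake the migration thread and let it decide between
failing the migration (start) or staying PAUSED (recovery).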
---
 migration/migration.c| 16 
 migration/migration.h|  7 +
 migration/postcopy-ram.c | 56 +++-
 migration/postcopy-ram.h |  1 +
 4 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3119bd2e4b..427d4de185 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3053,6 +3053,12 @@ static int postcopy_start(MigrationState *ms)
 int64_t bandwidth = migrate_max_postcopy_bandwidth();
 bool restart_block = false;
 int cur_state = MIGRATION_STATUS_ACTIVE;
+
+if (postcopy_preempt_wait_channel(ms)) {
+migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+return -1;
+}
+
 if (!migrate_pause_before_switchover()) {
 migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -3534,6 +3540,14 @@ static MigThrError postcopy_pause(MigrationState *s)
 if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
 /* Woken up by a recover procedure. Give it a shot */
 
+if (postcopy_preempt_wait_channel(s)) {
+/*
+ * Preempt enabled, and new channel create failed; loop
+ * back to wait for another recovery.
+ */
+continue;
+}
+
 /*
  * Firstly, let's wake up the return path now, with a new
  * return path channel.
@@ -4398,6 +4412,7 @@ static void migration_instance_finalize(Object *obj)
 qemu_sem_destroy(&ms->postcopy_pause_sem);
 qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
 qemu_sem_destroy(&ms->rp_state.rp_sem);
+qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
 error_free(ms->error);
 }
 
@@ -,6 +4459,7 @@ static void migration_instance_init(Object *obj)
 qemu_sem_init(&ms->rp_state.rp_sem, 0);
 qemu_sem_init(&ms->rate_limit_sem, 0);
 qemu_sem_init(&ms->wait_unplug_sem, 0);
+qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
 qemu_mutex_init(&ms->qemu_file_lock);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index 9220cec6bd..ae4ffd3454 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -219,6 +219,13 @@ struct MigrationState {
 QEMUFile *to_dst_file;
 /* Postcopy specific transfer channel */
 QEMUFile *postcopy_qemufile_src;
+/*
+ * It is posted when the preempt channel is established.  Note: this is
+ * used for both the start or recover of a postcopy migration.  We'll
+ * post to this sem every time a new preempt channel is created in the
+ * main thread, and we keep post() and wait() in pair.
+ */
+QemuSemaphore postcopy_qemufile_src_sem;
 QIOChannelBuffer *bioc;
 /*
  * Protects to_dst_file/from_dst_file pointers.  We need to make sure we
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 84f7b1526e..70b21e9d51 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1552,10 +1552,50 @@ bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
 return true;
 }
 
-int postcopy_preempt_setup(MigrationState *s, Error **errp)
+static void
+postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
 {
-QIOChannel *ioc;
+MigrationState *s = opaque;
+QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
+Error *local_err = NULL;
+
+if (qio_task_propagate_error(task, &local_err)) {
+/* Something wrong happened.. */
+

[PULL 04/29] softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Introduce GLOBAL_DIRTY_LIMIT, a third reason for dirty tracking, used to
calculate the dirty page rate periodically for dirty page rate limiting.

Add dirtylimit.c to implement the periodic dirty page rate calculation,
which will be used for dirty page rate limiting.

Add dirtylimit.h to export utility functions for the dirty page rate
limit implementation.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 
<5d0d641bffcb9b1c4cc3e323b6dfecb36050d948.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 include/exec/memory.h   |   5 +-
 include/sysemu/dirtylimit.h |  22 +++
 softmmu/dirtylimit.c| 116 
 softmmu/meson.build |   1 +
 4 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 include/sysemu/dirtylimit.h
 create mode 100644 softmmu/dirtylimit.c

diff --git a/include/exec/memory.h b/include/exec/memory.h
index a6a0f4d8ad..bfb1de8eea 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 /* Dirty tracking enabled because measuring dirty rate */
 #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
 
-#define GLOBAL_DIRTY_MASK  (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT  (1U << 2)
+
+#define GLOBAL_DIRTY_MASK  (0x7)
 
 extern unsigned int global_dirty_tracking;
 
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 00..da459f03d6
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,22 @@
+/*
+ * Dirty page rate limit common functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYRLIMIT_H
+#define QEMU_DIRTYRLIMIT_H
+
+#define DIRTYLIMIT_CALC_TIME_MS 1000    /* 1000ms */
+
+int64_t vcpu_dirty_rate_get(int cpu_index);
+void vcpu_dirty_rate_stat_start(void);
+void vcpu_dirty_rate_stat_stop(void);
+void vcpu_dirty_rate_stat_initialize(void);
+void vcpu_dirty_rate_stat_finalize(void);
+#endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
new file mode 100644
index 00..ebdc064c9d
--- /dev/null
+++ b/softmmu/dirtylimit.c
@@ -0,0 +1,116 @@
+/*
+ * Dirty page rate limit implementation code
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "qapi/qapi-commands-migration.h"
+#include "sysemu/dirtyrate.h"
+#include "sysemu/dirtylimit.h"
+#include "exec/memory.h"
+#include "hw/boards.h"
+
+struct {
+VcpuStat stat;
+bool running;
+QemuThread thread;
+} *vcpu_dirty_rate_stat;
+
+static void vcpu_dirty_rate_stat_collect(void)
+{
+VcpuStat stat;
+int i = 0;
+
+/* calculate vcpu dirtyrate */
+vcpu_calculate_dirtyrate(DIRTYLIMIT_CALC_TIME_MS,
+ &stat,
+ GLOBAL_DIRTY_LIMIT,
+ false);
+
+for (i = 0; i < stat.nvcpu; i++) {
+vcpu_dirty_rate_stat->stat.rates[i].id = i;
+vcpu_dirty_rate_stat->stat.rates[i].dirty_rate =
+stat.rates[i].dirty_rate;
+}
+
+free(stat.rates);
+}
+
+static void *vcpu_dirty_rate_stat_thread(void *opaque)
+{
+rcu_register_thread();
+
+/* start log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, true);
+
+while (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+vcpu_dirty_rate_stat_collect();
+}
+
+/* stop log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, false);
+
+rcu_unregister_thread();
+return NULL;
+}
+
+int64_t vcpu_dirty_rate_get(int cpu_index)
+{
+DirtyRateVcpu *rates = vcpu_dirty_rate_stat->stat.rates;
+return qatomic_read_i64(&rates[cpu_index].dirty_rate);
+}
+
+void vcpu_dirty_rate_stat_start(void)
+{
+if (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+return;
+}
+
+qatomic_set(&vcpu_dirty_rate_stat->running, 1);
+qemu_thread_create(&vcpu_dirty_rate_stat->thread,
+   "dirtyrate-stat",
+   vcpu_dirty_rate_stat_thread,
+   NULL,
+   QEMU_THREAD_JOINABLE);
+}
+
+void vcpu_dirty_rate_stat_stop(void)
+{
+qatomic_set(&vcpu_dirty_rate_stat->running, 0);
+qemu_mutex_unlock_iothread();
+qemu_thread_join(&vcpu_dirty_rate_stat->thread);
+qemu_mutex_lock_iothread();
+}
+
+void vcpu_dirty_rate_stat_initialize(void)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+int max_cpus = ms->smp.max_cpus;
+
+vcpu_dirty_rate_stat =
+g_malloc0(sizeof(*vcpu_dirty_rate_stat));
+
+vcpu_dirty_rate_stat->stat.nvcpu = max_cpus;
+

[PULL 15/29] migration: Add property x-postcopy-preempt-break-huge

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Add a property field that can conditionally disable the "break sending huge
page" behavior in postcopy preemption.  By default it's enabled.

It should only be used for debugging purposes, and we should never remove
the "x-" prefix.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
Message-Id: <20220707185511.27366-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 2 ++
 migration/migration.h | 7 +++
 migration/ram.c   | 7 +++
 3 files changed, 16 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 427d4de185..864164ad96 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4363,6 +4363,8 @@ static Property migration_properties[] = {
 DEFINE_PROP_SIZE("announce-step", MigrationState,
   parameters.announce_step,
   DEFAULT_MIGRATE_ANNOUNCE_STEP),
+DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
+  postcopy_preempt_break_huge, true),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
diff --git a/migration/migration.h b/migration/migration.h
index ae4ffd3454..cdad8aceaa 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -340,6 +340,13 @@ struct MigrationState {
 bool send_configuration;
 /* Whether we send section footer during migration */
 bool send_section_footer;
+/*
+ * Whether we allow break sending huge pages when postcopy preempt is
+ * enabled.  When disabled, we won't interrupt precopy within sending a
+ * host huge page, which is the old behavior of vanilla postcopy.
+ * NOTE: this parameter is ignored if postcopy preempt is not enabled.
+ */
+bool postcopy_preempt_break_huge;
 
 /* Needed by postcopy-pause state */
 QemuSemaphore postcopy_pause_sem;
diff --git a/migration/ram.c b/migration/ram.c
index 65b08c4edb..7cbe9c310d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2266,11 +2266,18 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
 
 static bool postcopy_needs_preempt(RAMState *rs, PageSearchStatus *pss)
 {
+MigrationState *ms = migrate_get_current();
+
 /* Not enabled eager preempt?  Then never do that. */
 if (!migrate_postcopy_preempt()) {
 return false;
 }
 
+/* If the user explicitly disabled breaking of huge page, skip */
+if (!ms->postcopy_preempt_break_huge) {
+return false;
+}
+
 /* If the ramblock we're sending is a small page?  Never bother. */
 if (qemu_ram_pagesize(pss->block) == TARGET_PAGE_SIZE) {
 return false;
-- 
2.36.1




[PULL 16/29] migration: Add helpers to detect TLS capability

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Add migrate_channel_requires_tls() to detect whether the specific channel
requires TLS, leveraging the recently introduced migrate_use_tls().  No
functional change intended.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185513.27421-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
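
The helper's logic reduces to a two-part predicate. Below is a toy model of
it; `struct channel` and the globals are invented for the example, while the
real helper consults migrate_use_tls() and an object_dynamic_cast() check
against TYPE_QIO_CHANNEL_TLS.

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical stand-in for a QIOChannel. */
struct channel {
    bool is_tls;   /* models "is already a TYPE_QIO_CHANNEL_TLS instance" */
};

static bool use_tls;  /* models the tls-creds migration parameter being set */

/*
 * Mirrors migrate_channel_requires_tls_upgrade(): upgrade only when TLS
 * is requested for the migration and the channel is not already TLS.
 */
static bool requires_tls_upgrade(const struct channel *ioc)
{
    if (!use_tls) {
        return false;
    }
    return !ioc->is_tls;
}
```

Factoring the predicate out lets the main, multifd and (later) preempt
channel setup paths share one check instead of three hand-rolled copies.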
---
 migration/channel.c   | 9 ++---
 migration/migration.c | 1 +
 migration/multifd.c   | 4 +---
 migration/tls.c   | 9 +
 migration/tls.h   | 4 
 5 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index 90087d8986..1b0815039f 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -38,9 +38,7 @@ void migration_channel_process_incoming(QIOChannel *ioc)
 trace_migration_set_incoming_channel(
 ioc, object_get_typename(OBJECT(ioc)));
 
-if (migrate_use_tls() &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 migration_tls_channel_process_incoming(s, ioc, _err);
 } else {
 migration_ioc_register_yank(ioc);
@@ -70,10 +68,7 @@ void migration_channel_connect(MigrationState *s,
 ioc, object_get_typename(OBJECT(ioc)), hostname, error);
 
 if (!error) {
-if (s->parameters.tls_creds &&
-*s->parameters.tls_creds &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 migration_tls_channel_connect(s, ioc, hostname, &error);
 
 if (!error) {
diff --git a/migration/migration.c b/migration/migration.c
index 864164ad96..cc41787079 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -48,6 +48,7 @@
 #include "trace.h"
 #include "exec/target_page.h"
 #include "io/channel-buffer.h"
+#include "io/channel-tls.h"
 #include "migration/colo.h"
 #include "hw/boards.h"
 #include "hw/qdev-properties.h"
diff --git a/migration/multifd.c b/migration/multifd.c
index 684c014c86..1e49594b02 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -831,9 +831,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 migrate_get_current()->hostname, error);
 
 if (!error) {
-if (migrate_use_tls() &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 multifd_tls_channel_connect(p, ioc, &error);
 if (!error) {
 /*
diff --git a/migration/tls.c b/migration/tls.c
index 32c384a8b6..73e8c9d3c2 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -166,3 +166,12 @@ void migration_tls_channel_connect(MigrationState *s,
   NULL,
   NULL);
 }
+
+bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
+{
+if (!migrate_use_tls()) {
+return false;
+}
+
+return !object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS);
+}
diff --git a/migration/tls.h b/migration/tls.h
index de4fe2cafd..98e23c9b0e 100644
--- a/migration/tls.h
+++ b/migration/tls.h
@@ -37,4 +37,8 @@ void migration_tls_channel_connect(MigrationState *s,
QIOChannel *ioc,
const char *hostname,
Error **errp);
+
+/* Whether the QIO channel requires further TLS handshake? */
+bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
+
 #endif
-- 
2.36.1




[PULL 18/29] migration: Enable TLS for preempt channel

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

This patch is based on the async preempt channel creation.  It continues by
wiring the new channel up with a TLS handshake to the destination when
enabled.

Note that only the src QEMU needs such operation; the dest QEMU does not
need any change for TLS support due to the fact that all channels are
established synchronously there, so all the TLS magic is already properly
handled by migration_tls_channel_process_incoming().

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
Message-Id: <20220707185518.27529-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/postcopy-ram.c | 57 ++--
 migration/trace-events   |  1 +
 2 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 70b21e9d51..b9a37ef255 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -36,6 +36,7 @@
 #include "socket.h"
 #include "qemu-file.h"
 #include "yank_functions.h"
+#include "tls.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -1552,15 +1553,15 @@ bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
 return true;
 }
 
+/*
+ * Setup the postcopy preempt channel with the IOC.  If ERROR is specified,
+ * setup the error instead.  This helper will free the ERROR if specified.
+ */
 static void
-postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
+postcopy_preempt_send_channel_done(MigrationState *s,
+   QIOChannel *ioc, Error *local_err)
 {
-MigrationState *s = opaque;
-QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
-Error *local_err = NULL;
-
-if (qio_task_propagate_error(task, &local_err)) {
-/* Something wrong happened.. */
+if (local_err) {
 migrate_set_error(s, local_err);
 error_free(local_err);
 } else {
@@ -1574,7 +1575,47 @@ postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
  * postcopy_qemufile_src to know whether it failed or not.
  */
 qemu_sem_post(&s->postcopy_qemufile_src_sem);
-object_unref(OBJECT(ioc));
+}
+
+static void
+postcopy_preempt_tls_handshake(QIOTask *task, gpointer opaque)
+{
+g_autoptr(QIOChannel) ioc = QIO_CHANNEL(qio_task_get_source(task));
+MigrationState *s = opaque;
+Error *local_err = NULL;
+
+qio_task_propagate_error(task, &local_err);
+postcopy_preempt_send_channel_done(s, ioc, local_err);
+}
+
+static void
+postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
+{
+g_autoptr(QIOChannel) ioc = QIO_CHANNEL(qio_task_get_source(task));
+MigrationState *s = opaque;
+QIOChannelTLS *tioc;
+Error *local_err = NULL;
+
+if (qio_task_propagate_error(task, &local_err)) {
+goto out;
+}
+
+if (migrate_channel_requires_tls_upgrade(ioc)) {
+tioc = migration_tls_client_create(s, ioc, s->hostname, &local_err);
+if (!tioc) {
+goto out;
+}
+trace_postcopy_preempt_tls_handshake();
+qio_channel_set_name(QIO_CHANNEL(tioc), "migration-tls-preempt");
+qio_channel_tls_handshake(tioc, postcopy_preempt_tls_handshake,
+  s, NULL, NULL);
+/* Setup the channel until TLS handshake finished */
+return;
+}
+
+out:
+/* This handles both good and error cases */
+postcopy_preempt_send_channel_done(s, ioc, local_err);
 }
 
 /* Returns 0 if channel established, -1 for error. */
diff --git a/migration/trace-events b/migration/trace-events
index 0e385c3a07..a34afe7b85 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -287,6 +287,7 @@ postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_off
 postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
 postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
 postcopy_page_req_del(void *addr, int count) "resolved page req %p total %d"
+postcopy_preempt_tls_handshake(void) ""
 postcopy_preempt_new_channel(void) ""
 postcopy_preempt_thread_entry(void) ""
 postcopy_preempt_thread_exit(void) ""
-- 
2.36.1




[PULL 17/29] migration: Export tls-[creds|hostname|authz] params to cmdline too

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

It's useful for specifying tls credentials all in the cmdline (along with
the -object tls-creds-*), especially for debugging purpose.

The trick here is we must remember to not free these fields again in the
finalize() function of migration object, otherwise it'll cause double-free.

The thing is when destroying an object, we'll first destroy the properties
that bound to the object, then the object itself.  To be explicit, when
destroy the object in object_finalize() we have such sequence of
operations:

object_property_del_all(obj);
object_deinit(obj, ti);

So after this change the two fields are properly released already even
before reaching the finalize() function but in object_property_del_all(),
hence we don't need to free them anymore in finalize() or it's double-free.

This also fixes a trivial memory leak for tls-authz as we forgot to free it
before this patch.

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
Message-Id: <20220707185515.27475-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index cc41787079..7c7e529ca7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4366,6 +4366,9 @@ static Property migration_properties[] = {
   DEFAULT_MIGRATE_ANNOUNCE_STEP),
 DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
   postcopy_preempt_break_huge, true),
+DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds),
+DEFINE_PROP_STRING("tls-hostname", MigrationState, parameters.tls_hostname),
+DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -4403,12 +4406,9 @@ static void migration_class_init(ObjectClass *klass, void *data)
 static void migration_instance_finalize(Object *obj)
 {
 MigrationState *ms = MIGRATION_OBJ(obj);
-MigrationParameters *params = >parameters;
 
 qemu_mutex_destroy(>error_mutex);
 qemu_mutex_destroy(>qemu_file_lock);
-g_free(params->tls_hostname);
-g_free(params->tls_creds);
 qemu_sem_destroy(>wait_unplug_sem);
 qemu_sem_destroy(>rate_limit_sem);
 qemu_sem_destroy(>pause_sem);
-- 
2.36.1




[PULL 19/29] migration: Respect postcopy request order in preemption mode

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

With preemption mode on, when we see a postcopy request that was requesting
for exactly the page that we have preempted before (so we've partially sent
the page already via PRECOPY channel and it got preempted by another
postcopy request), currently we drop the request so that after all the
other postcopy requests are serviced then we'll go back to precopy stream
and start to handle that.

We dropped the request because we can't send it via the postcopy channel:
the precopy channel already contains part of the data, and we can only
send a huge page via one channel as a whole.  We can't split a huge page
into two channels.

That's a corner case and it works, but it changes the order in which we
handle postcopy requests, since we postpone this (unlucky) postcopy request
until after all the other queued postcopy requests.  The problem is that
when the guest is very busy, the postcopy queue can stay non-empty, which
means this dropped request may never be handled until the end of the
postcopy migration.  So there's a chance that one dest QEMU vcpu thread
waits on a page fault for an extremely long time just because it unluckily
accessed the specific page that was preempted before.

In the worst case the wait can last as long as the whole postcopy migration
procedure.  That's extremely unlikely to happen, but when it happens it's
not good.

The root cause of this problem is that we treat the pss->postcopy_requested
variable as carrying two meanings bound together, as the variable name suggests:

  1. Whether this page request is urgent, and,
  2. Which channel we should use for this page request.

With the old code, when we set postcopy_requested it means either both (1)
and (2) are true, or both are false.  We can never have (1) and (2) take
different values.

However it doesn't have to be that way.  It's perfectly legal for a request
to have (1) very high urgency while (2) we'd still like to use the precopy
channel, just like the corner case discussed above.

To differentiate the two meanings, introduce a new field called
postcopy_target_channel, recording which channel we should use for this page
request, so as to cover the old meaning (2) only.  The postcopy_requested
variable is then left to stand only for meaning (1), the urgency of this
page request.

With this change, we can easily boost the priority of a preempted precopy
page as long as we know that page is also requested as a postcopy page.  So
with the new approach, instead of dropping such a request in
get_queued_page(), we send it right away via the precopy channel, restoring
the ordering of the page faults to match how they were requested on the dest.
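As a rough sketch of the decoupled decision (pick_channel and its flag names are illustrative, not the patch's actual helpers): urgency no longer dictates the channel.

```c
#include <assert.h>
#include <stdbool.h>

enum Channel { CH_PRECOPY, CH_POSTCOPY };

/* urgent: a dest vcpu is waiting on this page (meaning 1).
 * preempted_here: part of this huge page already went out via precopy. */
static enum Channel pick_channel(bool urgent, bool preempted_here)
{
    if (urgent && !preempted_here) {
        return CH_POSTCOPY;    /* the default for urgent requests */
    }
    /* A partially sent huge page must finish on the same channel. */
    return CH_PRECOPY;
}
```

Either way the urgent request is serviced immediately; only the transport channel differs.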

Reported-by: Manish Mishra 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
Message-Id: <20220707185520.27583-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 65 +++--
 1 file changed, 52 insertions(+), 13 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7cbe9c310d..4fbad74c6c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -442,8 +442,28 @@ struct PageSearchStatus {
 unsigned long page;
 /* Set once we wrap around */
 bool complete_round;
-/* Whether current page is explicitly requested by postcopy */
+/*
+ * [POSTCOPY-ONLY] Whether current page is explicitly requested by
+ * postcopy.  When set, the request is "urgent" because the dest QEMU
+ * threads are waiting for us.
+ */
 bool postcopy_requested;
+/*
+ * [POSTCOPY-ONLY] The target channel to use to send current page.
+ *
+ * Note: This may _not_ match with the value in postcopy_requested
+ * above. Let's imagine the case where the postcopy request is exactly
+ * the page that we're sending in progress during precopy. In this case
+ * we'll have postcopy_requested set to true but the target channel
+ * will be the precopy channel (so that we don't split brain on that
+ * specific page since the precopy channel already contains partial of
+ * that page data).
+ *
+ * Besides that specific use case, postcopy_target_channel should
+ * always be equal to postcopy_requested, because by default we send
+ * postcopy pages via postcopy preempt channel.
+ */
+bool postcopy_target_channel;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
@@ -495,6 +515,9 @@ static QemuCond decomp_done_cond;
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock 
*block,
  ram_addr_t offset, uint8_t *source_buf);
 
+static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
+ bool postcopy_requested);
+
 static void *do_data_compress(void *opaque)
 {
 CompressParam *param = opaque;

[PULL 10/29] migration: Add postcopy-preempt capability

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Firstly, postcopy already preempts precopy due to the fact that we do
unqueue_page() first before looking into dirty bits.

However that's not enough: e.g., when host huge pages are enabled, a
postcopy request needs to wait until the whole huge page currently being
sent finishes.  That can introduce quite some delay; the bigger the huge
page, the larger the delay it brings.

This patch adds a new capability that allows postcopy requests to preempt
the precopy page currently being sent as part of a huge page, so that
postcopy requests can be serviced even faster.

Meanwhile, to send them even faster, bypass the precopy stream by providing
a standalone postcopy socket for sending requested pages.

Since the new behavior is not compatible with the old behavior, it is not
the default; it's enabled only when the new capability is set on both
src/dst QEMUs.

This patch only adds the capability itself; the logic will be added in
follow-up patches.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
Message-Id: <20220707185342.26794-2-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 18 ++
 migration/migration.h |  1 +
 qapi/migration.json   |  7 ++-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 78f5057373..ce7bb68cdc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1297,6 +1297,13 @@ static bool migrate_caps_check(bool *cap_list,
 return false;
 }
 
+if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT]) {
+if (!cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+error_setg(errp, "Postcopy preempt requires postcopy-ram");
+return false;
+}
+}
+
 return true;
 }
 
@@ -2663,6 +2670,15 @@ bool migrate_background_snapshot(void)
 return s->enabled_capabilities[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT];
 }
 
+bool migrate_postcopy_preempt(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT];
+}
+
 /* migration thread support */
 /*
  * Something bad happened to the RP stream, mark an error
@@ -4274,6 +4290,8 @@ static Property migration_properties[] = {
 DEFINE_PROP_MIG_CAP("x-compress", MIGRATION_CAPABILITY_COMPRESS),
 DEFINE_PROP_MIG_CAP("x-events", MIGRATION_CAPABILITY_EVENTS),
 DEFINE_PROP_MIG_CAP("x-postcopy-ram", MIGRATION_CAPABILITY_POSTCOPY_RAM),
+DEFINE_PROP_MIG_CAP("x-postcopy-preempt",
+MIGRATION_CAPABILITY_POSTCOPY_PREEMPT),
 DEFINE_PROP_MIG_CAP("x-colo", MIGRATION_CAPABILITY_X_COLO),
 DEFINE_PROP_MIG_CAP("x-release-ram", MIGRATION_CAPABILITY_RELEASE_RAM),
 DEFINE_PROP_MIG_CAP("x-block", MIGRATION_CAPABILITY_BLOCK),
diff --git a/migration/migration.h b/migration/migration.h
index 485d58b95f..d2269c826c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -400,6 +400,7 @@ int migrate_decompress_threads(void);
 bool migrate_use_events(void);
 bool migrate_postcopy_blocktime(void);
 bool migrate_background_snapshot(void);
+bool migrate_postcopy_preempt(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_shut(MigrationIncomingState *mis,
diff --git a/qapi/migration.json b/qapi/migration.json
index e552ee4f43..7586df3dea 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -467,6 +467,11 @@
 #  Requires that QEMU be permitted to use locked memory
 #  for guest RAM pages.
 #  (since 7.1)
+# @postcopy-preempt: If enabled, the migration process will allow postcopy
+#requests to preempt precopy stream, so postcopy requests
+#will be handled faster.  This is a performance feature and
+#should not affect the correctness of postcopy migration.
+#(since 7.1)
 #
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -482,7 +487,7 @@
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
'validate-uuid', 'background-snapshot',
-   'zero-copy-send'] }
+   'zero-copy-send', 'postcopy-preempt'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.36.1




[PULL 12/29] migration: Postcopy preemption enablement

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

This patch enables postcopy-preempt feature.

It contains two major changes to the migration logic:

(1) Postcopy requests are now sent via a different socket from precopy
background migration stream, so as to be isolated from very high page
request delays.

(2) For huge page enabled hosts: when there's postcopy requests, they can now
intercept a partial sending of huge host pages on src QEMU.

After this patch, we'll live migrate a VM with two channels for postcopy: (1)
PRECOPY channel, which is the default channel that transfers background pages;
and (2) POSTCOPY channel, which only transfers requested pages.

There's no strict rule about which channel to use; e.g., if a requested page is
already being transferred on the precopy channel, then we keep using that same
precopy channel to transfer the page even though it was explicitly requested.
In 99% of cases, though, we prioritize the channels so that requested pages
are sent via the postcopy channel whenever possible.

On the source QEMU, when we found a postcopy request, we'll interrupt the
PRECOPY channel sending process and quickly switch to the POSTCOPY channel.
After we serviced all the high priority postcopy pages, we'll switch back to
PRECOPY channel so that we'll continue to send the interrupted huge page again.
There's no new thread introduced on src QEMU.

On the destination QEMU, one new thread is introduced to receive page data from
the postcopy specific socket (done in the preparation patch).

This patch has a side effect: previously, after sending a postcopy page, we
assumed the guest would access the following pages and kept sending from there.
Now that changes: instead of continuing from a postcopy requested page, we go
back and continue sending the precopy huge page (which may have been partially
sent before, having been intercepted by a postcopy request).

Whether that's a problem is debatable, because the assumption that "the guest
will continue to access the next page" may not really suit huge pages,
especially large ones (e.g. 1GB pages).  The locality hint is largely
meaningless when huge pages are used.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185504.27203-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c  |   2 +
 migration/migration.h  |   2 +-
 migration/ram.c| 251 +++--
 migration/trace-events |   7 ++
 4 files changed, 253 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c965cae1d4..c5f0fdf8f8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3190,6 +3190,8 @@ static int postcopy_start(MigrationState *ms)
   MIGRATION_STATUS_FAILED);
 }
 
+trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
+
 return ret;
 
 fail_closefb:
diff --git a/migration/migration.h b/migration/migration.h
index 941c61e543..ff714c235f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -68,7 +68,7 @@ typedef struct {
 struct MigrationIncomingState {
 QEMUFile *from_src_file;
 /* Previously received RAM's RAMBlock pointer */
-RAMBlock *last_recv_block;
+RAMBlock *last_recv_block[RAM_CHANNEL_MAX];
 /* A hook to allow cleanup at the end of incoming migration */
 void *transport_data;
 void (*transport_cleanup)(void *data);
diff --git a/migration/ram.c b/migration/ram.c
index e4364c0bff..65b08c4edb 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -296,6 +296,20 @@ struct RAMSrcPageRequest {
 QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
 
+typedef struct {
+/*
+ * Cached ramblock/offset values if preempted.  They're only meaningful if
+ * preempted==true below.
+ */
+RAMBlock *ram_block;
+unsigned long ram_page;
+/*
+ * Whether a postcopy preemption just happened.  Will be reset after
+ * precopy recovered to background migration.
+ */
+bool preempted;
+} PostcopyPreemptState;
+
 /* State of RAM for migration */
 struct RAMState {
 /* QEMUFile used for this migration */
@@ -350,6 +364,14 @@ struct RAMState {
 /* Queue of outstanding page requests from the destination */
 QemuMutex src_page_req_mutex;
 QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
+
+/* Postcopy preemption informations */
+PostcopyPreemptState postcopy_preempt_state;
+/*
+ * Current channel we're using on src VM.  Only valid if postcopy-preempt
+ * is enabled.
+ */
+unsigned int postcopy_channel;
 };
 typedef struct RAMState RAMState;
 
@@ -357,6 +379,11 @@ static RAMState *ram_state;
 
 static NotifierWithReturnList precopy_notifier_list;
 
+static void postcopy_preempt_reset(RAMState *rs)
+{
+memset(&rs->postcopy_preempt_state, 0, sizeof(PostcopyPreemptState));
+}
+
 /* Whether postcopy has queued requests? */
 static bool postcopy_has_request(RAMState *rs)
 {
@@ -1947,6 

[PULL 02/29] cpus: Introduce cpu_list_generation_id

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Introduce cpu_list_generation_id to track cpu list generation so
that cpu hotplug/unplug can be detected during measurement of
dirty page rate.

cpu_list_generation_id can be used to detect changes to the cpu
list, in preparation for dirty page rate measurement.
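A minimal sketch of the intended usage pattern (measure_dirty_rate and fake_cpu_hotplug are hypothetical names, not the actual dirtyrate code): snapshot the generation id before measuring, and declare the result invalid if it moved.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int cpu_list_generation_id;

/* cpu_list_add()/cpu_list_remove() both bump the counter in the patch. */
static void fake_cpu_hotplug(void)
{
    cpu_list_generation_id++;
}

/* Returns false if the cpu list changed while "measuring". */
static bool measure_dirty_rate(bool hotplug_during)
{
    unsigned int start = cpu_list_generation_id;
    if (hotplug_during) {
        fake_cpu_hotplug();   /* stands in for a concurrent cpu list change */
    }
    return cpu_list_generation_id == start;
}
```

A plain counter comparison is cheaper and simpler than locking the cpu list for the whole measurement window.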

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 
<06e1f1362b2501a471dce796abb065b04f320fa5.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 cpus-common.c | 8 
 include/exec/cpu-common.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index db459b41ce..793364dc0e 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -73,6 +73,12 @@ static int cpu_get_free_index(void)
 }
 
 CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
+static unsigned int cpu_list_generation_id;
+
+unsigned int cpu_list_generation_id_get(void)
+{
+return cpu_list_generation_id;
+}
 
 void cpu_list_add(CPUState *cpu)
 {
@@ -84,6 +90,7 @@ void cpu_list_add(CPUState *cpu)
 assert(!cpu_index_auto_assigned);
 }
QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
+cpu_list_generation_id++;
 }
 
 void cpu_list_remove(CPUState *cpu)
@@ -96,6 +103,7 @@ void cpu_list_remove(CPUState *cpu)
 
QTAILQ_REMOVE_RCU(&cpus, cpu, node);
 cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+cpu_list_generation_id++;
 }
 
 CPUState *qemu_get_cpu(int index)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 5968551a05..2281be4e10 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -35,6 +35,7 @@ extern intptr_t qemu_host_page_mask;
 void qemu_init_cpu_list(void);
 void cpu_list_lock(void);
 void cpu_list_unlock(void);
+unsigned int cpu_list_generation_id_get(void);
 
 void tcg_flush_softmmu_tlb(CPUState *cs);
 
-- 
2.36.1




[PULL 13/29] migration: Postcopy recover with preempt enabled

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar fault-tolerance handling.  When ram_load_postcopy() fails,
instead of stopping, the thread halts on a semaphore, ready to be kicked
again when recovery is detected.

A mutex is introduced to make sure there's no concurrent operation on the
socket.  To keep it simple, the fast ram load thread takes the mutex for its
whole procedure and only releases it when it pauses.  If the network fails
during postcopy, the fast-path socket is then safely released by the main
loading thread while holding that mutex.
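The locking discipline can be modeled single-threaded (Model and its helpers are illustrative, not QEMU code): the main thread may only tear down the channel once the fast load thread has dropped the mutex by pausing.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool mutex_held_by_fast_thread;   /* postcopy_prio_thread_mutex */
    bool channel_open;                /* postcopy_qemufile_dst */
} Model;

/* Fast load thread hits an error: release the mutex, then wait (pause). */
static void fast_thread_pause(Model *m)
{
    m->mutex_held_by_fast_thread = false;
}

/* Main loading thread may only close the channel once the mutex is free. */
static bool main_thread_try_close(Model *m)
{
    if (m->mutex_held_by_fast_thread) {
        return false;              /* would race with the active reader */
    }
    m->channel_open = false;
    return true;
}
```

Holding the mutex across the whole read loop trades granularity for an easy invariant: whoever holds it owns the socket.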

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185506.27257-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c| 27 +++
 migration/migration.h| 19 +++
 migration/postcopy-ram.c | 25 +++--
 migration/qemu-file.c| 27 +++
 migration/qemu-file.h|  1 +
 migration/savevm.c   | 26 --
 migration/trace-events   |  2 ++
 7 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c5f0fdf8f8..3119bd2e4b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -215,9 +215,11 @@ void migration_object_init(void)
 current_incoming->postcopy_remote_fds =
 g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD));
qemu_mutex_init(&current_incoming->rp_mutex);
+qemu_mutex_init(&current_incoming->postcopy_prio_thread_mutex);
 qemu_event_init(&current_incoming->main_thread_load_event, false);
 qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
 qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
+qemu_sem_init(&current_incoming->postcopy_pause_sem_fast_load, 0);
 qemu_mutex_init(&current_incoming->page_request_mutex);
 current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
@@ -697,9 +699,9 @@ static bool postcopy_try_recover(void)
 
 /*
  * Here, we only wake up the main loading thread (while the
- * fault thread will still be waiting), so that we can receive
+ * rest threads will still be waiting), so that we can receive
  * commands from source now, and answer it if needed. The
- * fault thread will be woken up afterwards until we are sure
+ * rest threads will be woken up afterwards until we are sure
  * that source is ready to reply to page requests.
  */
qemu_sem_post(&mis->postcopy_pause_sem_dst);
@@ -3503,6 +3505,18 @@ static MigThrError postcopy_pause(MigrationState *s)
 qemu_file_shutdown(file);
 qemu_fclose(file);
 
+/*
+ * Do the same to postcopy fast path socket too if there is.  No
+ * locking needed because no racer as long as we do this before setting
+ * status to paused.
+ */
+if (s->postcopy_qemufile_src) {
+migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+qemu_file_shutdown(s->postcopy_qemufile_src);
+qemu_fclose(s->postcopy_qemufile_src);
+s->postcopy_qemufile_src = NULL;
+}
+
migrate_set_state(&s->state, s->state,
   MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -3558,8 +3572,13 @@ static MigThrError migration_detect_error(MigrationState 
*s)
 return MIG_THR_ERR_FATAL;
 }
 
-/* Try to detect any file errors */
-ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
+/*
+ * Try to detect any file errors.  Note that postcopy_qemufile_src will
+ * be NULL when postcopy preempt is not enabled.
+ */
+ret = qemu_file_get_error_obj_any(s->to_dst_file,
+  s->postcopy_qemufile_src,
+  &local_error);
 if (!ret) {
 /* Everything is fine */
 assert(!local_error);
diff --git a/migration/migration.h b/migration/migration.h
index ff714c235f..9220cec6bd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -118,6 +118,18 @@ struct MigrationIncomingState {
 /* Postcopy priority thread is used to receive postcopy requested pages */
 QemuThread postcopy_prio_thread;
 bool postcopy_prio_thread_created;
+/*
+ * Used to sync between the ram load main thread and the fast ram load
+ * thread.  It protects postcopy_qemufile_dst, which is the postcopy
+ * fast channel.
+ *
+ * The ram fast load thread will take it mostly for the whole lifecycle
+ * because it needs to continuously read data from the channel, and
+ * it'll only release this mutex if postcopy is interrupted, so that
+ * the ram load main thread will take this mutex over and properly
+ * release the broken channel.
+ */
+QemuMutex postcopy_prio_thread_mutex;
 /*
  * An array of temp host huge pages to be used, 

[PULL 00/29] migration queue

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The following changes since commit da7da9d5e608200ecc0749ff37be246e9cd3314f:

  Merge tag 'pull-request-2022-07-19' of https://gitlab.com/thuth/qemu into 
staging (2022-07-19 13:05:06 +0100)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220719c

for you to fetch changes up to ec0345c1000b3a57b557da4c2e3f2114dd23903a:

  migration: Avoid false-positive on non-supported scenarios for zero-copy-send 
(2022-07-19 17:33:22 +0100)


Migration pull 2022-07-19

  Hyman's dirty page rate limit set
  Ilya's fix for zlib vs migration
  Peter's postcopy-preempt
  Cleanup from Dan
  zero-copy tidy ups from Leo
  multifd doc fix from Juan

Signed-off-by: Dr. David Alan Gilbert 


Daniel P. Berrangé (1):
  migration: remove unreachable code after reading data

Hyman Huang (8):
  accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping
  cpus: Introduce cpu_list_generation_id
  migration/dirtyrate: Refactor dirty page rate calculation
  softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically
  accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function
  softmmu/dirtylimit: Implement virtual CPU throttle
  softmmu/dirtylimit: Implement dirty page rate limit
  tests: Add dirty page rate limit test

Ilya Leoshkevich (1):
  multifd: Copy pages before compressing them with zlib

Juan Quintela (1):
  multifd: Document the locking of MultiFD{Send/Recv}Params

Leonardo Bras (4):
  QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  Add dirty-sync-missed-zero-copy migration stat
  migration/multifd: Report to user when zerocopy not working
  migration: Avoid false-positive on non-supported scenarios for 
zero-copy-send

Peter Xu (14):
  migration: Add postcopy-preempt capability
  migration: Postcopy preemption preparation on channel creation
  migration: Postcopy preemption enablement
  migration: Postcopy recover with preempt enabled
  migration: Create the postcopy preempt channel asynchronously
  migration: Add property x-postcopy-preempt-break-huge
  migration: Add helpers to detect TLS capability
  migration: Export tls-[creds|hostname|authz] params to cmdline too
  migration: Enable TLS for preempt channel
  migration: Respect postcopy request order in preemption mode
  tests: Move MigrateCommon upper
  tests: Add postcopy tls migration test
  tests: Add postcopy tls recovery migration test
  tests: Add postcopy preempt tests

 accel/kvm/kvm-all.c |  46 ++-
 accel/stubs/kvm-stub.c  |   5 +
 cpus-common.c   |   8 +
 hmp-commands-info.hx|  13 +
 hmp-commands.hx |  32 +++
 include/exec/cpu-common.h   |   1 +
 include/exec/memory.h   |   5 +-
 include/hw/core/cpu.h   |   6 +
 include/monitor/hmp.h   |   3 +
 include/sysemu/dirtylimit.h |  37 +++
 include/sysemu/dirtyrate.h  |  28 ++
 include/sysemu/kvm.h|   2 +
 io/channel-socket.c |   8 +-
 migration/channel.c |   9 +-
 migration/dirtyrate.c   | 227 +--
 migration/dirtyrate.h   |   7 +-
 migration/migration.c   | 152 --
 migration/migration.h   |  44 ++-
 migration/multifd-zlib.c|  38 ++-
 migration/multifd.c |   6 +-
 migration/multifd.h |  66 +++--
 migration/postcopy-ram.c| 186 -
 migration/postcopy-ram.h|  11 +
 migration/qemu-file.c   |  31 ++-
 migration/qemu-file.h   |   1 +
 migration/ram.c | 331 --
 migration/ram.h |   6 +-
 migration/savevm.c  |  46 ++-
 migration/socket.c  |  22 +-
 migration/socket.h  |   1 +
 migration/tls.c |   9 +
 migration/tls.h |   4 +
 migration/trace-events  |  15 +-
 monitor/hmp-cmds.c  |   5 +
 qapi/migration.json |  94 ++-
 softmmu/dirtylimit.c| 601 
 softmmu/meson.build |   1 +
 softmmu/trace-events|   7 +
 tests/qtest/migration-helpers.c |  22 ++
 tests/qtest/migration-helpers.h |   2 +
 tests/qtest/migration-test.c| 539 +--
 tests/qtest/qmp-cmd-test.c  |   2 +
 42 files changed, 2394 insertions(+), 285 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h
 create mode 100644 include/sysemu/dirtyrate.h
 create mode 100644 softmmu/dirtylimit.c




[PULL 01/29] accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Add an optional 'CPUState' argument to kvm_dirty_ring_reap so
that it can cover the single-vcpu dirty-ring-reaping scenario.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 

Signed-off-by: Dr. David Alan Gilbert 
---
 accel/kvm/kvm-all.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ed8b6b896e..ce989a68ff 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -757,17 +757,20 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, 
CPUState *cpu)
 }
 
 /* Must be with slots_lock held */
-static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
+static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState* cpu)
 {
 int ret;
-CPUState *cpu;
 uint64_t total = 0;
 int64_t stamp;
 
 stamp = get_clock();
 
-CPU_FOREACH(cpu) {
-total += kvm_dirty_ring_reap_one(s, cpu);
+if (cpu) {
+total = kvm_dirty_ring_reap_one(s, cpu);
+} else {
+CPU_FOREACH(cpu) {
+total += kvm_dirty_ring_reap_one(s, cpu);
+}
 }
 
 if (total) {
@@ -788,7 +791,7 @@ static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
  * Currently for simplicity, we must hold BQL before calling this.  We can
  * consider to drop the BQL if we're clear with all the race conditions.
  */
-static uint64_t kvm_dirty_ring_reap(KVMState *s)
+static uint64_t kvm_dirty_ring_reap(KVMState *s, CPUState *cpu)
 {
 uint64_t total;
 
@@ -808,7 +811,7 @@ static uint64_t kvm_dirty_ring_reap(KVMState *s)
  * reset below.
  */
 kvm_slots_lock();
-total = kvm_dirty_ring_reap_locked(s);
+total = kvm_dirty_ring_reap_locked(s, cpu);
 kvm_slots_unlock();
 
 return total;
@@ -855,7 +858,7 @@ static void kvm_dirty_ring_flush(void)
  * vcpus out in a synchronous way.
  */
 kvm_cpu_synchronize_kick_all();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 trace_kvm_dirty_ring_flush(1);
 }
 
@@ -1399,7 +1402,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
  * Not easy.  Let's cross the fingers until it's fixed.
  */
 if (kvm_state->kvm_dirty_ring_size) {
-kvm_dirty_ring_reap_locked(kvm_state);
+kvm_dirty_ring_reap_locked(kvm_state, NULL);
 } else {
 kvm_slot_get_dirty_log(kvm_state, mem);
 }
@@ -1471,7 +1474,7 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(s);
+kvm_dirty_ring_reap(s, NULL);
 qemu_mutex_unlock_iothread();
 
 r->reaper_iteration++;
@@ -2967,7 +2970,7 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 qemu_mutex_unlock_iothread();
 ret = 0;
 break;
-- 
2.36.1




[PULL 11/29] migration: Postcopy preemption preparation on channel creation

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Create a new socket for postcopy to be prepared to send postcopy requested
pages via this specific channel, so as to not get blocked by precopy pages.

A new thread is also created on dest qemu to receive data from this new channel
based on the ram_load_postcopy() routine.

The ram_load_postcopy(POSTCOPY) branch and the thread has not started to
function, and that'll be done in follow up patches.

Cleanup the new sockets on both src/dst QEMUs, meanwhile look after the new
thread too to make sure it'll be recycled properly.

Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
Message-Id: <20220707185502.27149-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: With Peter's fix to quieten compiler warning on
   start_migration
---
 migration/migration.c| 63 +++
 migration/migration.h|  8 
 migration/postcopy-ram.c | 92 ++--
 migration/postcopy-ram.h | 10 +
 migration/ram.c  | 25 ---
 migration/ram.h  |  4 +-
 migration/savevm.c   | 20 -
 migration/socket.c   | 22 +-
 migration/socket.h   |  1 +
 migration/trace-events   |  5 ++-
 10 files changed, 219 insertions(+), 31 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ce7bb68cdc..c965cae1d4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -321,6 +321,12 @@ void migration_incoming_state_destroy(void)
 mis->page_requested = NULL;
 }
 
+if (mis->postcopy_qemufile_dst) {
+migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
+qemu_fclose(mis->postcopy_qemufile_dst);
+mis->postcopy_qemufile_dst = NULL;
+}
+
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
@@ -714,15 +720,21 @@ void migration_fd_process_incoming(QEMUFile *f, Error 
**errp)
 migration_incoming_process();
 }
 
+static bool migration_needs_multiple_sockets(void)
+{
+return migrate_use_multifd() || migrate_postcopy_preempt();
+}
+
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
 Error *local_err = NULL;
 bool start_migration;
+QEMUFile *f;
 
 if (!mis->from_src_file) {
 /* The first connection (multifd may have multiple) */
-QEMUFile *f = qemu_file_new_input(ioc);
+f = qemu_file_new_input(ioc);
 
 if (!migration_incoming_setup(f, errp)) {
 return;
@@ -730,13 +742,19 @@ void migration_ioc_process_incoming(QIOChannel *ioc, 
Error **errp)
 
 /*
  * Common migration only needs one channel, so we can start
- * right now.  Multifd needs more than one channel, we wait.
+ * right now.  Some features need more than one channel, we wait.
  */
-start_migration = !migrate_use_multifd();
+start_migration = !migration_needs_multiple_sockets();
 } else {
 /* Multiple connections */
-assert(migrate_use_multifd());
-start_migration = multifd_recv_new_channel(ioc, &local_err);
+assert(migration_needs_multiple_sockets());
+if (migrate_use_multifd()) {
+start_migration = multifd_recv_new_channel(ioc, &local_err);
+} else {
+assert(migrate_postcopy_preempt());
+f = qemu_file_new_input(ioc);
+start_migration = postcopy_preempt_new_channel(mis, f);
+}
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -761,11 +779,20 @@ void migration_ioc_process_incoming(QIOChannel *ioc, 
Error **errp)
 bool migration_has_all_channels(void)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
-bool all_channels;
 
-all_channels = multifd_recv_all_channels_created();
+if (!mis->from_src_file) {
+return false;
+}
+
+if (migrate_use_multifd()) {
+return multifd_recv_all_channels_created();
+}
+
+if (migrate_postcopy_preempt()) {
+return mis->postcopy_qemufile_dst != NULL;
+}
 
-return all_channels && mis->from_src_file != NULL;
+return true;
 }
 
 /*
@@ -1874,6 +1901,12 @@ static void migrate_fd_cleanup(MigrationState *s)
 qemu_fclose(tmp);
 }
 
+if (s->postcopy_qemufile_src) {
+migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+qemu_fclose(s->postcopy_qemufile_src);
+s->postcopy_qemufile_src = NULL;
+}
+
 assert(!migration_is_active(s));
 
 if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -3269,6 +3302,11 @@ static void migration_completion(MigrationState *s)
 qemu_savevm_state_complete_postcopy(s->to_dst_file);
 qemu_mutex_unlock_iothread();
 
+/* Shutdown the postcopy fast path thread */
+if (migrate_postcopy_preempt()) {
+postcopy_preempt_shutdown_file(s);
+}
+
 

[PULL 09/29] multifd: Copy pages before compressing them with zlib

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Ilya Leoshkevich 

zlib_send_prepare() compresses pages of a running VM. zlib does not
make any thread-safety guarantees with respect to changing deflate()
input concurrently with deflate() [1].

One can observe problems due to this with the IBM zEnterprise Data
Compression accelerator capable zlib [2]. When the hardware
acceleration is enabled, migration/multifd/tcp/plain/zlib test fails
intermittently [3] due to sliding window corruption. The accelerator's
architecture explicitly discourages concurrent accesses [4]:

Page 26-57, "Other Conditions":

As observed by this CPU, other CPUs, and channel
programs, references to the parameter block, first,
second, and third operands may be multiple-access
references, accesses to these storage locations are
not necessarily block-concurrent, and the sequence
of these accesses or references is undefined.

Mark Adler pointed out that vanilla zlib performs double fetches under
certain circumstances as well [5], therefore we need to copy data
before passing it to deflate().

[1] https://zlib.net/manual.html
[2] https://github.com/madler/zlib/pull/410
[3] https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html
[4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
[5] https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00889.html

Signed-off-by: Ilya Leoshkevich 
Message-Id: <20220705203559.2960949-1-...@linux.ibm.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/multifd-zlib.c | 38 ++
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 3a7ae44485..18213a9513 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -27,6 +27,8 @@ struct zlib_data {
 uint8_t *zbuff;
 /* size of compressed buffer */
 uint32_t zbuff_len;
+/* uncompressed buffer of size qemu_target_page_size() */
+uint8_t *buf;
 };
 
 /* Multifd zlib compression */
@@ -45,26 +47,38 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
 {
 struct zlib_data *z = g_new0(struct zlib_data, 1);
z_stream *zs = &z->zs;
+const char *err_msg;
 
 zs->zalloc = Z_NULL;
 zs->zfree = Z_NULL;
 zs->opaque = Z_NULL;
 if (deflateInit(zs, migrate_multifd_zlib_level()) != Z_OK) {
-g_free(z);
-error_setg(errp, "multifd %u: deflate init failed", p->id);
-return -1;
+err_msg = "deflate init failed";
+goto err_free_z;
 }
 /* This is the maxium size of the compressed buffer */
 z->zbuff_len = compressBound(MULTIFD_PACKET_SIZE);
 z->zbuff = g_try_malloc(z->zbuff_len);
 if (!z->zbuff) {
-deflateEnd(&z->zs);
-g_free(z);
-error_setg(errp, "multifd %u: out of memory for zbuff", p->id);
-return -1;
+err_msg = "out of memory for zbuff";
+goto err_deflate_end;
+}
+z->buf = g_try_malloc(qemu_target_page_size());
+if (!z->buf) {
+err_msg = "out of memory for buf";
+goto err_free_zbuff;
 }
 p->data = z;
 return 0;
+
+err_free_zbuff:
+g_free(z->zbuff);
+err_deflate_end:
+deflateEnd(&z->zs);
+err_free_z:
+g_free(z);
+error_setg(errp, "multifd %u: %s", p->id, err_msg);
+return -1;
 }
 
 /**
@@ -82,6 +96,8 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
 deflateEnd(>zs);
 g_free(z->zbuff);
 z->zbuff = NULL;
+g_free(z->buf);
+z->buf = NULL;
 g_free(p->data);
 p->data = NULL;
 }
@@ -114,8 +130,14 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 flush = Z_SYNC_FLUSH;
 }
 
+/*
+ * Since the VM might be running, the page may be changing concurrently
+ * with compression. zlib does not guarantee that this is safe,
+ * therefore copy the page before calling deflate().
+ */
+memcpy(z->buf, p->pages->block->host + p->normal[i], page_size);
 zs->avail_in = page_size;
-zs->next_in = p->pages->block->host + p->normal[i];
+zs->next_in = z->buf;
 
 zs->avail_out = available;
 zs->next_out = z->zbuff + out_size;
-- 
2.36.1




[PULL 23/25] migration: remove the QEMUFileOps 'writev_buffer' callback

2022-06-23 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

This directly implements the writev_buffer logic using QIOChannel APIs.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/qemu-file-channel.c | 43 ---
 migration/qemu-file.c | 24 +++
 migration/qemu-file.h |  9 
 3 files changed, 8 insertions(+), 68 deletions(-)

diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index 7b32831752..2e139f7bcd 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -32,48 +32,6 @@
 #include "yank_functions.h"
 
 
-static ssize_t channel_writev_buffer(void *opaque,
- struct iovec *iov,
- int iovcnt,
- int64_t pos,
- Error **errp)
-{
-QIOChannel *ioc = QIO_CHANNEL(opaque);
-ssize_t done = 0;
-struct iovec *local_iov = g_new(struct iovec, iovcnt);
-struct iovec *local_iov_head = local_iov;
-unsigned int nlocal_iov = iovcnt;
-
-nlocal_iov = iov_copy(local_iov, nlocal_iov,
-  iov, iovcnt,
-  0, iov_size(iov, iovcnt));
-
-while (nlocal_iov > 0) {
-ssize_t len;
-len = qio_channel_writev(ioc, local_iov, nlocal_iov, errp);
-if (len == QIO_CHANNEL_ERR_BLOCK) {
-if (qemu_in_coroutine()) {
-qio_channel_yield(ioc, G_IO_OUT);
-} else {
-qio_channel_wait(ioc, G_IO_OUT);
-}
-continue;
-}
-if (len < 0) {
-done = -EIO;
-goto cleanup;
-}
-
-iov_discard_front(&local_iov, &nlocal_iov, len);
-done += len;
-}
-
- cleanup:
-g_free(local_iov_head);
-return done;
-}
-
-
 static QEMUFile *channel_get_input_return_path(void *opaque)
 {
 QIOChannel *ioc = QIO_CHANNEL(opaque);
@@ -94,7 +52,6 @@ static const QEMUFileOps channel_input_ops = {
 
 
 static const QEMUFileOps channel_output_ops = {
-.writev_buffer = channel_writev_buffer,
 .get_return_path = channel_get_output_return_path,
 };
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2f46873efd..355117fee0 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -248,10 +248,6 @@ static void qemu_iovec_release_ram(QEMUFile *f)
  */
 void qemu_fflush(QEMUFile *f)
 {
-ssize_t ret = 0;
-ssize_t expect = 0;
-Error *local_error = NULL;
-
 if (!qemu_file_is_writable(f)) {
 return;
 }
@@ -260,22 +256,18 @@ void qemu_fflush(QEMUFile *f)
 return;
 }
 if (f->iovcnt > 0) {
-expect = iov_size(f->iov, f->iovcnt);
-ret = f->ops->writev_buffer(f->ioc, f->iov, f->iovcnt,
-f->total_transferred, &local_error);
+Error *local_error = NULL;
+if (qio_channel_writev_all(f->ioc,
+   f->iov, f->iovcnt,
&local_error) < 0) {
+qemu_file_set_error_obj(f, -EIO, local_error);
+} else {
+f->total_transferred += iov_size(f->iov, f->iovcnt);
+}
 
 qemu_iovec_release_ram(f);
 }
 
-if (ret >= 0) {
-f->total_transferred += ret;
-}
-/* We expect the QEMUFile write impl to send the full
- * data set we requested, so sanity check that.
- */
-if (ret != expect) {
-qemu_file_set_error_obj(f, ret < 0 ? ret : -EIO, local_error);
-}
 f->buf_index = 0;
 f->iovcnt = 0;
 }
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index f7ed568894..de3f066014 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -29,14 +29,6 @@
 #include "exec/cpu-common.h"
 #include "io/channel.h"
 
-/*
- * This function writes an iovec to file. The handler must write all
- * of the data or return a negative errno value.
- */
-typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov,
-   int iovcnt, int64_t pos,
-   Error **errp);
-
 /*
  * This function provides hooks around different
  * stages of RAM migration.
@@ -69,7 +61,6 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f,
 typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
 
 typedef struct QEMUFileOps {
-QEMUFileWritevBufferFunc *writev_buffer;
 QEMURetPathFunc *get_return_path;
 } QEMUFileOps;
 
-- 
2.36.1




[PULL 21/25] migration: remove the QEMUFileOps 'close' callback

2022-06-23 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

This directly implements the close logic using QIOChannel APIs.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/qemu-file-channel.c | 12 
 migration/qemu-file.c | 12 ++--
 migration/qemu-file.h | 10 --
 3 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index 0350d367ec..8ff58e81f9 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -102,16 +102,6 @@ static ssize_t channel_get_buffer(void *opaque,
 }
 
 
-static int channel_close(void *opaque, Error **errp)
-{
-int ret;
-QIOChannel *ioc = QIO_CHANNEL(opaque);
-ret = qio_channel_close(ioc, errp);
-object_unref(OBJECT(ioc));
-return ret;
-}
-
-
 static QEMUFile *channel_get_input_return_path(void *opaque)
 {
 QIOChannel *ioc = QIO_CHANNEL(opaque);
@@ -128,14 +118,12 @@ static QEMUFile *channel_get_output_return_path(void *opaque)
 
 static const QEMUFileOps channel_input_ops = {
 .get_buffer = channel_get_buffer,
-.close = channel_close,
 .get_return_path = channel_get_input_return_path,
 };
 
 
 static const QEMUFileOps channel_output_ops = {
 .writev_buffer = channel_writev_buffer,
-.close = channel_close,
 .get_return_path = channel_get_output_return_path,
 };
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 95d5db9dd6..74f919de67 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -408,16 +408,16 @@ void qemu_file_credit_transfer(QEMUFile *f, size_t size)
  */
 int qemu_fclose(QEMUFile *f)
 {
-int ret;
+int ret, ret2;
 qemu_fflush(f);
 ret = qemu_file_get_error(f);
 
-if (f->ops->close) {
-int ret2 = f->ops->close(f->ioc, NULL);
-if (ret >= 0) {
-ret = ret2;
-}
+ret2 = qio_channel_close(f->ioc, NULL);
+if (ret >= 0) {
+ret = ret2;
 }
+g_clear_pointer(&f->ioc, object_unref);
+
 /* If any error was spotted before closing, we should report it
  * instead of the close() return value.
  */
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 7793e765f2..4a3beedb5b 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -37,15 +37,6 @@ typedef ssize_t (QEMUFileGetBufferFunc)(void *opaque, uint8_t *buf,
 int64_t pos, size_t size,
 Error **errp);
 
-/* Close a file
- *
- * Return negative error number on error, 0 or positive value on success.
- *
- * The meaning of return value on success depends on the specific back-end being
- * used.
- */
-typedef int (QEMUFileCloseFunc)(void *opaque, Error **errp);
-
 /*
  * This function writes an iovec to file. The handler must write all
  * of the data or return a negative errno value.
@@ -87,7 +78,6 @@ typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
 
 typedef struct QEMUFileOps {
 QEMUFileGetBufferFunc *get_buffer;
-QEMUFileCloseFunc *close;
 QEMUFileWritevBufferFunc *writev_buffer;
 QEMURetPathFunc *get_return_path;
 } QEMUFileOps;
-- 
2.36.1




[PULL 25/25] migration: remove the QEMUFileOps abstraction

2022-06-23 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

Now that all QEMUFile callbacks are removed, the entire concept can be
deleted.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/channel.c   |  4 +--
 migration/colo.c  |  5 ++--
 migration/meson.build |  1 -
 migration/migration.c |  7 ++---
 migration/qemu-file-channel.c | 53 ---
 migration/qemu-file-channel.h | 32 -
 migration/qemu-file.c | 20 ++---
 migration/qemu-file.h |  7 ++---
 migration/ram.c   |  3 +-
 migration/rdma.c  |  5 ++--
 migration/savevm.c| 13 -
 tests/unit/test-vmstate.c |  5 ++--
 12 files changed, 27 insertions(+), 128 deletions(-)
 delete mode 100644 migration/qemu-file-channel.c
 delete mode 100644 migration/qemu-file-channel.h

diff --git a/migration/channel.c b/migration/channel.c
index a162d00fea..90087d8986 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -14,7 +14,7 @@
 #include "channel.h"
 #include "tls.h"
 #include "migration.h"
-#include "qemu-file-channel.h"
+#include "qemu-file.h"
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
@@ -85,7 +85,7 @@ void migration_channel_connect(MigrationState *s,
 return;
 }
 } else {
-QEMUFile *f = qemu_fopen_channel_output(ioc);
+QEMUFile *f = qemu_file_new_output(ioc);
 
 migration_ioc_register_yank(ioc);
 
diff --git a/migration/colo.c b/migration/colo.c
index 5f7071b3cd..2b71722fd6 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -14,7 +14,6 @@
 #include "sysemu/sysemu.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-migration.h"
-#include "qemu-file-channel.h"
 #include "migration.h"
 #include "qemu-file.h"
 #include "savevm.h"
@@ -559,7 +558,7 @@ static void colo_process_checkpoint(MigrationState *s)
 goto out;
 }
 bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
-fb = qemu_fopen_channel_output(QIO_CHANNEL(bioc));
+fb = qemu_file_new_output(QIO_CHANNEL(bioc));
 object_unref(OBJECT(bioc));
 
 qemu_mutex_lock_iothread();
@@ -873,7 +872,7 @@ void *colo_process_incoming_thread(void *opaque)
 colo_incoming_start_dirty_log();
 
 bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
-fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
+fb = qemu_file_new_input(QIO_CHANNEL(bioc));
 object_unref(OBJECT(bioc));
 
 qemu_mutex_lock_iothread();
diff --git a/migration/meson.build b/migration/meson.build
index 8d309f5849..690487cf1a 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -4,7 +4,6 @@ migration_files = files(
   'xbzrle.c',
   'vmstate-types.c',
   'vmstate.c',
-  'qemu-file-channel.c',
   'qemu-file.c',
   'yank_functions.c',
 )
diff --git a/migration/migration.c b/migration/migration.c
index 6d56eb1617..78f5057373 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -30,7 +30,6 @@
 #include "migration/misc.h"
 #include "migration.h"
 #include "savevm.h"
-#include "qemu-file-channel.h"
 #include "qemu-file.h"
 #include "migration/vmstate.h"
 #include "block/block.h"
@@ -723,7 +722,7 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 
 if (!mis->from_src_file) {
 /* The first connection (multifd may have multiple) */
-QEMUFile *f = qemu_fopen_channel_input(ioc);
+QEMUFile *f = qemu_file_new_input(ioc);
 
 if (!migration_incoming_setup(f, errp)) {
 return;
@@ -3076,7 +3075,7 @@ static int postcopy_start(MigrationState *ms)
  */
 bioc = qio_channel_buffer_new(4096);
 qio_channel_set_name(QIO_CHANNEL(bioc), "migration-postcopy-buffer");
-fb = qemu_fopen_channel_output(QIO_CHANNEL(bioc));
+fb = qemu_file_new_output(QIO_CHANNEL(bioc));
 object_unref(OBJECT(bioc));
 
 /*
@@ -3966,7 +3965,7 @@ static void *bg_migration_thread(void *opaque)
  */
 s->bioc = qio_channel_buffer_new(512 * 1024);
 qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
-fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
+fb = qemu_file_new_output(QIO_CHANNEL(s->bioc));
 object_unref(OBJECT(s->bioc));
 
 update_iteration_initial_status(s);
diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
deleted file mode 100644
index 51717c1137..00
--- a/migration/qemu-file-channel.c
+++ /dev/null
@@ -1,53 +0,0 @@
-/*
- * QEMUFile backend for QIOChannel objects
- *
- * Copyright (c) 2015-2016 Red Hat, Inc
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to 
deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, 

[PULL 22/25] migration: remove the QEMUFileOps 'get_buffer' callback

2022-06-23 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

This directly implements the get_buffer logic using QIOChannel APIs.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Fixup len = -EIO as spotted by Peter Xu
---
 migration/qemu-file-channel.c | 29 -
 migration/qemu-file.c | 18 --
 migration/qemu-file.h |  9 -
 3 files changed, 16 insertions(+), 40 deletions(-)

diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index 8ff58e81f9..7b32831752 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -74,34 +74,6 @@ static ssize_t channel_writev_buffer(void *opaque,
 }
 
 
-static ssize_t channel_get_buffer(void *opaque,
-  uint8_t *buf,
-  int64_t pos,
-  size_t size,
-  Error **errp)
-{
-QIOChannel *ioc = QIO_CHANNEL(opaque);
-ssize_t ret;
-
-do {
-ret = qio_channel_read(ioc, (char *)buf, size, errp);
-if (ret < 0) {
-if (ret == QIO_CHANNEL_ERR_BLOCK) {
-if (qemu_in_coroutine()) {
-qio_channel_yield(ioc, G_IO_IN);
-} else {
-qio_channel_wait(ioc, G_IO_IN);
-}
-} else {
-return -EIO;
-}
-}
-} while (ret == QIO_CHANNEL_ERR_BLOCK);
-
-return ret;
-}
-
-
 static QEMUFile *channel_get_input_return_path(void *opaque)
 {
 QIOChannel *ioc = QIO_CHANNEL(opaque);
@@ -117,7 +89,6 @@ static QEMUFile *channel_get_output_return_path(void *opaque)
 }
 
 static const QEMUFileOps channel_input_ops = {
-.get_buffer = channel_get_buffer,
 .get_return_path = channel_get_input_return_path,
 };
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 74f919de67..2f46873efd 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -377,8 +377,22 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
 return 0;
 }
 
-len = f->ops->get_buffer(f->ioc, f->buf + pending, f->total_transferred,
- IO_BUF_SIZE - pending, _error);
+do {
+len = qio_channel_read(f->ioc,
+   (char *)f->buf + pending,
+   IO_BUF_SIZE - pending,
+   _error);
+if (len == QIO_CHANNEL_ERR_BLOCK) {
+if (qemu_in_coroutine()) {
+qio_channel_yield(f->ioc, G_IO_IN);
+} else {
+qio_channel_wait(f->ioc, G_IO_IN);
+}
+} else if (len < 0) {
+len = -EIO;
+}
+} while (len == QIO_CHANNEL_ERR_BLOCK);
+
 if (len > 0) {
 f->buf_size += len;
 f->total_transferred += len;
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 4a3beedb5b..f7ed568894 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -29,14 +29,6 @@
 #include "exec/cpu-common.h"
 #include "io/channel.h"
 
-/* Read a chunk of data from a file at the given position.  The pos argument
- * can be ignored if the file is only be used for streaming.  The number of
- * bytes actually read should be returned.
- */
-typedef ssize_t (QEMUFileGetBufferFunc)(void *opaque, uint8_t *buf,
-int64_t pos, size_t size,
-Error **errp);
-
 /*
  * This function writes an iovec to file. The handler must write all
  * of the data or return a negative errno value.
@@ -77,7 +69,6 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f,
 typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
 
 typedef struct QEMUFileOps {
-QEMUFileGetBufferFunc *get_buffer;
 QEMUFileWritevBufferFunc *writev_buffer;
 QEMURetPathFunc *get_return_path;
 } QEMUFileOps;
-- 
2.36.1




[PULL 20/25] migration: remove the QEMUFileOps 'set_blocking' callback

2022-06-23 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

This directly implements the set_blocking logic using QIOChannel APIs.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/qemu-file-channel.c | 14 --
 migration/qemu-file.c |  4 +---
 migration/qemu-file.h |  5 -
 3 files changed, 1 insertion(+), 22 deletions(-)

diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index 80f05dc371..0350d367ec 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -112,18 +112,6 @@ static int channel_close(void *opaque, Error **errp)
 }
 
 
-static int channel_set_blocking(void *opaque,
-bool enabled,
-Error **errp)
-{
-QIOChannel *ioc = QIO_CHANNEL(opaque);
-
-if (qio_channel_set_blocking(ioc, enabled, errp) < 0) {
-return -1;
-}
-return 0;
-}
-
 static QEMUFile *channel_get_input_return_path(void *opaque)
 {
 QIOChannel *ioc = QIO_CHANNEL(opaque);
@@ -141,7 +129,6 @@ static QEMUFile *channel_get_output_return_path(void *opaque)
 static const QEMUFileOps channel_input_ops = {
 .get_buffer = channel_get_buffer,
 .close = channel_close,
-.set_blocking = channel_set_blocking,
 .get_return_path = channel_get_input_return_path,
 };
 
@@ -149,7 +136,6 @@ static const QEMUFileOps channel_input_ops = {
 static const QEMUFileOps channel_output_ops = {
 .writev_buffer = channel_writev_buffer,
 .close = channel_close,
-.set_blocking = channel_set_blocking,
 .get_return_path = channel_get_output_return_path,
 };
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index d71bcb6c9c..95d5db9dd6 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -879,9 +879,7 @@ void qemu_put_counted_string(QEMUFile *f, const char *str)
  */
 void qemu_file_set_blocking(QEMUFile *f, bool block)
 {
-if (f->ops->set_blocking) {
-f->ops->set_blocking(f->ioc, block, NULL);
-}
+qio_channel_set_blocking(f->ioc, block, NULL);
 }
 
 /*
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 9fa92c1998..7793e765f2 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -46,10 +46,6 @@ typedef ssize_t (QEMUFileGetBufferFunc)(void *opaque, uint8_t *buf,
  */
 typedef int (QEMUFileCloseFunc)(void *opaque, Error **errp);
 
-/* Called to change the blocking mode of the file
- */
-typedef int (QEMUFileSetBlocking)(void *opaque, bool enabled, Error **errp);
-
 /*
  * This function writes an iovec to file. The handler must write all
  * of the data or return a negative errno value.
@@ -92,7 +88,6 @@ typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
 typedef struct QEMUFileOps {
 QEMUFileGetBufferFunc *get_buffer;
 QEMUFileCloseFunc *close;
-QEMUFileSetBlocking *set_blocking;
 QEMUFileWritevBufferFunc *writev_buffer;
 QEMURetPathFunc *get_return_path;
 } QEMUFileOps;
-- 
2.36.1




[PULL 19/25] migration: remove the QEMUFileOps 'shut_down' callback

2022-06-23 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

This directly implements the shutdown logic using QIOChannel APIs.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/qemu-file-channel.c | 27 ---
 migration/qemu-file.c | 13 ++---
 migration/qemu-file.h | 10 --
 3 files changed, 10 insertions(+), 40 deletions(-)

diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index 5cb8ac93c0..80f05dc371 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -112,31 +112,6 @@ static int channel_close(void *opaque, Error **errp)
 }
 
 
-static int channel_shutdown(void *opaque,
-bool rd,
-bool wr,
-Error **errp)
-{
-QIOChannel *ioc = QIO_CHANNEL(opaque);
-
-if (qio_channel_has_feature(ioc,
-QIO_CHANNEL_FEATURE_SHUTDOWN)) {
-QIOChannelShutdown mode;
-if (rd && wr) {
-mode = QIO_CHANNEL_SHUTDOWN_BOTH;
-} else if (rd) {
-mode = QIO_CHANNEL_SHUTDOWN_READ;
-} else {
-mode = QIO_CHANNEL_SHUTDOWN_WRITE;
-}
-if (qio_channel_shutdown(ioc, mode, errp) < 0) {
-return -EIO;
-}
-}
-return 0;
-}
-
-
 static int channel_set_blocking(void *opaque,
 bool enabled,
 Error **errp)
@@ -166,7 +141,6 @@ static QEMUFile *channel_get_output_return_path(void *opaque)
 static const QEMUFileOps channel_input_ops = {
 .get_buffer = channel_get_buffer,
 .close = channel_close,
-.shut_down = channel_shutdown,
 .set_blocking = channel_set_blocking,
 .get_return_path = channel_get_input_return_path,
 };
@@ -175,7 +149,6 @@ static const QEMUFileOps channel_input_ops = {
 static const QEMUFileOps channel_output_ops = {
 .writev_buffer = channel_writev_buffer,
 .close = channel_close,
-.shut_down = channel_shutdown,
 .set_blocking = channel_set_blocking,
 .get_return_path = channel_get_output_return_path,
 };
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2d6ceb53af..d71bcb6c9c 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -71,16 +71,23 @@ struct QEMUFile {
 /*
  * Stop a file from being read/written - not all backing files can do this
  * typically only sockets can.
+ *
+ * TODO: convert to propagate Error objects instead of squashing
+ * to a fixed errno value
  */
 int qemu_file_shutdown(QEMUFile *f)
 {
-int ret;
+int ret = 0;
 
 f->shutdown = true;
-if (!f->ops->shut_down) {
+if (!qio_channel_has_feature(f->ioc,
+ QIO_CHANNEL_FEATURE_SHUTDOWN)) {
 return -ENOSYS;
 }
-ret = f->ops->shut_down(f->ioc, true, true, NULL);
+
+if (qio_channel_shutdown(f->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL) < 0) {
+ret = -EIO;
+}
 
 if (!f->last_error) {
 qemu_file_set_error(f, -EIO);
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index fe1b2d1c00..9fa92c1998 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -89,22 +89,12 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f,
  */
 typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
 
-/*
- * Stop any read or write (depending on flags) on the underlying
- * transport on the QEMUFile.
- * Existing blocking reads/writes must be woken
- * Returns 0 on success, -err on error
- */
-typedef int (QEMUFileShutdownFunc)(void *opaque, bool rd, bool wr,
-   Error **errp);
-
 typedef struct QEMUFileOps {
 QEMUFileGetBufferFunc *get_buffer;
 QEMUFileCloseFunc *close;
 QEMUFileSetBlocking *set_blocking;
 QEMUFileWritevBufferFunc *writev_buffer;
 QEMURetPathFunc *get_return_path;
-QEMUFileShutdownFunc *shut_down;
 } QEMUFileOps;
 
 typedef struct QEMUFileHooks {
-- 
2.36.1




[PULL 24/25] migration: remove the QEMUFileOps 'get_return_path' callback

2022-06-23 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

This directly implements the get_return_path logic using QIOChannel APIs.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/qemu-file-channel.c | 16 
 migration/qemu-file.c | 22 ++
 migration/qemu-file.h |  6 --
 3 files changed, 10 insertions(+), 34 deletions(-)

diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index 2e139f7bcd..51717c1137 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -32,27 +32,11 @@
 #include "yank_functions.h"
 
 
-static QEMUFile *channel_get_input_return_path(void *opaque)
-{
-QIOChannel *ioc = QIO_CHANNEL(opaque);
-
-return qemu_fopen_channel_output(ioc);
-}
-
-static QEMUFile *channel_get_output_return_path(void *opaque)
-{
-QIOChannel *ioc = QIO_CHANNEL(opaque);
-
-return qemu_fopen_channel_input(ioc);
-}
-
 static const QEMUFileOps channel_input_ops = {
-.get_return_path = channel_get_input_return_path,
 };
 
 
 static const QEMUFileOps channel_output_ops = {
-.get_return_path = channel_get_output_return_path,
 };
 
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 355117fee0..fad0e33164 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -95,18 +95,6 @@ int qemu_file_shutdown(QEMUFile *f)
 return ret;
 }
 
-/*
- * Result: QEMUFile* for a 'return path' for comms in the opposite direction
- * NULL if not available
- */
-QEMUFile *qemu_file_get_return_path(QEMUFile *f)
-{
-if (!f->ops->get_return_path) {
-return NULL;
-}
-return f->ops->get_return_path(f->ioc);
-}
-
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
 if (mode == NULL ||
@@ -134,6 +122,16 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc,
 return f;
 }
 
+/*
+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
+ * NULL if not available
+ */
+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
+{
+object_ref(f->ioc);
+return qemu_file_new_impl(f->ioc, f->ops, !f->is_writable);
+}
+
 QEMUFile *qemu_file_new_output(QIOChannel *ioc, const QEMUFileOps *ops)
 {
 return qemu_file_new_impl(ioc, ops, true);
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index de3f066014..fe8f9766d1 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -55,13 +55,7 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f,
  size_t size,
  uint64_t *bytes_sent);
 
-/*
- * Return a QEMUFile for comms in the opposite direction
- */
-typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
-
 typedef struct QEMUFileOps {
-QEMURetPathFunc *get_return_path;
 } QEMUFileOps;
 
 typedef struct QEMUFileHooks {
-- 
2.36.1




[PULL 11/25] migration: rename qemu_update_position to qemu_file_credit_transfer

2022-06-23 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

The qemu_update_position method name gives the misleading impression
that it is changing the current file offset. Most of the files are
just streams, however, so there's no concept of a file offset in the
general case.

What this method is actually used for is to report on the number of
bytes that have been transferred out of band from the main I/O methods.
This new name better reflects this purpose.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/qemu-file.c | 4 ++--
 migration/qemu-file.h | 9 -
 migration/ram.c   | 2 +-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 7ee9b5bf05..f73b010d39 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -319,7 +319,7 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
 if (ret != RAM_SAVE_CONTROL_DELAYED &&
 ret != RAM_SAVE_CONTROL_NOT_SUPP) {
 if (bytes_sent && *bytes_sent > 0) {
-qemu_update_position(f, *bytes_sent);
+qemu_file_credit_transfer(f, *bytes_sent);
 } else if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -374,7 +374,7 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
 return len;
 }
 
-void qemu_update_position(QEMUFile *f, size_t size)
+void qemu_file_credit_transfer(QEMUFile *f, size_t size)
 {
 f->total_transferred += size;
 }
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 05f6aef903..d96f5f7118 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -179,7 +179,14 @@ int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src);
  */
 int qemu_peek_byte(QEMUFile *f, int offset);
 void qemu_file_skip(QEMUFile *f, int size);
-void qemu_update_position(QEMUFile *f, size_t size);
+/*
+ * qemu_file_credit_transfer:
+ *
+ * Report on a number of bytes that have been transferred
+ * out of band from the main file object I/O methods. This
+ * accounting information tracks the total migration traffic.
+ */
+void qemu_file_credit_transfer(QEMUFile *f, size_t size);
 void qemu_file_reset_rate_limit(QEMUFile *f);
 void qemu_file_update_transfer(QEMUFile *f, int64_t len);
 void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
diff --git a/migration/ram.c b/migration/ram.c
index 89082716d6..bf321e1e72 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2301,7 +2301,7 @@ void acct_update_position(QEMUFile *f, size_t size, bool zero)
 } else {
 ram_counters.normal += pages;
 ram_transferred_add(size);
-qemu_update_position(f, size);
+qemu_file_credit_transfer(f, size);
 }
 }
 
-- 
2.36.1




[PULL 16/25] migration: hardcode assumption that QEMUFile is backed with QIOChannel

2022-06-23 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

The only callers of qemu_fopen_ops pass 'true' for the 'has_ioc'
parameter, so hardcode this assumption in QEMUFile, by passing in
the QIOChannel object as a non-opaque parameter.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
   dgilbert: Fixed long line
---
 migration/qemu-file-channel.c |  4 ++--
 migration/qemu-file.c | 35 +--
 migration/qemu-file.h |  2 +-
 3 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index bb5a5752df..ce8eced417 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -184,11 +184,11 @@ static const QEMUFileOps channel_output_ops = {
 QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc)
 {
     object_ref(OBJECT(ioc));
-    return qemu_fopen_ops(ioc, &channel_input_ops, true);
+    return qemu_fopen_ops(ioc, &channel_input_ops);
 }
 
 QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc)
 {
     object_ref(OBJECT(ioc));
-    return qemu_fopen_ops(ioc, &channel_output_ops, true);
+    return qemu_fopen_ops(ioc, &channel_output_ops);
 }
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index cdcb6e1788..30e2160041 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -37,7 +37,7 @@
 struct QEMUFile {
     const QEMUFileOps *ops;
     const QEMUFileHooks *hooks;
-    void *opaque;
+    QIOChannel *ioc;
 
     /*
      * Maximum amount of data in bytes to transfer during one
@@ -65,8 +65,6 @@ struct QEMUFile {
     Error *last_error_obj;
     /* has the file has been shutdown */
     bool shutdown;
-    /* Whether opaque points to a QIOChannel */
-    bool has_ioc;
 };
 
 /*
@@ -81,7 +79,7 @@ int qemu_file_shutdown(QEMUFile *f)
     if (!f->ops->shut_down) {
         return -ENOSYS;
     }
-    ret = f->ops->shut_down(f->opaque, true, true, NULL);
+    ret = f->ops->shut_down(f->ioc, true, true, NULL);
 
     if (!f->last_error) {
         qemu_file_set_error(f, -EIO);
@@ -98,7 +96,7 @@ QEMUFile *qemu_file_get_return_path(QEMUFile *f)
     if (!f->ops->get_return_path) {
         return NULL;
     }
-    return f->ops->get_return_path(f->opaque);
+    return f->ops->get_return_path(f->ioc);
 }
 
 bool qemu_file_mode_is_not_valid(const char *mode)
@@ -113,15 +111,15 @@ bool qemu_file_mode_is_not_valid(const char *mode)
     return false;
 }
 
-QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops, bool has_ioc)
+QEMUFile *qemu_fopen_ops(QIOChannel *ioc, const QEMUFileOps *ops)
 {
     QEMUFile *f;
 
     f = g_new0(QEMUFile, 1);
 
-    f->opaque = opaque;
+    f->ioc = ioc;
     f->ops = ops;
-    f->has_ioc = has_ioc;
+
     return f;
 }
 
@@ -242,7 +240,7 @@ void qemu_fflush(QEMUFile *f)
     }
     if (f->iovcnt > 0) {
         expect = iov_size(f->iov, f->iovcnt);
-        ret = f->ops->writev_buffer(f->opaque, f->iov, f->iovcnt,
+        ret = f->ops->writev_buffer(f->ioc, f->iov, f->iovcnt,
                                     f->total_transferred, &local_error);
 
         qemu_iovec_release_ram(f);
@@ -358,7 +356,7 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
         return 0;
     }
 
-    len = f->ops->get_buffer(f->opaque, f->buf + pending, f->total_transferred,
+    len = f->ops->get_buffer(f->ioc, f->buf + pending, f->total_transferred,
                              IO_BUF_SIZE - pending, &local_error);
     if (len > 0) {
         f->buf_size += len;
@@ -394,7 +392,7 @@ int qemu_fclose(QEMUFile *f)
     ret = qemu_file_get_error(f);
 
     if (f->ops->close) {
-        int ret2 = f->ops->close(f->opaque, NULL);
+        int ret2 = f->ops->close(f->ioc, NULL);
         if (ret >= 0) {
             ret = ret2;
         }
@@ -861,18 +859,19 @@ void qemu_put_counted_string(QEMUFile *f, const char *str)
 void qemu_file_set_blocking(QEMUFile *f, bool block)
 {
     if (f->ops->set_blocking) {
-        f->ops->set_blocking(f->opaque, block, NULL);
+        f->ops->set_blocking(f->ioc, block, NULL);
     }
 }
 
 /*
- * Return the ioc object if it's a migration channel.  Note: it can return NULL
- * for callers passing in a non-migration qemufile.  E.g. see qemu_fopen_bdrv()
- * and its usage in e.g. load_snapshot().  So we need to check against NULL
- * before using it.  If without the check, migration_incoming_state_destroy()
- * could fail for load_snapshot().
+ * qemu_file_get_ioc:
+ *
+ * Get the ioc object for the file, without incrementing
+ * the reference count.
+ *
+ * Returns: the ioc object
  */
 QIOChannel *qemu_file_get_ioc(QEMUFile *file)
 {
-    return file->has_ioc ? QIO_CHANNEL(file->opaque) : NULL;
+    return file->ioc;
 }
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 277f1d5a62..3a1ecc0e34 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -118,7 +118,7 @@ typedef struct QEMUFileHooks {
     QEMURamSaveFunc *save_page;
 } QEMUFileHooks;
 
-QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops, bool has_ioc);
+QEMUFile *qemu_fopen_ops(QIOChannel *ioc, const QEMUFileOps *ops);
 void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
 int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
-- 
2.36.1

[PULL 18/25] migration: remove unused QEMUFileGetFD typedef / qemu_get_fd method

2022-06-23 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/qemu-file.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 3c93a27978..fe1b2d1c00 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -46,10 +46,6 @@ typedef ssize_t (QEMUFileGetBufferFunc)(void *opaque, uint8_t *buf,
  */
 typedef int (QEMUFileCloseFunc)(void *opaque, Error **errp);
 
-/* Called to return the OS file descriptor associated to the QEMUFile.
- */
-typedef int (QEMUFileGetFD)(void *opaque);
-
 /* Called to change the blocking mode of the file
  */
 typedef int (QEMUFileSetBlocking)(void *opaque, bool enabled, Error **errp);
@@ -121,7 +117,6 @@ typedef struct QEMUFileHooks {
 QEMUFile *qemu_file_new_input(QIOChannel *ioc, const QEMUFileOps *ops);
 QEMUFile *qemu_file_new_output(QIOChannel *ioc, const QEMUFileOps *ops);
 void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
-int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 
 /*
-- 
2.36.1
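
As a side note, the refactoring pattern running through this series — replacing a `void *opaque` plus a "what does it point at" flag with a properly typed pointer once every caller passes the same type — can be sketched outside QEMU like this. All type and function names below are hypothetical, for illustration only:

```c
#include <stddef.h>

typedef struct IOChannel { int fd; } IOChannel;

/* Before: the container can wrap anything, so a flag must record
 * whether the opaque pointer really is an IOChannel, and every
 * accessor needs a cast plus a NULL fallback. */
typedef struct FileBefore {
    void *opaque;
    int has_ioc;
} FileBefore;

/* After: only IOChannel backends remain, so the pointer is stored
 * with its real type; the flag, the casts, and the NULL case all
 * disappear. */
typedef struct FileAfter {
    IOChannel *ioc;
} FileAfter;

static IOChannel *file_get_ioc(FileAfter *f)
{
    return f->ioc;              /* no flag check, no cast */
}
```

The design payoff is the same as in the patch above: once the last non-channel caller is gone, the compiler enforces what the boolean used to merely assert.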



