Re: [PATCH] tests/qtest/migrate-test: Add a postcopy memfile test

2024-05-29 Thread Fabiano Rosas
Peter Xu  writes:

> On Wed, May 29, 2024 at 09:54:30AM -0300, Fabiano Rosas wrote:
>> Nicholas Piggin  writes:
>> 
>> > Postcopy requires userfaultfd support, which requires tmpfs if a memory
>> > file is used.
>> >
>> > This adds back support for /dev/shm memory files, but adds preallocation
>> > to skip environments where that mount is limited in size.
>> >
>> > Signed-off-by: Nicholas Piggin 
>> > ---
>> >
>> > How about this? This goes on top of the reset of the patches
>> > (I'll re-send them all as a series if we can get to some agreement).
>> >
>> > This adds back the /dev/shm option with preallocation and adds a test
>> > case that requires tmpfs.
>> 
>> Peter has stronger opinions on this than I do. I'll leave it to him to
>> decide.
>
> Sorry if I gave that feeling; it's more of a stronger willingness to at
> some point enable shmem for QEMU migration, rather than wanting to push
> back what Nicholas was trying to do.

Of course, I didn't mean to imply that. I just saying that using /tmp
would have been fine with me and I don't want to get in the way.

> Enabling more arch for migration
> tests is definitely worthwhile on its own.
>
> Shmem is just some blank spot that IMHO we should start to think about
> better coverarge. E.g. it is the only sane way to boot the VM that is able
> to do fast qemu upgrades using ignore-shared, that was true even before
> Steve's cpr-exec work, which would be much easier than anonymous. And it's
> also possible shmem can be (in the next 3-5 years) the 1G page provider to
> replace hugetlb for postcopy's sake - this one is far beyond our current
> discussion so I won't extend..

Interesting, good to know.

>
> IMHO shmem should just be a major backend just like anonymous, and the only
> possible file backend we can test in CI - as hugetlb is harder to manage
> there.
>
>> 
>> Just note that now we're making the CI less deterministic in relation to
>> the migration tests. When a test that uses shmem fails, we'll not be
>> able to consistently reproduce because the test might not even run
>> depending on what has consumed the shmem first.
>> 
>> Let's also take care that the other consumers of shmem (I think just
>> ivshmem-test) are able to cope with the migration-test taking all the
>> space, otherwise the CI will still break.
>
> Looks like ivshmem-test only uses 1MB shmem constantly so probably that
> will succeed if the migration test will, but true they face the same
> challenge and they interfere with each other..  that test sidently pass
> (instead of skip) if mktempshm() fails.  I guess we don't have a way to
> solidly test shmem as shmem simply may not be around.

Here we have each of the 5 migration archs taking up some amount of
memory + each of the 3 supported arches for ivshmem. They all could be
running in parallel through make check. In practice there's maybe less
overlap due to timing and not all CI jobs building all archs, but still.

>
> For this patch alone personally I'd avoid using "use_uffd_memfile" as the
> name, as that's definitely confusing, since shmem can be tested in other
> setups too without uffd.  Nicolas, please feel free to move ahead with your
> arch enablement series with /tmp if you want to separate the shmem issue.

Or just leave ignore_shared untouched for the ppc series.

>
> Thanks,



Re: [PULL 6/9] virtio-gpu: fix v2 migration

2024-05-29 Thread Fabiano Rosas
Fiona Ebner  writes:

> CC-ing stable, because this already is an issue in 9.0.0
>

Thank you for pointing this out. I was expecting b4 to find the tag, but
I just now noticed that the CC was added by Peter as a reply to the
message and not originally via the patch headers, so I should have added
it manually.




Re: [PATCH] tests/qtest/migrate-test: Add a postcopy memfile test

2024-05-29 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Postcopy requires userfaultfd support, which requires tmpfs if a memory
> file is used.
>
> This adds back support for /dev/shm memory files, but adds preallocation
> to skip environments where that mount is limited in size.
>
> Signed-off-by: Nicholas Piggin 
> ---
>
> How about this? This goes on top of the reset of the patches
> (I'll re-send them all as a series if we can get to some agreement).
>
> This adds back the /dev/shm option with preallocation and adds a test
> case that requires tmpfs.

Peter has stronger opinions on this than I do. I'll leave it to him to
decide.

Just note that now we're making the CI less deterministic in relation to
the migration tests. When a test that uses shmem fails, we'll not be
able to consistently reproduce because the test might not even run
depending on what has consumed the shmem first.

Let's also take care that the other consumers of shmem (I think just
ivshmem-test) are able to cope with the migration-test taking all the
space, otherwise the CI will still break.

>
> Thanks,
> Nick
>
>  tests/qtest/migration-test.c | 63 +++-
>  1 file changed, 55 insertions(+), 8 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 86eace354e..7fd9bbdc18 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  
>  #include "libqtest.h"
>  #include "qapi/qmp/qdict.h"
> @@ -553,6 +554,7 @@ typedef struct {
>   */
>  bool hide_stderr;
>  bool use_memfile;
> +bool use_uffd_memfile;
>  /* only launch the target process */
>  bool only_target;
>  /* Use dirty ring if true; dirty logging otherwise */
> @@ -739,7 +741,48 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>  ignore_stderr = "";
>  }
>  
> -if (args->use_memfile) {
> +if (!qtest_has_machine(machine_alias)) {
> +g_autofree char *msg = g_strdup_printf("machine %s not supported",
> +   machine_alias);
> +g_test_skip(msg);
> +return -1;
> +}
> +
> +if (args->use_uffd_memfile) {
> +#if defined(__NR_userfaultfd) && defined(__linux__)
> +int fd;
> +uint64_t size;
> +
> +if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
> +g_test_skip("/dev/shm does not exist or is not a directory");
> +return -1;
> +}
> +
> +/*
> + * Pre-create and allocate the file here, because /dev/shm/
> + * is known to be limited in size in some places (e.g., Gitlab CI).
> + */
> +memfile_path = g_strdup_printf("/dev/shm/qemu-%d", getpid());
> +fd = open(memfile_path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | 
> S_IWUSR);
> +if (fd == -1) {
> +g_test_skip("/dev/shm file could not be created");
> +return -1;
> +}
> +
> +g_assert(qemu_strtosz(memory_size, NULL, ) == 0);
> +size += 64*1024; /* QEMU may map a bit more memory for a guard page 
> */
> +
> +if (fallocate(fd, 0, 0, size) == -1) {
> +unlink(memfile_path);
> +perror("could not alloc"); exit(1);
> +g_test_skip("Could not allocate machine memory in /dev/shm");
> +return -1;
> +}
> +close(fd);
> +#else
> +g_test_skip("userfaultfd is not supported");
> +#endif
> +} else if (args->use_memfile) {
>  memfile_path = g_strdup_printf("/%s/qemu-%d", tmpfs, getpid());
>  memfile_opts = g_strdup_printf(
>  "-object memory-backend-file,id=mem0,size=%s"
> @@ -751,12 +794,6 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>  kvm_opts = ",dirty-ring-size=4096";
>  }
>  
> -if (!qtest_has_machine(machine_alias)) {
> -g_autofree char *msg = g_strdup_printf("machine %s not supported", 
> machine_alias);
> -g_test_skip(msg);
> -return -1;
> -}
> -
>  machine = resolve_machine_version(machine_alias, QEMU_ENV_SRC,
>QEMU_ENV_DST);
>  
> @@ -807,7 +844,7 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>   * Remove shmem file immediately to avoid memory leak in test failed 
> case.
>   * It's valid because QEMU has already opened this file
>   */
> -if (args->use_memfile) {
> +if (args->use_memfile || args->use_uffd_memfile) {
>  unlink(memfile_path);
>  }
>  
> @@ -1275,6 +1312,15 @@ static void test_postcopy(void)
>  test_postcopy_common();
>  }
>  
> +static void test_postcopy_memfile(void)
> +{
> +MigrateCommon args = {
> +.start.use_uffd_memfile = true,
> +};
> +
> +test_postcopy_common();
> +}
> +
>  static void test_postcopy_suspend(void)
>  {
>  MigrateCommon args = {
> @@ -3441,6 +3487,7 @@ int 

Re: [PATCH] tests/qtest/migrate-test: Use regular file file for shared-memory tests

2024-05-28 Thread Fabiano Rosas
Peter Xu  writes:

> On Tue, May 28, 2024 at 09:35:22AM -0400, Peter Xu wrote:
>> On Tue, May 28, 2024 at 02:27:57PM +1000, Nicholas Piggin wrote:
>> > There is no need to use /dev/shm for file-backed memory devices, and
>> > it is too small to be usable in gitlab CI. Switch to using a regular
>> > file in /tmp/ which will usually have more space available.
>> > 
>> > Signed-off-by: Nicholas Piggin 
>> > ---
>> > Am I missing something? AFAIKS there is not even any point using
>> > /dev/shm aka tmpfs anyway, there is not much special about it as a
>> > filesystem. This applies on top of the series just sent, and passes
>> > gitlab CI qtests including aarch64.
>> 
>> I think it's just that /dev/shm guarantees shmem usage, while the var
>> "tmpfs" implies g_dir_make_tmp() which may be another non-ram based file
>> system, while that'll be slightly different comparing to what a real user
>> would use - we don't suggest user to put guest RAM on things like btrfs.
>> 
>> One real implication is if we add a postcopy test it'll fail with
>> g_dir_make_tmp() when it is not pointing to a shmem mount, as
>> UFFDIO_REGISTER will fail there.  But that test doesn't yet exist as the
>> QEMU paths should be the same even if Linux will trigger different paths
>> when different types of mem is used (anonymous v.s. shmem).
>> 
>> If the goal here is to properly handle the case where tmpfs doesn't have
>> enough space, how about what I suggested in the other email?
>> 
>> https://lore.kernel.org/r/ZlSppKDE6wzjCF--@x1n
>> 
>> IOW, try populate the shmem region before starting the guest, skip if
>> population failed.  Would that work?
>
> Let me append some more info here..
>
> I think madvise() isn't needed as fallocate() should do the population work
> already, afaiu, then it means we pass the shmem path to QEMU and QEMU
> should notice this memory-backend-file existed, open() directly.
>
> I quicked walk the QEMU memory code and so far it looks all applicable, so
> that QEMU should just start the guest with the pre-populated shmem page
> caches.
>
> There's one trick where qemu_ram_mmap() will map some extra pages, on x86
> 4k, and I don't yet know why we did that..
>
> /*
>  * Note: this always allocates at least one extra page of virtual address
>  * space, even if size is already aligned.
>  */
> total = size + align;

At the end of the function:

/*
 * Leave a single PROT_NONE page allocated after the RAM block, to serve as
 * a guard page guarding against potential buffer overflows.
 */



Re: [RFC PATCH 4/4] ci: Add the new migration device tests

2024-05-28 Thread Fabiano Rosas
Peter Xu  writes:

> On Mon, May 27, 2024 at 08:59:00PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Thu, May 23, 2024 at 05:19:22PM -0300, Fabiano Rosas wrote:
>> >> We have two new migration tests that check cross version
>> >> compatibility. One uses the vmstate-static-checker.py script to
>> >> compare the vmstate structures from two different QEMU versions. The
>> >> other runs a simple migration with a few devices present in the VM, to
>> >> catch obvious breakages.
>> >> 
>> >> Add both tests to the migration-compat-common job.
>> >> 
>> >> Signed-off-by: Fabiano Rosas 
>> >> ---
>> >>  .gitlab-ci.d/buildtest.yml | 43 +++---
>> >>  1 file changed, 36 insertions(+), 7 deletions(-)
>> >> 
>> >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
>> >> index 91c57efded..bc7ac35983 100644
>> >> --- a/.gitlab-ci.d/buildtest.yml
>> >> +++ b/.gitlab-ci.d/buildtest.yml
>> >> @@ -202,18 +202,47 @@ build-previous-qemu:
>> >>needs:
>> >>  - job: build-previous-qemu
>> >>  - job: build-system-opensuse
>> >> -  # The old QEMU could have bugs unrelated to migration that are
>> >> -  # already fixed in the current development branch, so this test
>> >> -  # might fail.
>> >> +  # This test is allowed to fail because:
>> >> +  #
>> >> +  # - The old QEMU could have bugs unrelated to migration that are
>> >> +  #   already fixed in the current development branch.
>> >
>> > Did you ever hit a real failure with this?  I'm wondering whether we can
>> > remove this allow_failure thing.
>> >
>> 
>> I haven't. But when it fails we'll go through an entire release cycle
>> with this thing showing red for every person that runs the CI. Remember,
>> this is a CI failure to which there's no fix aside from waiting for the
>> release to happen. Even if we're quick to react and disable the job, I
>> feel it might create some confusion already.
>
> My imagination was if needed we'll get complains and we add that until
> then for that broken release only, and remove in the next release again.
>
>> 
>> >> +  #
>> >> +  # - The vmstate-static-checker script trips on renames and other
>> >> +  #   backward-compatible changes to the vmstate structs.
>> >
>> > I think I keep my preference per last time we talked on this. :)
>> 
>> Sorry, I'm not trying to force this in any way, I just wrote these to
>> use in the pull-request and thought I'd put it out there. At the very
>> least we can have your concerns documented. =)
>
> Yep that's fine.  I think we should keep such discussion on the list,
> especially we have different opinions, while none of us got convinced yet
> so far. :)
>
>> 
>> > I still think it's too early to involve a test that can report false
>> > negative.
>> 
>> (1)
>> Well, we haven't seen any false negatives, we've seen fields being
>> renamed. If that happens, then we'll ask the person to update the
>> script. Is that not acceptable to you? Or are you thinking about other
>> sorts of issues?
>
> Then question is how to update the script. So far it's treated as failure
> on rename, even if it's benign. Right now we have this:
>
> print("Section \"" + sec + "\",", end=' ')
> print("Description \"" + desc + "\":", end=' ')
> print("expected field \"" + s_item["field"] + "\",", end=' ')
> print("got \"" + d_item["field"] + "\"; skipping rest")
> bump_taint()
> break
>
> Do you want to introduce a list of renamed vmsd fields in this script and
> maintain that?  IMHO it's an overkill and unnecessary burden to other
> developers.
>

That's not _my_ idea, we already have that (see below). There's not much
reason to rename fields like that, the vmstate is obviously something
that should be kept stable, so having to do a rename in a script is way
better than having to figure out the fix for the compatibility break.

def check_fields_match(name, s_field, d_field):
if s_field == d_field:
return True

# Some fields changed names between qemu versions.  This list
# is used to allow such changes in each section / description.
changed_names = {
'apic': ['timer', 'timer_expiry'],
 

Re: [PATCH] tests/qtest/migrate-test: Use regular file file for shared-memory tests

2024-05-28 Thread Fabiano Rosas
Nicholas Piggin  writes:

> There is no need to use /dev/shm for file-backed memory devices, and
> it is too small to be usable in gitlab CI. Switch to using a regular
> file in /tmp/ which will usually have more space available.
>
> Signed-off-by: Nicholas Piggin 
> ---
> Am I missing something? AFAIKS there is not even any point using
> /dev/shm aka tmpfs anyway, there is not much special about it as a
> filesystem. This applies on top of the series just sent, and passes
> gitlab CI qtests including aarch64.

/dev/shm however will be mounted on tmpfs while /tmp may not. I don't
know if this has any implication to this test. Probably not.

>
> Thanks,
> Nick
>
>  tests/qtest/migration-test.c | 41 
>  1 file changed, 13 insertions(+), 28 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 45830eb213..86eace354e 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -552,7 +552,7 @@ typedef struct {
>   * unconditionally, because it means the user would like to be verbose.
>   */
>  bool hide_stderr;
> -bool use_shmem;
> +bool use_memfile;
>  /* only launch the target process */
>  bool only_target;
>  /* Use dirty ring if true; dirty logging otherwise */
> @@ -672,29 +672,14 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>  g_autofree gchar *cmd_source = NULL;
>  g_autofree gchar *cmd_target = NULL;
>  const gchar *ignore_stderr;
> -g_autofree char *shmem_opts = NULL;
> -g_autofree char *shmem_path = NULL;
> +g_autofree char *memfile_opts = NULL;
> +g_autofree char *memfile_path = NULL;
>  const char *kvm_opts = NULL;
>  const char *arch = qtest_get_arch();
>  const char *memory_size;
>  const char *machine_alias, *machine_opts = "";
>  g_autofree char *machine = NULL;
>  
> -if (args->use_shmem) {
> -if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
> -g_test_skip("/dev/shm is not supported");
> -return -1;
> -}
> -if (getenv("GITLAB_CI")) {
> -/*
> - * Gitlab runners are limited to 64MB shm size. See:
> - * https://lore.kernel.org/all/87ttq5fvh7@suse.de/
> - */
> -g_test_skip("/dev/shm is not supported in Gitlab CI 
> environment");
> -return -1;
> -}
> -}
> -
>  dst_state = (QTestMigrationState) { };
>  src_state = (QTestMigrationState) { };
>  bootfile_create(tmpfs, args->suspend_me);
> @@ -754,12 +739,12 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>  ignore_stderr = "";
>  }
>  
> -if (args->use_shmem) {
> -shmem_path = g_strdup_printf("/dev/shm/qemu-%d", getpid());
> -shmem_opts = g_strdup_printf(
> +if (args->use_memfile) {
> +memfile_path = g_strdup_printf("/%s/qemu-%d", tmpfs, getpid());

The variable tmpfs already contains the leading slash. Strictly speaking
we don't need the pid because 'tmpfs' is unique for each migration-test
run. If you use a fixed string such as qemu-mem, you can then clean it
up at test_migrate_end() along with the others.

> +memfile_opts = g_strdup_printf(
>  "-object memory-backend-file,id=mem0,size=%s"
>  ",mem-path=%s,share=on -numa node,memdev=mem0",
> -memory_size, shmem_path);
> +memory_size, memfile_path);
>  }
>  
>  if (args->use_dirty_ring) {
> @@ -788,7 +773,7 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>   memory_size, tmpfs,
>   arch_opts ? arch_opts : "",
>   arch_source ? arch_source : "",
> - shmem_opts ? shmem_opts : "",
> + memfile_opts ? memfile_opts : "",
>   args->opts_source ? args->opts_source : "",
>   ignore_stderr);
>  if (!args->only_target) {
> @@ -810,7 +795,7 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>   memory_size, tmpfs, uri,
>   arch_opts ? arch_opts : "",
>   arch_target ? arch_target : "",
> - shmem_opts ? shmem_opts : "",
> + memfile_opts ? memfile_opts : "",
>   args->opts_target ? args->opts_target : "",
>   ignore_stderr);
>  *to = qtest_init_with_env(QEMU_ENV_DST, cmd_target);
> @@ -822,8 +807,8 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>   * Remove shmem file immediately to avoid memory leak in test failed 
> case.
>   * It's valid because QEMU has already opened this file
>   */

I'm 

Re: [RFC PATCH 4/4] ci: Add the new migration device tests

2024-05-27 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, May 23, 2024 at 05:19:22PM -0300, Fabiano Rosas wrote:
>> We have two new migration tests that check cross version
>> compatibility. One uses the vmstate-static-checker.py script to
>> compare the vmstate structures from two different QEMU versions. The
>> other runs a simple migration with a few devices present in the VM, to
>> catch obvious breakages.
>> 
>> Add both tests to the migration-compat-common job.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  .gitlab-ci.d/buildtest.yml | 43 +++---
>>  1 file changed, 36 insertions(+), 7 deletions(-)
>> 
>> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
>> index 91c57efded..bc7ac35983 100644
>> --- a/.gitlab-ci.d/buildtest.yml
>> +++ b/.gitlab-ci.d/buildtest.yml
>> @@ -202,18 +202,47 @@ build-previous-qemu:
>>needs:
>>  - job: build-previous-qemu
>>  - job: build-system-opensuse
>> -  # The old QEMU could have bugs unrelated to migration that are
>> -  # already fixed in the current development branch, so this test
>> -  # might fail.
>> +  # This test is allowed to fail because:
>> +  #
>> +  # - The old QEMU could have bugs unrelated to migration that are
>> +  #   already fixed in the current development branch.
>
> Did you ever hit a real failure with this?  I'm wondering whether we can
> remove this allow_failure thing.
>

I haven't. But when it fails we'll go through an entire release cycle
with this thing showing red for every person that runs the CI. Remember,
this is a CI failure to which there's no fix aside from waiting for the
release to happen. Even if we're quick to react and disable the job, I
feel it might create some confusion already.

>> +  #
>> +  # - The vmstate-static-checker script trips on renames and other
>> +  #   backward-compatible changes to the vmstate structs.
>
> I think I keep my preference per last time we talked on this. :)

Sorry, I'm not trying to force this in any way, I just wrote these to
use in the pull-request and thought I'd put it out there. At the very
least we can have your concerns documented. =)

> I still think it's too early to involve a test that can report false
> negative.

(1)
Well, we haven't seen any false negatives, we've seen fields being
renamed. If that happens, then we'll ask the person to update the
script. Is that not acceptable to you? Or are you thinking about other
sorts of issues?

> I'd still keep running this before soft-freeze like I used to
> do, throw issues to others and urge them to fix before release.

Having hidden procedures that maintainers run before a release is bad
IMHO, it just delays the catching of bugs and frustrates
contributors. Imagine working on a series, everything goes well with
reviews, CI passes, patch gets queued and merged and a month later you
get a ping about something you should have done to avoid breaking
migration. Right during freeze.

> Per my
> previous experience that doesn't consume me a lot of time, and it's not
> common to see issues either.
>
> So I want people to really pay attention when someone sees a migration CI
> test failed, rather than we help people form the habit in "oh migration CI
> failed again?  I think that's fine, it allows failing anyway".

That's a good point. I don't think it applies here though. See my point
in (1).

> So far I still don't see as much benefit to adding this if we need to pay
> for the other false negative issue.  I'll fully support it if e.g. we can
> fix the tool to avoid reporting false negatives, but that may take effort
> that I didn't check.
>



Re: [RFC PATCH 3/4] tests/qtest/migration: Add support for simple device tests

2024-05-27 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, May 23, 2024 at 05:19:21PM -0300, Fabiano Rosas wrote:
>> The current migration-tests are almost entirely focused on catching
>> bugs on the migration code itself, not on the device migration
>> infrastructure (vmstate). That means we miss catching some low hanging
>> fruits that would show up immediately if only we had the device in
>> question present in the VM.
>> 
>> Add a list of devices to include by default in the migration-tests,
>> starting with one that recently had issues, virtio-gpu. Also add an
>> environment variable QTEST_DEVICE_OPTS to allow test users to
>> experiment with different devices or device options.
>> 
>> Do not run every migration test with the devices because that would
>> increase the complexity of the command lines and, as mentioned, the
>> migration-tests are mostly used to test the core migration code, not
>> the device migration. Add a special value QTEST_DEVICE_OPTS=all that
>> enables testing with devices.
>> 
>> Notes on usage:
>> 
>> For this new testing mode, it's not useful to run all the migration
>> tests, a single test would probably suffice to catch any issues, so
>> provide the -p option to migration-test and the test of your choice.
>> 
>> Like with the cross-version compatibility tests in CI and the recently
>> introduced vmstate-static-checker test, to be of any use, a test with
>> devices needs to be run against a different QEMU version, like so:
>> 
>> $ cd build
>> $ QTEST_DEVICE_OPTS=all \
>>  QTEST_QEMU_BINARY=./qemu-system-x86_64 \
>>  QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
>>  ./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain
>> 
>> $ cd build
>> $ QTEST_DEVICE_OPTS='-device virtio-net' \
>>  QTEST_QEMU_BINARY=./qemu-system-x86_64 \
>>  QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
>>  ./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  tests/qtest/migration-test.c | 19 +--
>>  1 file changed, 17 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index 2253e0fc5b..35bb224d18 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -71,6 +71,13 @@ static QTestMigrationState dst_state;
>>  #define QEMU_ENV_SRC "QTEST_QEMU_BINARY_SRC"
>>  #define QEMU_ENV_DST "QTEST_QEMU_BINARY_DST"
>>  
>> +/*
>> + * The tests using DEFAULT_DEVICES need a special invocation and
>> + * cannot be reached from make check, so don't bother with the
>> + * --without-default-devices build.
>
> What's this "--without-default-devices"?

A configure option. It removes from the build any devices that are
marked as default. It's an endless source of bugs because it is supposed
to be paired with a config file that adds back some of the removed
devices, but there's nothing enforcing that so we always run it as is
and generate a broken QEMU binary.

So anything in the tests that refer to devices should first check if
that QEMU binary even has the device present. I'm saying here that we're
not going to do that because this test cannot be accidentally reached
via make check. Realistically, most people will consume this test
through the CI job only.



Re: [RFC PATCH 2/4] tests/qtest/migration: Add a test that runs vmstate-static-checker

2024-05-27 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, May 23, 2024 at 05:19:20PM -0300, Fabiano Rosas wrote:
>> We have the vmstate-static-checker script that takes the output of:
>> '$QEMU -M $machine -dump-vmstate' for two different QEMU versions and
>> compares them to check for compatibility breakages. This is just too
>> simple and useful for us to pass on it. Add a test that runs the
>> script.
>> 
>> Since this needs to use two different QEMU versions, the test is
>> skipped if only one QEMU is provided. The infrastructure for passing
>> more than one binary is already in place:
>> 
>> $ PYTHON=$(which python3.11) \
>>  QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-x86_64 \
>>  QTEST_QEMU_BINARY=./qemu-system-x86_64 \
>>  ./tests/qtest/migration-test -p /x86_64/migration/vmstate-checker-script
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>> some code duplication for now, just so we can reason about this
>> without too much noise
>> ---
>>  tests/qtest/migration-test.c | 82 
>>  1 file changed, 82 insertions(+)
>> 
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index e8d3555f56..2253e0fc5b 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -63,6 +63,7 @@ static QTestMigrationState dst_state;
>>  #define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
>>  
>>  #define ANALYZE_SCRIPT "scripts/analyze-migration.py"
>> +#define VMSTATE_CHECKER_SCRIPT "scripts/vmstate-static-checker.py"
>>  
>>  #define QEMU_VM_FILE_MAGIC 0x5145564d
>>  #define FILE_TEST_FILENAME "migfile"
>> @@ -1611,6 +1612,85 @@ static void test_analyze_script(void)
>>  test_migrate_end(from, to, false);
>>  cleanup("migfile");
>>  }
>> +
>> +static void test_vmstate_checker_script(void)
>> +{
>> +g_autofree gchar *cmd_src = NULL;
>> +g_autofree gchar *cmd_dst = NULL;
>> +g_autofree gchar *vmstate_src = NULL;
>> +g_autofree gchar *vmstate_dst = NULL;
>> +const char *machine_alias, *machine_opts = "";
>> +g_autofree char *machine = NULL;
>> +const char *arch = qtest_get_arch();
>> +int pid, wstatus;
>> +const char *python = g_getenv("PYTHON");
>> +
>> +if (!getenv(QEMU_ENV_SRC) && !getenv(QEMU_ENV_DST)) {
>> +g_test_skip("Test needs two different QEMU versions");
>> +return;
>> +}
>> +
>> +if (!python) {
>> +g_test_skip("PYTHON variable not set");
>> +return;
>> +}
>> +
>> +if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
>> +if (g_str_equal(arch, "i386")) {
>> +machine_alias = "pc";
>> +} else {
>> +machine_alias = "q35";
>> +}
>> +} else if (g_str_equal(arch, "s390x")) {
>> +machine_alias = "s390-ccw-virtio";
>> +} else if (strcmp(arch, "ppc64") == 0) {
>> +machine_alias = "pseries";
>> +} else if (strcmp(arch, "aarch64") == 0) {
>> +machine_alias = "virt";
>> +} else {
>> +g_assert_not_reached();
>> +}
>> +
>> +if (!qtest_has_machine(machine_alias)) {
>> +g_autofree char *msg = g_strdup_printf("machine %s not supported", 
>> machine_alias);
>> +g_test_skip(msg);
>> +return;
>> +}
>> +
>> +machine = resolve_machine_version(machine_alias, QEMU_ENV_SRC,
>> +  QEMU_ENV_DST);
>> +
>> +vmstate_src = g_strdup_printf("%s/vmstate-src", tmpfs);
>> +vmstate_dst = g_strdup_printf("%s/vmstate-dst", tmpfs);
>> +
>> +cmd_dst = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
>> +  machine, machine_opts, vmstate_dst);
>> +cmd_src = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
>> +  machine, machine_opts, vmstate_src);
>> +
>> +qtest_init_with_env_no_handshake(QEMU_ENV_SRC, cmd_src);
>> +qtest_init_with_env_no_handshake(QEMU_ENV_DST, cmd_dst);
>> +
>> +pid = fork();
>> +if (!pid) {
>> +close(1);
>> +open("/dev/null", O_WRONLY);
>> +execl(python, python, VMSTATE_CHECKER_SCRIPT,
>> +  &qu

Re: [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared

2024-05-27 Thread Fabiano Rosas
Peter Xu  writes:

> On Mon, May 27, 2024 at 09:42:28AM -0300, Fabiano Rosas wrote:
>> However, there is an issue here still on all archs - which might very
>> well have been the original issue - which is the fact that the
>> containers on the Gitlab CI have limits on shared memory usage.
>> Unfortunately we cannot enable this test for the CI, so it needs a check
>> on the GITLAB_CI environment variable.
>
> Another option is we teach migration-test to detect whether memory_size of
> shmem is available, skip if not.  It can be a sequence of:
>
>   memfd_create()
>   fallocate()
>   ret = madvise(MADV_POPULATE_WRITE)
>
> To be run at the entry of migration-test, and skip all use_shmem=true tests
> if ret != 0, or any step failed above.

There are actually two issues:

1) Trying to run a test that needs more shmem than available in the
container. This is covered well by your suggestion.

2) Trying to use some shmem while another test has already consumed all
shmem. I'm not sure if this can be done reliably as the tests run in
parallel.



Re: [PATCH 3/3] tests/qtest/migration-test: Use custom asm bios for ppc64

2024-05-27 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Similar to other archs, build a custom bios memory updater. Running the
> test with OF code is a cool trick, but SLOF takes a long time to boot.
> This reduces test time by around 3x (150s to 50s).
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 



Re: [RFC PATCH 3/3] tests/qtest/migration-test: Enable test_ignore_shared

2024-05-27 Thread Fabiano Rosas
Nicholas Piggin  writes:

> This was said to be broken on aarch64, but if it works on others,
> let's try enable it. It's already starting to bitrot...

Yeah, look at the state of this...

I don't know what the issue was on aarch64, but I'm all for enabling
this test globally and then we deal with the breakage if it ever
comes. I don't think it will.

However, there is an issue here still on all archs - which might very
well have been the original issue - which is the fact that the
containers on the Gitlab CI have limits on shared memory usage.
Unfortunately we cannot enable this test for the CI, so it needs a check
on the GITLAB_CI environment variable.

There's also the cpr-reboot test which got put under "flaky", that has
the same issue. That one should also have been under GITLAB_CI. From
that discussion:

  "We have an issue with this test on CI:
  
  $ df -h /dev/shm
  Filesystem  Size  Used Avail Use% Mounted on
  shm  64M 0   64M   0% /dev/shm
  
  These are shared CI runners, so AFAICT there's no way to increase the
  shared memory size.
  
  Reducing the memory for this single test also wouldn't work because we
  can run migration-test for different archs in parallel + there's the
  ivshmem_test which uses 4M.
  
  Maybe just leave it out of CI? Laptops will probably have enough shared
  memory to not hit this. If we add a warning comment to the test, might
  be enough." -- https://lore.kernel.org/all/87ttq5fvh7@suse.de

>
> Cc: Yury Kotov 
> Cc: Dr. David Alan Gilbert 
> Signed-off-by: Nicholas Piggin 
> ---
>  tests/qtest/migration-test.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 7987faaded..2bcdc33b7c 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1862,14 +1862,15 @@ static void 
> test_precopy_unix_tls_x509_override_host(void)
>  #endif /* CONFIG_TASN1 */
>  #endif /* CONFIG_GNUTLS */
>  
> -#if 0
> -/* Currently upset on aarch64 TCG */
>  static void test_ignore_shared(void)
>  {
>  g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
>  QTestState *from, *to;
> +MigrateStart args = {
> +.use_shmem = true,
> +};
>  
> -if (test_migrate_start(, , uri, false, true, NULL, NULL)) {
> +if (test_migrate_start(, , uri, )) {
>  return;
>  }
>  
> @@ -1898,7 +1899,6 @@ static void test_ignore_shared(void)
>  
>  test_migrate_end(from, to, true);
>  }
> -#endif
>  
>  static void *
>  test_migrate_xbzrle_start(QTestState *from,
> @@ -3537,7 +3537,10 @@ int main(int argc, char **argv)
>  #endif /* CONFIG_TASN1 */
>  #endif /* CONFIG_GNUTLS */
>  
> -/* migration_test_add("/migration/ignore_shared", test_ignore_shared); */
> +if (strcmp(arch, "aarch64") == 0) { /* Currently upset on aarch64 TCG */
> +migration_test_add("/migration/ignore_shared", test_ignore_shared);
> +}
> +
>  #ifndef _WIN32
>  migration_test_add("/migration/precopy/fd/tcp",
> test_migrate_precopy_fd_socket);



Re: [PATCH V1 20/26] migration: cpr-exec mode

2024-05-24 Thread Fabiano Rosas
Steve Sistare  writes:

> Add the cpr-exec migration mode.  Usage:
>   qemu-system-$arch -machine memfd-alloc=on ...
>   migrate_set_parameter mode cpr-exec
>   migrate_set_parameter cpr-exec-args \
>   ... -incoming 
>   migrate -d 
>
> The migrate command stops the VM, saves state to the URI,
> directly exec's a new version of QEMU on the same host,
> replacing the original process while retaining its PID, and
> loads state from the URI.  Guest RAM is preserved in place,
> albeit with new virtual addresses.
>
> Arguments for the new QEMU process are taken from the
> @cpr-exec-args parameter.  The first argument should be the
> path of a new QEMU binary, or a prefix command that exec's the
> new QEMU binary.
>
> Because old QEMU terminates when new QEMU starts, one cannot
> stream data between the two, so the URI must be a type, such as
> a file, that reads all data before old QEMU exits.
>
> Memory backend objects must have the share=on attribute, and
> must be mmap'able in the new QEMU process.  For example,
> memory-backend-file is acceptable, but memory-backend-ram is
> not.
>
> The VM must be started with the '-machine memfd-alloc=on'
> option.  This causes implicit ram blocks (those not explicitly
> described by a memory-backend object) to be allocated by
> mmap'ing a memfd.  Examples include VGA, ROM, and even guest
> RAM when it is specified without a memory-backend object.
>
> The implementation saves precreate vmstate at the end of normal
> migration in migrate_fd_cleanup, and tells the main loop to call
> cpr_exec.  Incoming qemu loads preceate state early, before objects
> are created.  The memfds are kept open across exec by clearing the
> close-on-exec flag, their values are saved in precreate vmstate,
> and they are mmap'd in new qemu.
>
> Note that the memfd-alloc option is not related to memory-backend-memfd.
> Later patches add support for memory-backend-memfd, and for additional
> devices, including vfio, chardev, and more.
>
> Signed-off-by: Steve Sistare 
> ---
>  include/migration/cpr.h  |  14 +
>  include/migration/misc.h |   3 ++
>  migration/cpr.c  | 131 
> +++
>  migration/meson.build|   1 +
>  migration/migration.c|  21 
>  migration/migration.h|   5 +-
>  migration/ram.c  |   1 +
>  qapi/migration.json  |  30 ++-
>  system/physmem.c |   2 +
>  system/vl.c  |   4 ++
>  10 files changed, 210 insertions(+), 2 deletions(-)
>  create mode 100644 include/migration/cpr.h
>  create mode 100644 migration/cpr.c
>
> diff --git a/include/migration/cpr.h b/include/migration/cpr.h
> new file mode 100644
> index 000..aa8316d
> --- /dev/null
> +++ b/include/migration/cpr.h
> @@ -0,0 +1,14 @@
> +/*
> + * Copyright (c) 2021, 2024 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef MIGRATION_CPR_H
> +#define MIGRATION_CPR_H
> +
> +bool cpr_needed_for_exec(void *opaque);
> +void cpr_unpreserve_fds(void);
> +
> +#endif
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index cf30351..5b963ba 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -122,4 +122,7 @@ bool migration_in_bg_snapshot(void);
>  /* migration/block-dirty-bitmap.c */
>  void dirty_bitmap_mig_init(void);
>  
> +/* migration/cpr.c */
> +void cpr_exec(char **argv);
> +
>  #endif
> diff --git a/migration/cpr.c b/migration/cpr.c
> new file mode 100644
> index 000..d4703e1
> --- /dev/null
> +++ b/migration/cpr.c
> @@ -0,0 +1,131 @@
> +/*
> + * Copyright (c) 2021-2024 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "exec/ramblock.h"
> +#include "migration/cpr.h"
> +#include "migration/migration.h"
> +#include "migration/misc.h"
> +#include "migration/vmstate.h"
> +#include "sysemu/runstate.h"
> +#include "trace.h"
> +
> +/*/
> +#define CPR_STATE "CprState"
> +
> +typedef struct CprState {
> +MigMode mode;
> +} CprState;
> +
> +static CprState cpr_state = {
> +.mode = MIG_MODE_NORMAL,
> +};
> +
> +static int cpr_state_presave(void *opaque)
> +{
> +cpr_state.mode = migrate_mode();
> +return 0;
> +}
> +
> +bool cpr_needed_for_exec(void *opaque)
> +{
> +return migrate_mode() == MIG_MODE_CPR_EXEC;
> +}
> +
> +static const VMStateDescription vmstate_cpr_state = {
> +.name = CPR_STATE,
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = cpr_needed_for_exec,
> +.pre_save = cpr_state_presave,
> +.precreate = true,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT32(mode, CprState),
> +

Re: [PATCH V1 05/26] migration: precreate vmstate

2024-05-24 Thread Fabiano Rosas
Steve Sistare  writes:

> Provide the VMStateDescription precreate field to mark objects that must
> be loaded on the incoming side before devices have been created, because
> they provide properties that will be needed at creation time.  They will
> be saved to and loaded from their own QEMUFile, via
> qemu_savevm_precreate_save and qemu_savevm_precreate_load, but these
> functions are not yet called in this patch.  Allow them to be called
> before or after normal migration is active, when current_migration and
> current_incoming are not valid.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 



Re: [PATCH V1 00/26] Live update: cpr-exec

2024-05-24 Thread Fabiano Rosas
Steve Sistare  writes:

> This patch series adds the live migration cpr-exec mode.  In this mode, QEMU
> stops the VM, writes VM state to the migration URI, and directly exec's a
> new version of QEMU on the same host, replacing the original process while
> retaining its PID.  Guest RAM is preserved in place, albeit with new virtual
> addresses.  The user completes the migration by specifying the -incoming
> option, and by issuing the migrate-incoming command if necessary.  This
> saves and restores VM state, with minimal guest pause time, so that QEMU may
> be updated to a new version in between.
>
> The new interfaces are:
>   * cpr-exec (MigMode migration parameter)
>   * cpr-exec-args (migration parameter)
>   * memfd-alloc=on (command-line option for -machine)
>   * only-migratable-modes (command-line argument)
>
> The caller sets the mode parameter before invoking the migrate command.
>
> Arguments for the new QEMU process are taken from the cpr-exec-args parameter.
> The first argument should be the path of a new QEMU binary, or a prefix
> command that exec's the new QEMU binary, and the arguments should include
> the -incoming option.
>
> Memory backend objects must have the share=on attribute, and must be mmap'able
> in the new QEMU process.  For example, memory-backend-file is acceptable,
> but memory-backend-ram is not.
>
> QEMU must be started with the '-machine memfd-alloc=on' option.  This causes
> implicit RAM blocks (those not explicitly described by a memory-backend
> object) to be allocated by mmap'ing a memfd.  Examples include VGA, ROM,
> and even guest RAM when it is specified without without reference to a
> memory-backend object.   The memfds are kept open across exec, their values
> are saved in vmstate which is retrieved after exec, and they are re-mmap'd.
>
> The '-only-migratable-modes cpr-exec' option guarantees that the
> configuration supports cpr-exec.  QEMU will exit at start time if not.
>
> Example:
>
> In this example, we simply restart the same version of QEMU, but in
> a real scenario one would set a new QEMU binary path in cpr-exec-args.
>
>   # qemu-kvm -monitor stdio -object
>   memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on
>   -m 4G -machine memfd-alloc=on ...
>
>   QEMU 9.1.50 monitor - type 'help' for more information
>   (qemu) info status
>   VM status: running
>   (qemu) migrate_set_parameter mode cpr-exec
>   (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming 
> file:vm.state
>   (qemu) migrate -d file:vm.state
>   (qemu) QEMU 9.1.50 monitor - type 'help' for more information
>   (qemu) info status
>   VM status: running
>
> cpr-exec mode preserves attributes of outgoing devices that must be known
> before the device is created on the incoming side, such as the memfd 
> descriptor
> number, but currently the migration stream is read after all devices are
> created.  To solve this problem, I add two VMStateDescription options:
> precreate and factory.  precreate objects are saved to their own migration
> stream, distinct from the main stream, and are read early by incoming QEMU,
> before devices are created.  Factory objects are allocated on demand, without
> relying on a pre-registered object's opaque address, which is necessary
> because the devices to which the state will apply have not been created yet
> and hence have not registered an opaque address to receive the state.
>
> This patch series implements a minimal version of cpr-exec.  Future series
> will add support for:
>   * vfio
>   * chardev's without loss of connectivity
>   * vhost
>   * fine-grained seccomp controls
>   * hostmem-memfd
>   * cpr-exec migration test
>
>
> Steve Sistare (26):
>   oslib: qemu_clear_cloexec
>   vl: helper to request re-exec
>   migration: SAVEVM_FOREACH
>   migration: delete unused parameter mis
>   migration: precreate vmstate
>   migration: precreate vmstate for exec
>   migration: VMStateId
>   migration: vmstate_info_void_ptr
>   migration: vmstate_register_named
>   migration: vmstate_unregister_named
>   migration: vmstate_register at init time
>   migration: vmstate factory object
>   physmem: ram_block_create
>   physmem: hoist guest_memfd creation
>   physmem: hoist host memory allocation
>   physmem: set ram block idstr earlier
>   machine: memfd-alloc option
>   migration: cpr-exec-args parameter
>   physmem: preserve ram blocks for cpr
>   migration: cpr-exec mode
>   migration: migrate_add_blocker_mode
>   migration: ram block cpr-exec blockers
>   migration: misc cpr-exec blockers
>   seccomp: cpr-exec blocker
>   migration: fix mismatched GPAs during cpr-exec
>   migration: only-migratable-modes
>
>  accel/xen/xen-all.c|   5 +
>  backends/hostmem-epc.c |  12 +-
>  hmp-commands.hx|   2 +-
>  hw/core/machine.c  |  22 +++
>  hw/core/qdev.c |   1 +
>  hw/intc/apic_common.c  |   2 +-
>  hw/vfio/migration.c|   3 +-
>  

Re: [PATCH V1 23/26] migration: misc cpr-exec blockers

2024-05-24 Thread Fabiano Rosas
Steve Sistare  writes:

> Add blockers for cpr-exec migration mode for devices and options that do
> not support it.
>
> Signed-off-by: Steve Sistare 
> ---
>  accel/xen/xen-all.c|  5 +
>  backends/hostmem-epc.c | 12 ++--
>  hw/vfio/migration.c|  3 ++-
>  replay/replay.c|  6 ++
>  4 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
> index 0bdefce..9a7ed0f 100644
> --- a/accel/xen/xen-all.c
> +++ b/accel/xen/xen-all.c

This file is missing the migration/blocker.h include.




Re: [PATCH v2 01/18] migration: Fix file migration with fdset

2024-05-24 Thread Fabiano Rosas
Prasad Pandit  writes:

> On Fri, 24 May 2024 at 00:38, Fabiano Rosas  wrote:
>> This is further indicated by the presence of the 'offset'
>> argument, which indicates the start of the region where QEMU is
>> allowed to write.
>>
>> Fix the issue by replacing the O_TRUNC flag on open by an ftruncate
>> call, which will take the offset into consideration.
>>
>> +if (ftruncate(fioc->fd, offset)) {
>> +error_setg_errno(errp, errno,
>> + "failed to truncate migration file to offset %" 
>> PRIx64,
>> + offset);
>> +object_unref(OBJECT(fioc));
>> +return;
>> +}
>> +
>
> * Should 'offset' be checked for > zero while ftruncating? Else it'll
> be same as O_TRUNC. Otherwise it looks fine.

That's the point. If offset==0 we truncate all the way, if not, we
truncate to the offset.

>
> Reviewed-by: Prasad Pandit 

Thanks!



Re: [PATCH] tests/qtest/migration-test: Run some basic tests on s390x and ppc64 with TCG, too

2024-05-24 Thread Fabiano Rosas
Thomas Huth  writes:

> On 24/05/2024 02.05, Nicholas Piggin wrote:
>> On Wed May 22, 2024 at 7:12 PM AEST, Thomas Huth wrote:
>>> On s390x, we recently had a regression that broke migration / savevm
>>> (see commit bebe9603fc ("hw/intc/s390_flic: Fix crash that occurs when
>>> saving the machine state"). The problem was merged without being noticed
>>> since we currently do not run any migration / savevm related tests on
>>> x86 hosts.
>>> While we currently cannot run all migration tests for the s390x target
>>> on x86 hosts yet (due to some unresolved issues with TCG), we can at
>>> least run some of the non-live tests to avoid such problems in the future.
>>> Thus enable the "analyze-script" and the "bad_dest" tests before checking
>>> for KVM on s390x or ppc64 (this also fixes the problem that the
>>> "analyze-script" test was not run on s390x at all anymore since it got
>>> disabled again by accident in a previous refactoring of the code).
>> 
>> ppc64 is working for me, can it be enabled fully, or is it still
>> breaking somewhere? FWIW I have a patch to change it from using
>> open-firmware commands to a boot file which speeds it up.
>
> IIRC last time that I tried it was working fine for me, too, but getting a 
> speedup here first would be very welcome since using the Forth code slows 
> down the whole testing quite a bit.

Yeah, we're all gonna get kicked from the project if we add 10m to make
check in CI. =)

@Nick, send us that patch and I'd be glad to reenable the tests.



[RFC PATCH 3/4] tests/qtest/migration: Add support for simple device tests

2024-05-23 Thread Fabiano Rosas
The current migration-tests are almost entirely focused on catching
bugs on the migration code itself, not on the device migration
infrastructure (vmstate). That means we miss catching some low hanging
fruits that would show up immediately if only we had the device in
question present in the VM.

Add a list of devices to include by default in the migration-tests,
starting with one that recently had issues, virtio-gpu. Also add an
environment variable QTEST_DEVICE_OPTS to allow test users to
experiment with different devices or device options.

Do not run every migration test with the devices because that would
increase the complexity of the command lines and, as mentioned, the
migration-tests are mostly used to test the core migration code, not
the device migration. Add a special value QTEST_DEVICE_OPTS=all that
enables testing with devices.

Notes on usage:

For this new testing mode, it's not useful to run all the migration
tests, a single test would probably suffice to catch any issues, so
provide the -p option to migration-test and the test of your choice.

Like with the cross-version compatibility tests in CI and the recently
introduced vmstate-static-checker test, to be of any use, a test with
devices needs to be run against a different QEMU version, like so:

$ cd build
$ QTEST_DEVICE_OPTS=all \
 QTEST_QEMU_BINARY=./qemu-system-x86_64 \
 QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
 ./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain

$ cd build
$ QTEST_DEVICE_OPTS='-device virtio-net' \
 QTEST_QEMU_BINARY=./qemu-system-x86_64 \
 QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
 ./tests/qtest/migration-test -p /x86_64/migration/precopy/tcp/plain

Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 2253e0fc5b..35bb224d18 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -71,6 +71,13 @@ static QTestMigrationState dst_state;
 #define QEMU_ENV_SRC "QTEST_QEMU_BINARY_SRC"
 #define QEMU_ENV_DST "QTEST_QEMU_BINARY_DST"
 
+/*
+ * The tests using DEFAULT_DEVICES need a special invocation and
+ * cannot be reached from make check, so don't bother with the
+ * --without-default-devices build.
+ */
+#define DEFAULT_DEVICES "-device virtio-gpu"
+
 #if defined(__linux__)
 #include 
 #include 
@@ -701,6 +708,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 const char *memory_size;
 const char *machine_alias, *machine_opts = "";
 g_autofree char *machine = NULL;
+g_autofree gchar *device_opts = NULL;
 
 if (args->use_shmem) {
 if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
@@ -793,12 +801,17 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 
 g_test_message("Using machine type: %s", machine);
 
+device_opts = g_strdup(getenv("QTEST_DEVICE_OPTS"));
+if (g_str_equal(device_opts, "all")) {
+device_opts = g_strdup(DEFAULT_DEVICES);
+}
+
 cmd_source = g_strdup_printf("-accel kvm%s -accel tcg "
  "-machine %s,%s "
  "-name source,debug-threads=on "
  "-m %s "
  "-serial file:%s/src_serial "
- "%s %s %s %s %s",
+ "%s %s %s %s %s %s",
  kvm_opts ? kvm_opts : "",
  machine, machine_opts,
  memory_size, tmpfs,
@@ -806,6 +819,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  arch_source ? arch_source : "",
  shmem_opts ? shmem_opts : "",
  args->opts_source ? args->opts_source : "",
+ device_opts ? device_opts : "",
  ignore_stderr);
 if (!args->only_target) {
 *from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source);
@@ -820,7 +834,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  "-m %s "
  "-serial file:%s/dest_serial "
  "-incoming %s "
- "%s %s %s %s %s",
+ "%s %s %s %s %s %s",
  kvm_opts ? kvm_opts : "",
  machine, machine_opts,
  memory_size, tmpfs, uri,
@@ 

[RFC PATCH 0/4] migration-test: Device migration smoke tests

2024-05-23 Thread Fabiano Rosas
We have discussed recently about two relatively cheap ways to catch
migration compatibility breakages across QEMU versions. This series
adds support for both.

1) vmstate-static-checker.py

This script has existed for a while and takes a dmup of vmstates from
two different QEMU versions and compares them.

The migration maintainer will run this before merges, but it's useless
for bugs that don't enter via the migration tree. I'm adding this test
to the CI for everyone.

Cons: the script can't handle renames and other compatible changes
that might happen to the vmstate structures without modification, so
these kinds of changes would fail the CI job during that release until
the script is fixed or the old QEMU version catches up. I think this
is passable because the CI job is already marked as allow_failure.

2) migration-tests with -device

We never ran the migration-tests with devices added to the QEMU
command line because the migration-tests run custom guest
code. However, just having the device in the command line already
causes it's state to be sent around and this has been shown to catch
bugs[1].

I'm adding support for running any migration-test with a list of
devices, either a hardcoded one provided by us, or a custom one
provided by whoever is experimenting with this code. This also give us
the ability to quickly check what happens when a new (to the tests)
device is added.

1- https://lore.kernel.org/r/87wmo5l58z@suse.de

Fabiano Rosas (4):
  tests/qtest/libqtest: Introduce another qtest_init version with no
handshake
  tests/qtest/migration: Add a test that runs vmstate-static-checker
  tests/qtest/migration: Add support for simple device tests
  ci: Add the new migration device tests

 .gitlab-ci.d/buildtest.yml   |  43 ---
 tests/qtest/libqtest.c   |  14 +++--
 tests/qtest/libqtest.h   |  13 +
 tests/qtest/migration-test.c | 101 ++-
 4 files changed, 157 insertions(+), 14 deletions(-)


base-commit: 01782d6b294f95bcde334386f0aaac593cd28c0d
-- 
2.35.3




[RFC PATCH 2/4] tests/qtest/migration: Add a test that runs vmstate-static-checker

2024-05-23 Thread Fabiano Rosas
We have the vmstate-static-checker script that takes the output of:
'$QEMU -M $machine -dump-vmstate' for two different QEMU versions and
compares them to check for compatibility breakages. This is just too
simple and useful for us to pass on it. Add a test that runs the
script.

Since this needs to use two different QEMU versions, the test is
skipped if only one QEMU is provided. The infrastructure for passing
more than one binary is already in place:

$ PYTHON=$(which python3.11) \
 QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-x86_64 \
 QTEST_QEMU_BINARY=./qemu-system-x86_64 \
 ./tests/qtest/migration-test -p /x86_64/migration/vmstate-checker-script

Signed-off-by: Fabiano Rosas 
---
some code duplication for now, just so we can reason about this
without too much noise
---
 tests/qtest/migration-test.c | 82 
 1 file changed, 82 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e8d3555f56..2253e0fc5b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -63,6 +63,7 @@ static QTestMigrationState dst_state;
 #define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
 
 #define ANALYZE_SCRIPT "scripts/analyze-migration.py"
+#define VMSTATE_CHECKER_SCRIPT "scripts/vmstate-static-checker.py"
 
 #define QEMU_VM_FILE_MAGIC 0x5145564d
 #define FILE_TEST_FILENAME "migfile"
@@ -1611,6 +1612,85 @@ static void test_analyze_script(void)
 test_migrate_end(from, to, false);
 cleanup("migfile");
 }
+
+static void test_vmstate_checker_script(void)
+{
+g_autofree gchar *cmd_src = NULL;
+g_autofree gchar *cmd_dst = NULL;
+g_autofree gchar *vmstate_src = NULL;
+g_autofree gchar *vmstate_dst = NULL;
+const char *machine_alias, *machine_opts = "";
+g_autofree char *machine = NULL;
+const char *arch = qtest_get_arch();
+int pid, wstatus;
+const char *python = g_getenv("PYTHON");
+
+if (!getenv(QEMU_ENV_SRC) && !getenv(QEMU_ENV_DST)) {
+g_test_skip("Test needs two different QEMU versions");
+return;
+}
+
+if (!python) {
+g_test_skip("PYTHON variable not set");
+return;
+}
+
+if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
+if (g_str_equal(arch, "i386")) {
+machine_alias = "pc";
+} else {
+machine_alias = "q35";
+}
+} else if (g_str_equal(arch, "s390x")) {
+machine_alias = "s390-ccw-virtio";
+} else if (strcmp(arch, "ppc64") == 0) {
+machine_alias = "pseries";
+} else if (strcmp(arch, "aarch64") == 0) {
+machine_alias = "virt";
+} else {
+g_assert_not_reached();
+}
+
+if (!qtest_has_machine(machine_alias)) {
+g_autofree char *msg = g_strdup_printf("machine %s not supported", 
machine_alias);
+g_test_skip(msg);
+return;
+}
+
+machine = resolve_machine_version(machine_alias, QEMU_ENV_SRC,
+  QEMU_ENV_DST);
+
+vmstate_src = g_strdup_printf("%s/vmstate-src", tmpfs);
+vmstate_dst = g_strdup_printf("%s/vmstate-dst", tmpfs);
+
+cmd_dst = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
+  machine, machine_opts, vmstate_dst);
+cmd_src = g_strdup_printf("-machine %s,%s -dump-vmstate %s",
+  machine, machine_opts, vmstate_src);
+
+qtest_init_with_env_no_handshake(QEMU_ENV_SRC, cmd_src);
+qtest_init_with_env_no_handshake(QEMU_ENV_DST, cmd_dst);
+
+pid = fork();
+if (!pid) {
+close(1);
+open("/dev/null", O_WRONLY);
+execl(python, python, VMSTATE_CHECKER_SCRIPT,
+  "-s", vmstate_src,
+  "-d", vmstate_dst,
+  NULL);
+g_assert_not_reached();
+}
+
+g_assert(waitpid(pid, , 0) == pid);
+if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0) {
+g_test_message("Failed to run vmstate-static-checker.py");
+g_test_fail();
+}
+
+cleanup("vmstate-src");
+cleanup("vmstate-dst");
+}
 #endif
 
 static void test_precopy_common(MigrateCommon *args)
@@ -3495,6 +3575,8 @@ int main(int argc, char **argv)
 #ifndef _WIN32
 if (!g_str_equal(arch, "s390x")) {
 migration_test_add("/migration/analyze-script", test_analyze_script);
+migration_test_add("/migration/vmstate-checker-script",
+   test_vmstate_checker_script);
 }
 #endif
 migration_test_add("/migration/precopy/unix/plain",
-- 
2.35.3




[RFC PATCH 1/4] tests/qtest/libqtest: Introduce another qtest_init version with no handshake

2024-05-23 Thread Fabiano Rosas
Introduce a qtest_init version that does not go through the QMP
handshake, but does pass the test binary environment variables
forward. This is needed so we can run a simpler instance of QEMU with
-machine, but not much else.

The existing qtest_init_without_qmp_handshake() is not enough because
this time we want to pass along the special QTEST_QEMU_BINARY_SRC|DST
environment variables.

Signed-off-by: Fabiano Rosas 
---
 tests/qtest/libqtest.c | 14 +-
 tests/qtest/libqtest.h | 13 +
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index d8f80d335e..911e45e189 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -513,11 +513,6 @@ static QTestState *qtest_init_internal(const char 
*qemu_bin,
 kill(s->qemu_pid, SIGSTOP);
 }
 #endif
-
-/* ask endianness of the target */
-
-s->big_endian = qtest_query_target_endianness(s);
-
 return s;
 }
 
@@ -526,11 +521,20 @@ QTestState *qtest_init_without_qmp_handshake(const char 
*extra_args)
 return qtest_init_internal(qtest_qemu_binary(NULL), extra_args);
 }
 
+QTestState *qtest_init_with_env_no_handshake(const char *var,
+ const char *extra_args)
+{
+return qtest_init_internal(qtest_qemu_binary(var), extra_args);
+}
+
 QTestState *qtest_init_with_env(const char *var, const char *extra_args)
 {
 QTestState *s = qtest_init_internal(qtest_qemu_binary(var), extra_args);
 QDict *greeting;
 
+/* ask endianness of the target */
+s->big_endian = qtest_query_target_endianness(s);
+
 /* Read the QMP greeting and then do the handshake */
 greeting = qtest_qmp_receive(s);
 qobject_unref(greeting);
diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
index 6e3d3525bf..5e5554b5ad 100644
--- a/tests/qtest/libqtest.h
+++ b/tests/qtest/libqtest.h
@@ -68,6 +68,19 @@ QTestState *qtest_init(const char *extra_args);
  */
 QTestState *qtest_init_with_env(const char *var, const char *extra_args);
 
+/**
+ * qtest_init_with_env_no_handshake:
+ * @var: Environment variable from where to take the QEMU binary
+ * @extra_args: Other arguments to pass to QEMU.  CAUTION: these
+ * arguments are subject to word splitting and shell evaluation.
+ *
+ * Like qtest_init_with_env(), but skip the qmp handshake.
+ *
+ * Returns: #QTestState instance.
+ */
+QTestState *qtest_init_with_env_no_handshake(const char *var,
+ const char *extra_args);
+
 /**
  * qtest_init_without_qmp_handshake:
  * @extra_args: other arguments to pass to QEMU.  CAUTION: these
-- 
2.35.3




[RFC PATCH 4/4] ci: Add the new migration device tests

2024-05-23 Thread Fabiano Rosas
We have two new migration tests that check cross version
compatibility. One uses the vmstate-static-checker.py script to
compare the vmstate structures from two different QEMU versions. The
other runs a simple migration with a few devices present in the VM, to
catch obvious breakages.

Add both tests to the migration-compat-common job.

Signed-off-by: Fabiano Rosas 
---
 .gitlab-ci.d/buildtest.yml | 43 +++---
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 91c57efded..bc7ac35983 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -202,18 +202,47 @@ build-previous-qemu:
   needs:
 - job: build-previous-qemu
 - job: build-system-opensuse
-  # The old QEMU could have bugs unrelated to migration that are
-  # already fixed in the current development branch, so this test
-  # might fail.
+  # This test is allowed to fail because:
+  #
+  # - The old QEMU could have bugs unrelated to migration that are
+  #   already fixed in the current development branch.
+  #
+  # - The vmstate-static-checker script trips on renames and other
+  #   backward-compatible changes to the vmstate structs.
   allow_failure: true
   variables:
 IMAGE: opensuse-leap
 MAKE_CHECK_ARGS: check-build
   script:
-# Use the migration-tests from the older QEMU tree. This avoids
-# testing an old QEMU against new features/tests that it is not
-# compatible with.
-- cd build-previous
+- cd build
+# device state static test: Tests the vmstate structures for
+# compatibility across QEMU versions. Uses the latest version of
+# the tests.
+# old to new
+- PYTHON=pyvenv/bin/python3
+  QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-${TARGET}
+  QTEST_QEMU_BINARY=./qemu-system-${TARGET}
+  ./tests/qtest/migration-test -p 
/${TARGET}/migration/vmstate-checker-script
+# new to old skipped because vmstate version bumps are always
+# backward incompatible.
+
+# device state runtime test: Performs a cross-version migration
+# with a select list of devices (see DEFAULT_DEVICES in
+# migration-test.c). Using the multifd tcp test here, but any will
+# do.
+# old to new
+- QTEST_DEVICE_OPTS=all 
QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-${TARGET}
+  QTEST_QEMU_BINARY=./qemu-system-${TARGET} 
./tests/qtest/migration-test
+  -p /${TARGET}/migration/multifd/tcp/channels/plain/none
+# new to old
+- QTEST_DEVICE_OPTS=all 
QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-${TARGET}
+  QTEST_QEMU_BINARY=./qemu-system-${TARGET} 
./tests/qtest/migration-test
+  -p /${TARGET}/migration/multifd/tcp/channels/plain/none
+
+# migration core tests: Use the migration-tests from the older
+# QEMU tree. This avoids testing an old QEMU against new
+# features/tests that it is not compatible with.
+- cd ../build-previous
 # old to new
 - QTEST_QEMU_BINARY_SRC=./qemu-system-${TARGET}
   QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} 
./tests/qtest/migration-test
-- 
2.35.3




[PATCH v2 03/18] tests/qtest/migration: Add a precopy file test with fdset

2024-05-23 Thread Fabiano Rosas
Add a test for file migration using fdset. The passing of fds is more
complex than using a file path. This is also the scenario where it's
most important we ensure that the initial migration stream offset is
respected because the fdset interface is the one used by the
management layer when providing a non empty migration file.

Note that fd passing is not available on Windows, so anything that
uses add-fd needs to exclude that platform.

Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index ec905543cf..5bb53f4839 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2013,6 +2013,46 @@ static void test_precopy_file(void)
 test_file_common(, true);
 }
 
+#ifndef _WIN32
+static void fdset_add_fds(QTestState *qts, const char *file, int flags,
+  int num_fds)
+{
+for (int i = 0; i < num_fds; i++) {
+int fd;
+
+fd = open(file, flags, 0660);
+assert(fd != -1);
+
+qtest_qmp_fds_assert_success(qts, , 1, "{'execute': 'add-fd', "
+ "'arguments': {'fdset-id': 1}}");
+close(fd);
+}
+}
+
+static void *file_offset_fdset_start_hook(QTestState *from, QTestState *to)
+{
+g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
+
+fdset_add_fds(from, file, O_WRONLY, 1);
+fdset_add_fds(to, file, O_RDONLY, 1);
+
+return NULL;
+}
+
+static void test_precopy_file_offset_fdset(void)
+{
+g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
+   FILE_TEST_OFFSET);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook = file_offset_fdset_start_hook,
+};
+
+test_file_common(, false);
+}
+#endif
+
 static void test_precopy_file_offset(void)
 {
 g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
@@ -3521,6 +3561,10 @@ int main(int argc, char **argv)
test_precopy_file);
 migration_test_add("/migration/precopy/file/offset",
test_precopy_file_offset);
+#ifndef _WIN32
+migration_test_add("/migration/precopy/file/offset/fdset",
+   test_precopy_file_offset_fdset);
+#endif
 migration_test_add("/migration/precopy/file/offset/bad",
test_precopy_file_offset_bad);
 
-- 
2.35.3




[PATCH v2 08/18] monitor: Report errors from monitor_fdset_dup_fd_add

2024-05-23 Thread Fabiano Rosas
I'm keeping the EACCES because callers expect to be able to look at
errno.

Signed-off-by: Fabiano Rosas 
---
 include/monitor/monitor.h |  2 +-
 monitor/fds.c | 10 +-
 stubs/fdset.c |  2 +-
 util/osdep.c  | 10 +-
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index fd9b3f538c..c3740ec616 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -51,7 +51,7 @@ int monitor_read_password(MonitorHMP *mon, ReadLineFunc 
*readline_func,
 
 AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
 const char *opaque, Error **errp);
-int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
+int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags, Error **errp);
 void monitor_fdset_dup_fd_remove(int dup_fd);
 
 void monitor_register_hmp(const char *name, bool info,
diff --git a/monitor/fds.c b/monitor/fds.c
index fa723e1883..d93d2e695b 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -404,9 +404,10 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, 
int64_t fdset_id,
 return fdinfo;
 }
 
-int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
+int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags, Error **errp)
 {
 #ifdef _WIN32
+error_setg(errp, "Platform does not support fd passing (fdset)");
 return -ENOENT;
 #else
 MonFdset *mon_fdset;
@@ -426,6 +427,8 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 QLIST_FOREACH(mon_fdset_fd, _fdset->fds, next) {
 mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
 if (mon_fd_flags == -1) {
+error_setg(errp, "Failed to read file status flags for fd=%d",
+   mon_fdset_fd->fd);
 return -1;
 }
 
@@ -437,11 +440,15 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 
 if (fd == -1) {
 errno = EACCES;
+error_setg(errp,
+   "Failed to find file descriptor with matching 
flags=0x%x",
+   flags);
 return -1;
 }
 
 dup_fd = qemu_dup_flags(fd, flags);
 if (dup_fd == -1) {
+error_setg(errp, "Failed to dup() given file descriptor fd=%d", 
fd);
 return -1;
 }
 
@@ -451,6 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 return dup_fd;
 }
 
+error_setg(errp, "Failed to find fdset /dev/fdset/%" PRId64, fdset_id);
 errno = ENOENT;
 return -1;
 #endif
diff --git a/stubs/fdset.c b/stubs/fdset.c
index 389e368a29..2950fd91fd 100644
--- a/stubs/fdset.c
+++ b/stubs/fdset.c
@@ -3,7 +3,7 @@
 #include "monitor/monitor.h"
 #include "../monitor/monitor-internal.h"
 
-int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
+int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags, Error **errp)
 {
 errno = ENOSYS;
 return -1;
diff --git a/util/osdep.c b/util/osdep.c
index 2d9749d060..0cb8f98bf6 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -305,7 +305,6 @@ qemu_open_internal(const char *name, int flags, mode_t 
mode, Error **errp)
 /* Attempt dup of fd from fd set */
 if (strstart(name, "/dev/fdset/", _id_str)) {
 int64_t fdset_id;
-int dupfd;
 
 fdset_id = qemu_parse_fdset(fdset_id_str);
 if (fdset_id == -1) {
@@ -314,14 +313,7 @@ qemu_open_internal(const char *name, int flags, mode_t 
mode, Error **errp)
 return -1;
 }
 
-dupfd = monitor_fdset_dup_fd_add(fdset_id, flags);
-if (dupfd == -1) {
-error_setg_errno(errp, errno, "Could not dup FD for %s flags %x",
- name, flags);
-return -1;
-}
-
-return dupfd;
+return monitor_fdset_dup_fd_add(fdset_id, flags, errp);
 }
 #endif
 
-- 
2.35.3




[PATCH v2 12/18] tests/qtest/migration: Add tests for file migration with direct-io

2024-05-23 Thread Fabiano Rosas
The tests are only allowed to run in systems that know about the
O_DIRECT flag and in filesystems which support it.

Note: this also brings back migrate_set_parameter_bool() which went
away when we removed the compression tests. I copied it verbatim.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
---
- updated the aligment comment

- added an inline version of probe function so were able to call it
  without any ifdef

- simplified the _start function that sets the capability

- used qemu_try_memalign instead of aligned_alloc
---
 tests/qtest/migration-helpers.c | 44 +++
 tests/qtest/migration-helpers.h |  8 +
 tests/qtest/migration-test.c| 62 +
 3 files changed, 114 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index ce6d6615b5..0ac49ceb54 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -18,6 +18,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qlist.h"
 #include "qemu/cutils.h"
+#include "qemu/memalign.h"
 
 #include "migration-helpers.h"
 
@@ -473,3 +474,46 @@ void migration_test_add(const char *path, void (*fn)(void))
 qtest_add_data_func_full(path, test, migration_test_wrapper,
  migration_test_destroy);
 }
+
+#ifdef O_DIRECT
+/*
+ * Probe for O_DIRECT support on the filesystem. Since this is used
+ * for tests, be conservative, if anything fails, assume it's
+ * unsupported.
+ */
+bool probe_o_direct_support(const char *tmpfs)
+{
+g_autofree char *filename = g_strdup_printf("%s/probe-o-direct", tmpfs);
+int fd, flags = O_CREAT | O_RDWR | O_TRUNC | O_DIRECT;
+void *buf;
+ssize_t ret, len;
+uint64_t offset;
+
+fd = open(filename, flags, 0660);
+if (fd < 0) {
+unlink(filename);
+return false;
+}
+
+/*
+ * Using 1MB alignment as conservative choice to satisfy any
+ * plausible architecture default page size, and/or filesystem
+ * alignment restrictions.
+ */
+len = 0x10;
+offset = 0x10;
+
+buf = qemu_try_memalign(len, len);
+g_assert(buf);
+
+ret = pwrite(fd, buf, len, offset);
+unlink(filename);
+g_free(buf);
+
+if (ret < 0) {
+return false;
+}
+
+return true;
+}
+#endif
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 1339835698..50095fca4a 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -54,5 +54,13 @@ char *find_common_machine_version(const char *mtype, const 
char *var1,
   const char *var2);
 char *resolve_machine_version(const char *alias, const char *var1,
   const char *var2);
+#ifdef O_DIRECT
+bool probe_o_direct_support(const char *tmpfs);
+#else
+static inline bool probe_o_direct_support(const char *tmpfs)
+{
+return false;
+}
+#endif
 void migration_test_add(const char *path, void (*fn)(void));
 #endif /* MIGRATION_HELPERS_H */
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 5bb53f4839..37017ff2d5 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -428,6 +428,38 @@ static void migrate_set_parameter_str(QTestState *who, 
const char *parameter,
 migrate_check_parameter_str(who, parameter, value);
 }
 
+static long long migrate_get_parameter_bool(QTestState *who,
+   const char *parameter)
+{
+QDict *rsp;
+int result;
+
+rsp = qtest_qmp_assert_success_ref(
+who, "{ 'execute': 'query-migrate-parameters' }");
+result = qdict_get_bool(rsp, parameter);
+qobject_unref(rsp);
+return !!result;
+}
+
+static void migrate_check_parameter_bool(QTestState *who, const char 
*parameter,
+int value)
+{
+int result;
+
+result = migrate_get_parameter_bool(who, parameter);
+g_assert_cmpint(result, ==, value);
+}
+
+static void migrate_set_parameter_bool(QTestState *who, const char *parameter,
+  int value)
+{
+qtest_qmp_assert_success(who,
+ "{ 'execute': 'migrate-set-parameters',"
+ "'arguments': { %s: %i } }",
+ parameter, value);
+migrate_check_parameter_bool(who, parameter, value);
+}
+
 static void migrate_ensure_non_converge(QTestState *who)
 {
 /* Can't converge with 1ms downtime + 3 mbs bandwidth limit */
@@ -2178,6 +2210,33 @@ static void test_multifd_file_mapped_ram(void)
 test_file_common(, true);
 }
 
+static void *multifd_mapped_ram_dio_start(QTestState *from, QTestState *to)
+{
+migrate_multifd_mapped_ram_start(from, to);
+
+migrate_set_parameter_bool(from, "direct-io", true);
+migrate_set_parameter_bool(to, "direct-i

[PATCH v2 07/18] monitor: Simplify fdset and fd removal

2024-05-23 Thread Fabiano Rosas
Remove fds right away instead of setting the ->removed flag. We don't
need the extra complexity of having a cleanup function reap the
removed entries at a later time.

Signed-off-by: Fabiano Rosas 
---
 monitor/fds.c | 27 ++-
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index f7b98814fa..fa723e1883 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -43,7 +43,6 @@ struct mon_fd_t {
 typedef struct MonFdsetFd MonFdsetFd;
 struct MonFdsetFd {
 int fd;
-bool removed;
 char *opaque;
 QLIST_ENTRY(MonFdsetFd) next;
 };
@@ -188,20 +187,6 @@ static void monitor_fdset_fd_free(MonFdsetFd *mon_fdset_fd)
 g_free(mon_fdset_fd);
 }
 
-static void monitor_fdset_cleanup(MonFdset *mon_fdset)
-{
-MonFdsetFd *mon_fdset_fd;
-MonFdsetFd *mon_fdset_fd_next;
-
-QLIST_FOREACH_SAFE(mon_fdset_fd, _fdset->fds, next, mon_fdset_fd_next) 
{
-if (mon_fdset_fd->removed) {
-monitor_fdset_fd_free(mon_fdset_fd);
-}
-}
-
-monitor_fdset_free(mon_fdset);
-}
-
 void monitor_fdsets_cleanup(void)
 {
 MonFdset *mon_fdset;
@@ -276,7 +261,7 @@ void qmp_get_win32_socket(const char *infos, const char 
*fdname, Error **errp)
 void qmp_remove_fd(int64_t fdset_id, bool has_fd, int64_t fd, Error **errp)
 {
 MonFdset *mon_fdset;
-MonFdsetFd *mon_fdset_fd;
+MonFdsetFd *mon_fdset_fd, *mon_fdset_fd_next;
 char fd_str[60];
 
 QEMU_LOCK_GUARD(_fdsets_lock);
@@ -284,21 +269,22 @@ void qmp_remove_fd(int64_t fdset_id, bool has_fd, int64_t 
fd, Error **errp)
 if (mon_fdset->id != fdset_id) {
 continue;
 }
-QLIST_FOREACH(mon_fdset_fd, _fdset->fds, next) {
+QLIST_FOREACH_SAFE(mon_fdset_fd, _fdset->fds, next,
+   mon_fdset_fd_next) {
 if (has_fd) {
 if (mon_fdset_fd->fd != fd) {
 continue;
 }
-mon_fdset_fd->removed = true;
+monitor_fdset_fd_free(mon_fdset_fd);
 break;
 } else {
-mon_fdset_fd->removed = true;
+monitor_fdset_fd_free(mon_fdset_fd);
 }
 }
 if (has_fd && !mon_fdset_fd) {
 goto error;
 }
-monitor_fdset_cleanup(mon_fdset);
+monitor_fdset_free(mon_fdset);
 return;
 }
 
@@ -408,7 +394,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, 
int64_t fdset_id,
 
 mon_fdset_fd = g_malloc0(sizeof(*mon_fdset_fd));
 mon_fdset_fd->fd = fd;
-mon_fdset_fd->removed = false;
 mon_fdset_fd->opaque = g_strdup(opaque);
 QLIST_INSERT_HEAD(_fdset->fds, mon_fdset_fd, next);
 
-- 
2.35.3




[PATCH v2 09/18] io: Stop using qemu_open_old in channel-file

2024-05-23 Thread Fabiano Rosas
We want to make use of the Error object to report fdset errors from
qemu_open_internal() and passing the error pointer to qemu_open_old()
would require changing all callers. Move the file channel to the new
API instead.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Fabiano Rosas 
---
 io/channel-file.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/io/channel-file.c b/io/channel-file.c
index 6436cfb6ae..2ea8d08360 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -68,11 +68,13 @@ qio_channel_file_new_path(const char *path,
 
 ioc = QIO_CHANNEL_FILE(object_new(TYPE_QIO_CHANNEL_FILE));
 
-ioc->fd = qemu_open_old(path, flags, mode);
+if (flags & O_CREAT) {
+ioc->fd = qemu_create(path, flags & ~O_CREAT, mode, errp);
+} else {
+ioc->fd = qemu_open(path, flags, errp);
+}
 if (ioc->fd < 0) {
 object_unref(OBJECT(ioc));
-error_setg_errno(errp, errno,
- "Unable to open %s", path);
 return NULL;
 }
 
-- 
2.35.3




[PATCH v2 06/18] monitor: Stop removing non-duplicated fds

2024-05-23 Thread Fabiano Rosas
We've been up until now cleaning up any file descriptors that have
been passed into QEMU and never duplicated[1,2]. A file descriptor
without duplicates indicates that no part of QEMU has made use of
it. This approach is starting to show some cracks now that we're
starting to consume fds from the migration code:

- Doing cleanup every time the last monitor connection closes works to
  reap unused fds, but also has the side effect of forcing the
  management layer to pass the file descriptors again in case of a
  disconnect/re-connect, if that happened to be the only monitor
  connection.

  Another side effect is that removing an fd with qmp_remove_fd() is
  effectively delayed until the last monitor connection closes.

  The reliance on mon_refcount is also problematic because it's racy.

- Checking runstate_is_running() skips the cleanup unless the VM is
  running and avoids premature cleanup of the fds, but also has the
  side effect of blocking the legitimate removal of an fd via
  qmp_remove_fd() if the VM happens to be in another state.

  This affects qmp_remove_fd() and qmp_query_fdsets() in particular
  because requesting a removal at a bad time (guest stopped) might
  cause an fd to never be removed, or to be removed at a much later
  point in time, causing the query command to continue showing the
  supposedly removed fd/fdset.

Note that file descriptors that *have* been duplicated are owned by
the code that uses them and will be removed after qemu_close() is
called. Therefore we've decided that the best course of action to
avoid the undesired side-effects is to stop managing non-duplicated
file descriptors.

1- efb87c1697 ("monitor: Clean up fd sets on monitor disconnect")
2- ebe52b592d ("monitor: Prevent removing fd from set during init")

Signed-off-by: Fabiano Rosas 
---
 monitor/fds.c  | 15 ---
 monitor/hmp.c  |  2 --
 monitor/monitor-internal.h |  1 -
 monitor/qmp.c  |  2 --
 4 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index 101e21720d..f7b98814fa 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -169,6 +169,11 @@ int monitor_get_fd(Monitor *mon, const char *fdname, Error 
**errp)
 
 static void monitor_fdset_free(MonFdset *mon_fdset)
 {
+/*
+ * Only remove an empty fdset. The fds are owned by the user and
+ * should have been removed with qmp_remove_fd(). The dup_fds are
+ * owned by QEMU and should have been removed with qemu_close().
+ */
 if (QLIST_EMPTY(_fdset->fds) && QLIST_EMPTY(_fdset->dup_fds)) {
 QLIST_REMOVE(mon_fdset, next);
 g_free(mon_fdset);
@@ -189,9 +194,7 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
 MonFdsetFd *mon_fdset_fd_next;
 
 QLIST_FOREACH_SAFE(mon_fdset_fd, _fdset->fds, next, mon_fdset_fd_next) 
{
-if ((mon_fdset_fd->removed ||
-(QLIST_EMPTY(_fdset->dup_fds) && mon_refcount == 0)) &&
-runstate_is_running()) {
+if (mon_fdset_fd->removed) {
 monitor_fdset_fd_free(mon_fdset_fd);
 }
 }
@@ -206,7 +209,7 @@ void monitor_fdsets_cleanup(void)
 
 QEMU_LOCK_GUARD(_fdsets_lock);
 QLIST_FOREACH_SAFE(mon_fdset, _fdsets, next, mon_fdset_next) {
-monitor_fdset_cleanup(mon_fdset);
+monitor_fdset_free(mon_fdset);
 }
 }
 
@@ -479,9 +482,7 @@ void monitor_fdset_dup_fd_remove(int dup_fd)
 if (mon_fdset_fd_dup->fd == dup_fd) {
 QLIST_REMOVE(mon_fdset_fd_dup, next);
 g_free(mon_fdset_fd_dup);
-if (QLIST_EMPTY(_fdset->dup_fds)) {
-monitor_fdset_cleanup(mon_fdset);
-}
+monitor_fdset_free(mon_fdset);
 return;
 }
 }
diff --git a/monitor/hmp.c b/monitor/hmp.c
index 69c1b7e98a..460e8832f6 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -1437,11 +1437,9 @@ static void monitor_event(void *opaque, QEMUChrEvent 
event)
 monitor_resume(mon);
 }
 qemu_mutex_unlock(>mon_lock);
-mon_refcount++;
 break;
 
 case CHR_EVENT_CLOSED:
-mon_refcount--;
 monitor_fdsets_cleanup();
 break;
 
diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
index 252de85681..cb628f681d 100644
--- a/monitor/monitor-internal.h
+++ b/monitor/monitor-internal.h
@@ -168,7 +168,6 @@ extern bool qmp_dispatcher_co_shutdown;
 extern QmpCommandList qmp_commands, qmp_cap_negotiation_commands;
 extern QemuMutex monitor_lock;
 extern MonitorList mon_list;
-extern int mon_refcount;
 
 extern HMPCommand hmp_cmds[];
 
diff --git a/monitor/qmp.c b/monitor/qmp.c
index a239945e8d..5e538f34c0 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -466,7 +466,6 @@ static void monitor_qmp_event(void *opaque, QEMUChrEvent 
event)
 data = qmp_greeting(mon);
 qmp_send_r

[PATCH v2 04/18] monitor: Drop monitor_fdset_dup_fd_add()

2024-05-23 Thread Fabiano Rosas
From: Peter Xu 

This function is not needed, one remove function should already work.
Clean it up.

Here the code doesn't really care about whether we need to keep that dupfd
around if close() failed: when that happens something got very wrong,
keeping the dup_fd around the fdsets may not help that situation so far.

Cc: Dr. David Alan Gilbert 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Paolo Bonzini 
Cc: Daniel P. Berrangé 
Signed-off-by: Peter Xu 
[add missing return statement, removal during traversal is not safe]
Signed-off-by: Fabiano Rosas 
---
 include/monitor/monitor.h |  1 -
 monitor/fds.c | 28 ++--
 stubs/fdset.c |  5 -
 util/osdep.c  | 15 +--
 4 files changed, 7 insertions(+), 42 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 965f5d5450..fd9b3f538c 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, 
int64_t fdset_id,
 const char *opaque, Error **errp);
 int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
 void monitor_fdset_dup_fd_remove(int dup_fd);
-int64_t monitor_fdset_dup_fd_find(int dup_fd);
 
 void monitor_register_hmp(const char *name, bool info,
   void (*cmd)(Monitor *mon, const QDict *qdict));
diff --git a/monitor/fds.c b/monitor/fds.c
index d86c2c674c..fb9f58c056 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 #endif
 }
 
-static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
+void monitor_fdset_dup_fd_remove(int dup_fd)
 {
 MonFdset *mon_fdset;
 MonFdsetFd *mon_fdset_fd_dup;
@@ -467,31 +467,15 @@ static int64_t monitor_fdset_dup_fd_find_remove(int 
dup_fd, bool remove)
 QLIST_FOREACH(mon_fdset, _fdsets, next) {
 QLIST_FOREACH(mon_fdset_fd_dup, _fdset->dup_fds, next) {
 if (mon_fdset_fd_dup->fd == dup_fd) {
-if (remove) {
-QLIST_REMOVE(mon_fdset_fd_dup, next);
-g_free(mon_fdset_fd_dup);
-if (QLIST_EMPTY(_fdset->dup_fds)) {
-monitor_fdset_cleanup(mon_fdset);
-}
-return -1;
-} else {
-return mon_fdset->id;
+QLIST_REMOVE(mon_fdset_fd_dup, next);
+g_free(mon_fdset_fd_dup);
+if (QLIST_EMPTY(_fdset->dup_fds)) {
+monitor_fdset_cleanup(mon_fdset);
 }
+return;
 }
 }
 }
-
-return -1;
-}
-
-int64_t monitor_fdset_dup_fd_find(int dup_fd)
-{
-return monitor_fdset_dup_fd_find_remove(dup_fd, false);
-}
-
-void monitor_fdset_dup_fd_remove(int dup_fd)
-{
-monitor_fdset_dup_fd_find_remove(dup_fd, true);
 }
 
 int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
diff --git a/stubs/fdset.c b/stubs/fdset.c
index d7c39a28ac..389e368a29 100644
--- a/stubs/fdset.c
+++ b/stubs/fdset.c
@@ -9,11 +9,6 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 return -1;
 }
 
-int64_t monitor_fdset_dup_fd_find(int dup_fd)
-{
-return -1;
-}
-
 void monitor_fdset_dup_fd_remove(int dupfd)
 {
 }
diff --git a/util/osdep.c b/util/osdep.c
index e996c4744a..2d9749d060 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -393,21 +393,8 @@ int qemu_open_old(const char *name, int flags, ...)
 
 int qemu_close(int fd)
 {
-int64_t fdset_id;
-
 /* Close fd that was dup'd from an fdset */
-fdset_id = monitor_fdset_dup_fd_find(fd);
-if (fdset_id != -1) {
-int ret;
-
-ret = close(fd);
-if (ret == 0) {
-monitor_fdset_dup_fd_remove(fd);
-}
-
-return ret;
-}
-
+monitor_fdset_dup_fd_remove(fd);
 return close(fd);
 }
 
-- 
2.35.3




[PATCH v2 17/18] migration: Add direct-io helpers

2024-05-23 Thread Fabiano Rosas
We support using O_DIRECT with multifd by isolating the main migration
channel which does unaligned IO from the multifd channels that do
aligned IO. When multifd is not enabled, we can still use O_DIRECT, if
we judiciously enable/disable the flag to avoid the unaligned IO.

Add helpers to enable/disable direct-io around the aligned parts.

Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 31 +++
 migration/migration.h |  2 ++
 migration/qemu-file.c | 29 +
 migration/qemu-file.h |  2 +-
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index e03c80b3aa..be128f95da 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -420,6 +420,37 @@ static void migrate_generate_event(int new_state)
 }
 }
 
+bool migration_direct_io_start(QEMUFile *file, Error **errp)
+{
+if (!migrate_direct_io() || migrate_multifd()) {
+return true;
+}
+
+/* flush any potentially unaligned IO before setting O_DIRECT */
+qemu_fflush(file);
+
+if (!qemu_file_set_direct_io(file, true)) {
+error_setg(errp, "Failed to enable direct-io");
+return false;
+}
+
+return true;
+}
+
+bool migration_direct_io_finish(QEMUFile *file, Error **errp)
+{
+if (!migrate_direct_io() || migrate_multifd()) {
+return true;
+}
+
+if (!qemu_file_set_direct_io(file, false)) {
+error_setg(errp, "Failed to disable direct-io");
+return false;
+}
+
+return true;
+}
+
 /*
  * Send a message on the return channel back to the source
  * of the migration.
diff --git a/migration/migration.h b/migration/migration.h
index 6af01362d4..4d808563b5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -536,4 +536,6 @@ int migration_rp_wait(MigrationState *s);
  */
 void migration_rp_kick(MigrationState *s);
 
+bool migration_direct_io_start(QEMUFile *file, Error **errp);
+bool migration_direct_io_finish(QEMUFile *file, Error **errp);
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 9ccbbb0099..4796be918f 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -52,6 +52,11 @@ struct QEMUFile {
 
 int last_error;
 Error *last_error_obj;
+/*
+ * Whether O_DIRECT is in effect. Used to catch code attempting
+ * unaligned IO.
+ */
+bool dio;
 };
 
 /*
@@ -281,6 +286,9 @@ int qemu_fflush(QEMUFile *f)
 }
 if (f->iovcnt > 0) {
 Error *local_error = NULL;
+
+assert(!f->dio);
+
 if (qio_channel_writev_all(f->ioc,
f->iov, f->iovcnt,
_error) < 0) {
@@ -429,6 +437,8 @@ void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, 
size_t size,
 return;
 }
 
+assert(!f->dio);
+
 add_to_iovec(f, buf, size, may_free);
 }
 
@@ -440,6 +450,8 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, 
size_t size)
 return;
 }
 
+assert(!f->dio);
+
 while (size > 0) {
 l = IO_BUF_SIZE - f->buf_index;
 if (l > size) {
@@ -558,6 +570,8 @@ off_t qemu_get_offset(QEMUFile *f)
 
 void qemu_put_byte(QEMUFile *f, int v)
 {
+assert(!f->dio);
+
 if (f->last_error) {
 return;
 }
@@ -865,3 +879,18 @@ int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size)
 
 return 0;
 }
+
+bool qemu_file_set_direct_io(QEMUFile *f, bool enabled)
+{
+Error *local_err = NULL;
+
+qio_channel_file_set_direct_io(f->ioc, enabled, _err);
+if (local_err) {
+qemu_file_set_error_obj(f, -EINVAL, local_err);
+return false;
+}
+
+f->dio = enabled;
+
+return true;
+}
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 11c2120edd..118853b21c 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -79,5 +79,5 @@ size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, 
size_t buflen,
   off_t pos);
 
 QIOChannel *qemu_file_get_ioc(QEMUFile *file);
-
+bool qemu_file_set_direct_io(QEMUFile *f, bool enabled);
 #endif
-- 
2.35.3




[PATCH v2 15/18] tests/qtest/migration: Add a test for mapped-ram with passing of fds

2024-05-23 Thread Fabiano Rosas
Add a multifd test for mapped-ram with passing of fds into QEMU. This
is how libvirt will consume the feature.

There are a couple of details to the fdset mechanism:

- multifd needs two distinct file descriptors (not duplicated with
  dup()) so it can enable O_DIRECT only on the channels that do
  aligned IO. The dup() system call creates file descriptors that
  share status flags, of which O_DIRECT is one.

- the open() access mode flags used for the fds passed into QEMU need
  to match the flags QEMU uses to open the file. Currently O_WRONLY
  for src and O_RDONLY for dst.

Note that fdset code goes under _WIN32 because fd passing is not
supported on Windows.

Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 101 +--
 1 file changed, 98 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 37017ff2d5..5ced3b90c9 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2047,11 +2047,18 @@ static void test_precopy_file(void)
 
 #ifndef _WIN32
 static void fdset_add_fds(QTestState *qts, const char *file, int flags,
-  int num_fds)
+  int num_fds, bool direct_io)
 {
 for (int i = 0; i < num_fds; i++) {
 int fd;
 
+#ifdef O_DIRECT
+/* only secondary channels can use direct-io */
+if (direct_io && i != 0) {
+flags |= O_DIRECT;
+}
+#endif
+
 fd = open(file, flags, 0660);
 assert(fd != -1);
 
@@ -2065,8 +2072,8 @@ static void *file_offset_fdset_start_hook(QTestState 
*from, QTestState *to)
 {
 g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
 
-fdset_add_fds(from, file, O_WRONLY, 1);
-fdset_add_fds(to, file, O_RDONLY, 1);
+fdset_add_fds(from, file, O_WRONLY, 1, false);
+fdset_add_fds(to, file, O_RDONLY, 1, false);
 
 return NULL;
 }
@@ -2238,6 +2245,87 @@ static void test_multifd_file_mapped_ram_dio(void)
 test_file_common(, true);
 }
 
+#ifndef _WIN32
+static void multifd_mapped_ram_fdset_end(QTestState *from, QTestState *to,
+ void *opaque)
+{
+QDict *resp;
+QList *fdsets;
+
+/*
+ * Remove the fdsets after migration, otherwise a second migration
+ * would fail due fdset reuse.
+ */
+qtest_qmp_assert_success(from, "{'execute': 'remove-fd', "
+ "'arguments': { 'fdset-id': 1}}");
+
+resp = qtest_qmp(from, "{'execute': 'query-fdsets', "
+ "'arguments': {}}");
+g_assert(qdict_haskey(resp, "return"));
+fdsets = qdict_get_qlist(resp, "return");
+g_assert(fdsets && qlist_empty(fdsets));
+}
+
+static void *multifd_mapped_ram_fdset_dio(QTestState *from, QTestState *to)
+{
+g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
+
+fdset_add_fds(from, file, O_WRONLY, 2, true);
+fdset_add_fds(to, file, O_RDONLY, 2, true);
+
+migrate_multifd_mapped_ram_start(from, to);
+migrate_set_parameter_bool(from, "direct-io", true);
+migrate_set_parameter_bool(to, "direct-io", true);
+
+return NULL;
+}
+
+static void *multifd_mapped_ram_fdset(QTestState *from, QTestState *to)
+{
+g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
+
+fdset_add_fds(from, file, O_WRONLY, 2, false);
+fdset_add_fds(to, file, O_RDONLY, 2, false);
+
+migrate_multifd_mapped_ram_start(from, to);
+
+return NULL;
+}
+
+static void test_multifd_file_mapped_ram_fdset(void)
+{
+g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
+   FILE_TEST_OFFSET);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook = multifd_mapped_ram_fdset,
+.finish_hook = multifd_mapped_ram_fdset_end,
+};
+
+test_file_common(, true);
+}
+
+static void test_multifd_file_mapped_ram_fdset_dio(void)
+{
+g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
+   FILE_TEST_OFFSET);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook = multifd_mapped_ram_fdset_dio,
+.finish_hook = multifd_mapped_ram_fdset_end,
+};
+
+if (!probe_o_direct_support(tmpfs)) {
+g_test_skip("Filesystem does not support O_DIRECT");
+return;
+}
+
+test_file_common(, true);
+}
+#endif /* !_WIN32 */
+
 static void test_precopy_tcp_plain(void)
 {
 MigrateCommon args = {
@@ -3648,6 +3736,13 @@ int main(int argc, char **argv)
 migration_test_add("/migration/multifd/file/mapped-ram/dio",
test_multifd_fil

[PATCH v2 16/18] io/channel-file: Add direct-io support

2024-05-23 Thread Fabiano Rosas
Add support for setting/clearing the O_DIRECT flag on a file
descriptor. This will be used for enabling O_DIRECT in the main
migration channel when multifd is not in use.

Signed-off-by: Fabiano Rosas 
---
 include/io/channel-file.h |  8 
 io/channel-file.c | 25 +
 2 files changed, 33 insertions(+)

diff --git a/include/io/channel-file.h b/include/io/channel-file.h
index d373a4e44d..ecb4450e8e 100644
--- a/include/io/channel-file.h
+++ b/include/io/channel-file.h
@@ -107,4 +107,12 @@ qio_channel_file_new_path(const char *path,
   mode_t mode,
   Error **errp);
 
+/**
+ * qio_channel_file_set_direct_io:
+ * @ioc: the QIOChannel object
+ * @enabled: the desired state of the O_DIRECT flag
+ * @errp: pointer to initialized error object
+ */
+void qio_channel_file_set_direct_io(QIOChannel *ioc, bool enabled,
+Error **errp);
 #endif /* QIO_CHANNEL_FILE_H */
diff --git a/io/channel-file.c b/io/channel-file.c
index 2ea8d08360..a89cd3a6d5 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -231,6 +231,31 @@ static int qio_channel_file_set_blocking(QIOChannel *ioc,
 #endif
 }
 
+void qio_channel_file_set_direct_io(QIOChannel *ioc, bool enabled, Error 
**errp)
+{
+#ifdef O_DIRECT
+QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+int flags = fcntl(fioc->fd, F_GETFL);
+
+if (flags == -1) {
+error_setg_errno(errp, errno, "Unable to read file descriptor flags");
+return;
+}
+
+if (enabled) {
+flags |= O_DIRECT;
+} else {
+flags &= ~O_DIRECT;
+}
+
+if (fcntl(fioc->fd, F_SETFL, flags) == -1) {
+error_setg_errno(errp, errno, "Unable to set file descriptor flags");
+return;
+}
+#else
+error_setg(errp, "System does not support O_DIRECT");
+#endif
+}
 
 static off_t qio_channel_file_seek(QIOChannel *ioc,
off_t offset,
-- 
2.35.3




[PATCH v2 11/18] migration/multifd: Add direct-io support

2024-05-23 Thread Fabiano Rosas
When multifd is used along with mapped-ram, we can take benefit of a
filesystem that supports the O_DIRECT flag and perform direct I/O in
the multifd threads. This brings a significant performance improvement
because direct-io writes bypass the page cache which would otherwise
be thrashed by the multifd data which is unlikely to be needed again
in a short period of time.

To be able to use a multifd channel opened with O_DIRECT, we must
ensure that a certain aligment is used. Filesystems usually require a
block-size alignment for direct I/O. The way to achieve this is by
enabling the mapped-ram feature, which already aligns its I/O properly
(see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).

By setting O_DIRECT on the multifd channels, all writes to the same
file descriptor need to be aligned as well, even the ones that come
from outside multifd, such as the QEMUFile I/O from the main migration
code. This makes it impossible to use the same file descriptor for the
QEMUFile and for the multifd channels. The various flags and metadata
written by the main migration code will always be unaligned by virtue
of their small size. To workaround this issue, we'll require a second
file descriptor to be used exclusively for direct I/O.

The second file descriptor can be obtained by QEMU by re-opening the
migration file (already possible), or by being provided by the user or
management application (support to be added in future patches).

Signed-off-by: Fabiano Rosas 
---
 migration/file.c  | 31 ++-
 migration/file.h  |  1 -
 migration/migration.c | 23 +++
 3 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index ba5b5c44ff..ac4d206492 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -50,12 +50,31 @@ void file_cleanup_outgoing_migration(void)
 outgoing_args.fname = NULL;
 }
 
+static void file_enable_direct_io(int *flags)
+{
+#ifdef O_DIRECT
+if (migrate_direct_io()) {
+*flags |= O_DIRECT;
+}
+#else
+/* it should have been rejected when setting the parameter */
+g_assert_not_reached();
+#endif
+}
+
 bool file_send_channel_create(gpointer opaque, Error **errp)
 {
 QIOChannelFile *ioc;
 int flags = O_WRONLY;
 bool ret = true;
 
+/*
+ * Attempt to enable O_DIRECT for the secondary channels. These
+ * are used for sending ram pages and writes should be guaranteed
+ * to be aligned to at least page size.
+ */
+file_enable_direct_io();
+
 ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
 if (!ioc) {
 ret = false;
@@ -116,21 +135,23 @@ static gboolean file_accept_incoming_migration(QIOChannel 
*ioc,
 return G_SOURCE_REMOVE;
 }
 
-void file_create_incoming_channels(QIOChannel *ioc, Error **errp)
+static void file_create_incoming_channels(QIOChannel *ioc, char *filename,
+  Error **errp)
 {
-int i, fd, channels = 1;
+int i, channels = 1;
 g_autofree QIOChannel **iocs = NULL;
+int flags = O_RDONLY;
 
 if (migrate_multifd()) {
 channels += migrate_multifd_channels();
+file_enable_direct_io();
 }
 
 iocs = g_new0(QIOChannel *, channels);
-fd = QIO_CHANNEL_FILE(ioc)->fd;
 iocs[0] = ioc;
 
 for (i = 1; i < channels; i++) {
-QIOChannelFile *fioc = qio_channel_file_new_dupfd(fd, errp);
+QIOChannelFile *fioc = qio_channel_file_new_path(filename, flags, 0, 
errp);
 
 if (!fioc) {
 while (i) {
@@ -170,7 +191,7 @@ void file_start_incoming_migration(FileMigrationArgs 
*file_args, Error **errp)
 return;
 }
 
-file_create_incoming_channels(QIO_CHANNEL(fioc), errp);
+file_create_incoming_channels(QIO_CHANNEL(fioc), filename, errp);
 }
 
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
diff --git a/migration/file.h b/migration/file.h
index 7699c04677..9f71e87f74 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -20,7 +20,6 @@ void file_start_outgoing_migration(MigrationState *s,
 int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp);
 void file_cleanup_outgoing_migration(void);
 bool file_send_channel_create(gpointer opaque, Error **errp);
-void file_create_incoming_channels(QIOChannel *ioc, Error **errp);
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
 int niov, RAMBlock *block, Error **errp);
 int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp);
diff --git a/migration/migration.c b/migration/migration.c
index e1b269624c..e03c80b3aa 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -155,6 +155,16 @@ static bool migration_needs_seekable_channel(void)
 return migrate_mapped_ram();
 }
 
+static bool migration_needs_extra_fds(void)
+{
+/*
+ * When doing direct-io, multifd requires two different,
+ * non-duplicated file descriptors so we can u

[PATCH v2 18/18] migration/ram: Add direct-io support to precopy file migration

2024-05-23 Thread Fabiano Rosas
We've recently added support for direct-io with multifd, which brings
performance benefits, but creates a non-uniform user interface by
coupling direct-io with the multifd capability. This means that users
cannot keep the direct-io flag enabled while disabling multifd.

Libvirt in particular already has support for direct-io and parallel
migration separately from each other, so it would be a regression to
now require both options together. It's relatively simple for QEMU to
add support for direct-io migration without multifd, so let's do this
in order to keep both options decoupled.

We cannot simply enable the O_DIRECT flag, however, because not all IO
performed by the migration thread satisfies the alignment requirements
of O_DIRECT. There are many small read & writes that add headers and
synchronization flags to the stream, which at the moment are required
to always be present.

Fortunately, due to fixed-ram migration there is a discernible moment
where only RAM pages are written to the migration file. Enable
direct-io during that moment.

Signed-off-by: Fabiano Rosas 
---
 migration/ram.c  | 40 
 tests/qtest/migration-test.c | 30 +++
 2 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ceea586b06..5183d1f97c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3131,6 +3131,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 int i;
 int64_t t0;
 int done = 0;
+Error **errp = NULL;
 
 /*
  * We'll take this lock a little bit long, but it's okay for two reasons.
@@ -3154,6 +3155,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 goto out;
 }
 
+if (!migration_direct_io_start(f, errp)) {
+return -errno;
+}
+
 t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 i = 0;
 while ((ret = migration_rate_exceeded(f)) == 0 ||
@@ -3194,6 +3199,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 }
 i++;
 }
+if (!migration_direct_io_finish(f, errp)) {
+return -errno;
+}
 }
 }
 
@@ -3242,7 +3250,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 {
 RAMState **temp = opaque;
 RAMState *rs = *temp;
-int ret = 0;
+int ret = 0, pages;
+Error **errp = NULL;
 
 rs->last_stage = !migration_in_colo_state();
 
@@ -3257,25 +3266,30 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 return ret;
 }
 
+if (!migration_direct_io_start(f, errp)) {
+return -errno;
+}
+
 /* try transferring iterative blocks of memory */
 
 /* flush all remaining blocks regardless of rate limiting */
 qemu_mutex_lock(>bitmap_mutex);
 while (true) {
-int pages;
-
 pages = ram_find_and_save_block(rs);
-/* no more blocks to sent */
-if (pages == 0) {
+if (pages <= 0) {
 break;
 }
-if (pages < 0) {
-qemu_mutex_unlock(>bitmap_mutex);
-return pages;
-}
 }
 qemu_mutex_unlock(>bitmap_mutex);
 
+if (!migration_direct_io_finish(f, errp)) {
+return -errno;
+}
+
+if (pages < 0) {
+return pages;
+}
+
 ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
@@ -3920,6 +3934,10 @@ static bool read_ramblock_mapped_ram(QEMUFile *f, 
RAMBlock *block,
 void *host;
 size_t read, unread, size;
 
+if (!migration_direct_io_start(f, errp)) {
+return false;
+}
+
 for (set_bit_idx = find_first_bit(bitmap, num_pages);
  set_bit_idx < num_pages;
  set_bit_idx = find_next_bit(bitmap, num_pages, clear_bit_idx + 1)) {
@@ -3955,6 +3973,10 @@ static bool read_ramblock_mapped_ram(QEMUFile *f, 
RAMBlock *block,
 }
 }
 
+if (!migration_direct_io_finish(f, errp)) {
+return false;
+}
+
 return true;
 
 err:
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 5ced3b90c9..8c6a122c20 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2245,6 +2245,34 @@ static void test_multifd_file_mapped_ram_dio(void)
 test_file_common(, true);
 }
 
+static void *mapped_ram_dio_start(QTestState *from, QTestState *to)
+{
+migrate_mapped_ram_start(from, to);
+
+migrate_set_parameter_bool(from, "direct-io", true);
+migrate_set_parameter_bool(to, "direct-io", true);
+
+return NULL;
+}
+
+static void test_precopy_file_mapped_ram_dio(void)
+{
+g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+  

[PATCH v2 05/18] monitor: Introduce monitor_fdset_*free

2024-05-23 Thread Fabiano Rosas
Introduce two new functions to remove and free no longer used fds and
fdsets.

We need those to decouple the remove/free routines from
monitor_fdset_cleanup() which will go away in the next patches.

The new functions:

- monitor_fdset_free() will be used when a monitor connection closes
  and when an fd is removed to cleanup any fdset that is now empty.

- monitor_fdset_fd_free() will be used to remove one or more fds that
  have been explicitly targeted by qmp_remove_fd().

Signed-off-by: Fabiano Rosas 
---
 monitor/fds.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index fb9f58c056..101e21720d 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -167,6 +167,22 @@ int monitor_get_fd(Monitor *mon, const char *fdname, Error 
**errp)
 return -1;
 }
 
+static void monitor_fdset_free(MonFdset *mon_fdset)
+{
+if (QLIST_EMPTY(_fdset->fds) && QLIST_EMPTY(_fdset->dup_fds)) {
+QLIST_REMOVE(mon_fdset, next);
+g_free(mon_fdset);
+}
+}
+
+static void monitor_fdset_fd_free(MonFdsetFd *mon_fdset_fd)
+{
+close(mon_fdset_fd->fd);
+g_free(mon_fdset_fd->opaque);
+QLIST_REMOVE(mon_fdset_fd, next);
+g_free(mon_fdset_fd);
+}
+
 static void monitor_fdset_cleanup(MonFdset *mon_fdset)
 {
 MonFdsetFd *mon_fdset_fd;
@@ -176,17 +192,11 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
 if ((mon_fdset_fd->removed ||
 (QLIST_EMPTY(_fdset->dup_fds) && mon_refcount == 0)) &&
 runstate_is_running()) {
-close(mon_fdset_fd->fd);
-g_free(mon_fdset_fd->opaque);
-QLIST_REMOVE(mon_fdset_fd, next);
-g_free(mon_fdset_fd);
+monitor_fdset_fd_free(mon_fdset_fd);
 }
 }
 
-if (QLIST_EMPTY(_fdset->fds) && QLIST_EMPTY(_fdset->dup_fds)) {
-QLIST_REMOVE(mon_fdset, next);
-g_free(mon_fdset);
-}
+monitor_fdset_free(mon_fdset);
 }
 
 void monitor_fdsets_cleanup(void)
-- 
2.35.3




[PATCH v2 02/18] tests/qtest/migration: Fix file migration offset check

2024-05-23 Thread Fabiano Rosas
When doing file migration, QEMU accepts an offset that should be
skipped when writing the migration stream to the file. The purpose of
the offset is to allow the management layer to put its own metadata at
the start of the file.

We have tests for this in migration-test, but only testing that the
migration stream starts at the correct offset and not that it actually
leaves the data intact. Unsurprisingly, there's been a bug in that
area that the tests didn't catch.

Fix the tests to write some data to the offset region and check that
it's actually there after the migration.

While here, switch to using g_get_file_contents() which is more
portable than mmap().

Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 79 ++--
 1 file changed, 48 insertions(+), 31 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b7e3406471..ec905543cf 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -67,6 +67,7 @@ static QTestMigrationState dst_state;
 #define QEMU_VM_FILE_MAGIC 0x5145564d
 #define FILE_TEST_FILENAME "migfile"
 #define FILE_TEST_OFFSET 0x1000
+#define FILE_TEST_MARKER 'X'
 #define QEMU_ENV_SRC "QTEST_QEMU_BINARY_SRC"
 #define QEMU_ENV_DST "QTEST_QEMU_BINARY_DST"
 
@@ -1716,10 +1717,43 @@ finish:
 test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED);
 }
 
+static void file_dirty_offset_region(void)
+{
+g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
+size_t size = FILE_TEST_OFFSET;
+g_autofree char *data = g_new0(char, size);
+
+memset(data, FILE_TEST_MARKER, size);
+g_assert(g_file_set_contents(path, data, size, NULL));
+}
+
+static void file_check_offset_region(void)
+{
+g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
+size_t size = FILE_TEST_OFFSET;
+g_autofree char *expected = g_new0(char, size);
+g_autofree char *actual = NULL;
+uint64_t *stream_start;
+
+/*
+ * Ensure the skipped offset region's data has not been touched
+ * and the migration stream starts at the right place.
+ */
+
+memset(expected, FILE_TEST_MARKER, size);
+
+g_assert(g_file_get_contents(path, , NULL, NULL));
+g_assert(!memcmp(actual, expected, size));
+
+stream_start = (uint64_t *)(actual + size);
+g_assert_cmpint(cpu_to_be64(*stream_start) >> 32, ==, QEMU_VM_FILE_MAGIC);
+}
+
 static void test_file_common(MigrateCommon *args, bool stop_src)
 {
 QTestState *from, *to;
 void *data_hook = NULL;
+bool check_offset = false;
 
 if (test_migrate_start(, , args->listen_uri, >start)) {
 return;
@@ -1732,6 +1766,16 @@ static void test_file_common(MigrateCommon *args, bool 
stop_src)
  */
 g_assert_false(args->live);
 
+if (g_strrstr(args->connect_uri, "offset=")) {
+check_offset = true;
+/*
+ * This comes before the start_hook because it's equivalent to
+ * a management application creating the file and writing to
+ * it so hooks should expect the file to be already present.
+ */
+file_dirty_offset_region();
+}
+
 if (args->start_hook) {
 data_hook = args->start_hook(from, to);
 }
@@ -1766,6 +1810,10 @@ static void test_file_common(MigrateCommon *args, bool 
stop_src)
 
 wait_for_serial("dest_serial");
 
+if (check_offset) {
+file_check_offset_region();
+}
+
 finish:
 if (args->finish_hook) {
 args->finish_hook(from, to, data_hook);
@@ -1965,36 +2013,6 @@ static void test_precopy_file(void)
 test_file_common(, true);
 }
 
-static void file_offset_finish_hook(QTestState *from, QTestState *to,
-void *opaque)
-{
-#if defined(__linux__)
-g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
-size_t size = FILE_TEST_OFFSET + sizeof(QEMU_VM_FILE_MAGIC);
-uintptr_t *addr, *p;
-int fd;
-
-fd = open(path, O_RDONLY);
-g_assert(fd != -1);
-addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
-g_assert(addr != MAP_FAILED);
-
-/*
- * Ensure the skipped offset contains zeros and the migration
- * stream starts at the right place.
- */
-p = addr;
-while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
-g_assert(*p == 0);
-p++;
-}
-g_assert_cmpint(cpu_to_be64(*p) >> 32, ==, QEMU_VM_FILE_MAGIC);
-
-munmap(addr, size);
-close(fd);
-#endif
-}
-
 static void test_precopy_file_offset(void)
 {
 g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
@@ -2003,7 +2021,6 @@ static void test_precopy_file_offset(void)
 MigrateCommon args = {
 .connect_uri = uri,
 .listen_uri = "defer",
-.finish_hook = file_offset_finish_hook,
 };
 
 test_file_common(, false);
-- 
2.35.3




[PATCH v2 00/18] migration/mapped-ram: Add direct-io support

2024-05-23 Thread Fabiano Rosas
Thank you all for the comments in the previous version. I believe we
managed to remove much of the complexity of the fdset handling.

Major changes in this v2 are:

- The rework of fdset to be less weird when removing fds. Now we
  always remove after qmp_remove_fd() and never remove after the
  monitor closes. Also removed the ->removed flag.

- Properly check for direct-io during params_check. I'm not going with
  the meson.build approach because that's kind of novel for migration
  paramters and I don't want to make direct-io a special case. I also
  couldn't figure out a way of selecting on CONFIG_O_DIRECT that would
  not end up removing the direct_io fields from everywhere, forcing us
  to check the CONFIG all over the place.

New in this v2 are two options to make the usage of the feature more
uniform and require less special cases in the management layer:

- Precopy direct-io. When not using multifd, enable and disable
  O_DIRECT around the parts of the code known to be aligned.

- Implemented direct-io for the incoming side.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1302882857

v1:
https://lore.kernel.org/r/20240426142042.14573-1-faro...@suse.de

Hi everyone, here's the rest of the migration "mapped-ram" feature
that didn't get merged for 9.0. This series adds support for direct
I/O, the missing piece to get the desired performance improvements.

There's 3 parts to this:

1- The plumbing for the new "direct-io" migration parameter. With this
   we can already use direct-io with the file transport + multifd +
   mapped-ram. Patches 1-3.

Due to the alignment requirements of O_DIRECT and the fact that
multifd runs the channels in parallel with the migration thread, we
must open the migration file two times, one with O_DIRECT set and
another with it clear.

If the user is not passing in a file name which QEMU can open at will,
we must then require that the user pass the two file descriptors with
the flags already properly set. We'll use the already existing fdset +
QMP add-fd infrastructure for this.

2- Changes to the fdset infrastructure to support O_DIRECT. We need
   those to be able to select from the user-provided fdset the file
   descriptor that contains the O_DIRECT flag. Patches 4-5.

3- Some fdset validation to make sure the two-fds requirement is being
   met. Patches 6-7.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1269352083

Fabiano Rosas (17):
  migration: Fix file migration with fdset
  tests/qtest/migration: Fix file migration offset check
  tests/qtest/migration: Add a precopy file test with fdset
  monitor: Introduce monitor_fdset_*free
  monitor: Stop removing non-duplicated fds
  monitor: Simplify fdset and fd removal
  monitor: Report errors from monitor_fdset_dup_fd_add
  io: Stop using qemu_open_old in channel-file
  migration: Add direct-io parameter
  migration/multifd: Add direct-io support
  tests/qtest/migration: Add tests for file migration with direct-io
  monitor: fdset: Match against O_DIRECT
  migration: Add documentation for fdset with multifd + file
  tests/qtest/migration: Add a test for mapped-ram with passing of fds
  io/channel-file: Add direct-io support
  migration: Add direct-io helpers
  migration/ram: Add direct-io support to precopy file migration

Peter Xu (1):
  monitor: Drop monitor_fdset_dup_fd_add()

 docs/devel/migration/main.rst   |  24 ++-
 docs/devel/migration/mapped-ram.rst |   6 +-
 include/io/channel-file.h   |   8 +
 include/monitor/monitor.h   |   3 +-
 include/qemu/osdep.h|   2 +
 io/channel-file.c   |  33 +++-
 migration/file.c|  42 +++-
 migration/file.h|   1 -
 migration/migration-hmp-cmds.c  |  11 ++
 migration/migration.c   |  54 +
 migration/migration.h   |   2 +
 migration/options.c |  28 +++
 migration/options.h |   1 +
 migration/qemu-file.c   |  29 +++
 migration/qemu-file.h   |   2 +-
 migration/ram.c |  40 +++-
 monitor/fds.c   |  89 -
 monitor/hmp.c   |   2 -
 monitor/monitor-internal.h  |   1 -
 monitor/qmp.c   |   2 -
 qapi/migration.json |  21 +-
 stubs/fdset.c   |   7 +-
 tests/qtest/migration-helpers.c |  44 +
 tests/qtest/migration-helpers.h |   8 +
 tests/qtest/migration-test.c| 296 +---
 util/osdep.c|  34 ++--
 26 files changed, 652 insertions(+), 138 deletions(-)


base-commit: 7e1c0047015ffbd408e1aa4a5ec1abe4751dbf7e
-- 
2.35.3




[PATCH v2 10/18] migration: Add direct-io parameter

2024-05-23 Thread Fabiano Rosas
Add the direct-io migration parameter that tells the migration code to
use O_DIRECT when opening the migration stream file whenever possible.

This is currently only used with the mapped-ram migration that has a
clear window guaranteed to perform aligned writes.

Acked-by: Markus Armbruster 
Signed-off-by: Fabiano Rosas 
---
 include/qemu/osdep.h   |  2 ++
 migration/migration-hmp-cmds.c | 11 +++
 migration/options.c| 28 
 migration/options.h|  1 +
 qapi/migration.json| 21 ++---
 util/osdep.c   |  9 +
 6 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index f61edcfdc2..191916f38e 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, 
bool exclusive);
 bool qemu_has_ofd_lock(void);
 #endif
 
+bool qemu_has_direct_io(void);
+
 #if defined(__HAIKU__) && defined(__i386__)
 #define FMT_pid "%ld"
 #elif defined(WIN64)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 9f0e8029e0..7d608d26e1 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -351,6 +351,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "%s: %s\n",
 MigrationParameter_str(MIGRATION_PARAMETER_MODE),
 qapi_enum_lookup(_lookup, params->mode));
+
+if (params->has_direct_io) {
+monitor_printf(mon, "%s: %s\n",
+   MigrationParameter_str(
+   MIGRATION_PARAMETER_DIRECT_IO),
+   params->direct_io ? "on" : "off");
+}
 }
 
 qapi_free_MigrationParameters(params);
@@ -624,6 +631,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 p->has_mode = true;
 visit_type_MigMode(v, param, >mode, );
 break;
+case MIGRATION_PARAMETER_DIRECT_IO:
+p->has_direct_io = true;
+visit_type_bool(v, param, >direct_io, );
+break;
 default:
 assert(0);
 }
diff --git a/migration/options.c b/migration/options.c
index 5ab5b6d85d..6f614761b8 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -702,6 +702,18 @@ bool migrate_cpu_throttle_tailslow(void)
 return s->parameters.cpu_throttle_tailslow;
 }
 
+bool migrate_direct_io(void)
+{
+MigrationState *s = migrate_get_current();
+
+/* For now O_DIRECT is only supported with mapped-ram */
+if (!s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM]) {
+return false;
+}
+
+return s->parameters.direct_io;
+}
+
 uint64_t migrate_downtime_limit(void)
 {
 MigrationState *s = migrate_get_current();
@@ -905,6 +917,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->mode = s->parameters.mode;
 params->has_zero_page_detection = true;
 params->zero_page_detection = s->parameters.zero_page_detection;
+params->has_direct_io = true;
+params->direct_io = s->parameters.direct_io;
 
 return params;
 }
@@ -937,6 +951,7 @@ void migrate_params_init(MigrationParameters *params)
 params->has_vcpu_dirty_limit = true;
 params->has_mode = true;
 params->has_zero_page_detection = true;
+params->has_direct_io = true;
 }
 
 /*
@@ -1110,6 +1125,11 @@ bool migrate_params_check(MigrationParameters *params, 
Error **errp)
 return false;
 }
 
+if (params->has_direct_io && params->direct_io && !qemu_has_direct_io()) {
+error_setg(errp, "No build-time support for direct-io");
+return false;
+}
+
 return true;
 }
 
@@ -1216,6 +1236,10 @@ static void 
migrate_params_test_apply(MigrateSetParameters *params,
 if (params->has_zero_page_detection) {
 dest->zero_page_detection = params->zero_page_detection;
 }
+
+if (params->has_direct_io) {
+dest->direct_io = params->direct_io;
+}
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1341,6 +1365,10 @@ static void migrate_params_apply(MigrateSetParameters 
*params, Error **errp)
 if (params->has_zero_page_detection) {
 s->parameters.zero_page_detection = params->zero_page_detection;
 }
+
+if (params->has_direct_io) {
+s->parameters.direct_io = params->direct_io;
+}
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/migration/options.h b/migration/options.h
index 4b21cc2669..a2397026db 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -69,6 +69,7 @@ uint32_t migrate_checkpoint_delay(void);
 uint8_t migrate_cpu_throttle_increment(void);
 uint8_t migrate_

[PATCH v2 14/18] migration: Add documentation for fdset with multifd + file

2024-05-23 Thread Fabiano Rosas
With the last few changes to the fdset infrastructure, we now allow
multifd to use an fdset when migrating to a file. This is useful for
the scenario where the management layer wants to have control over the
migration file.

By receiving the file descriptors directly, QEMU can delegate some
high level operating system operations to the management layer (such
as mandatory access control). The management layer might also want to
add its own headers before the migration stream.

Document the "file:/dev/fdset/#" syntax for the multifd migration with
mapped-ram. The requirements for the fdset mechanism are:

- the fdset must contain two fds that are not duplicates between
  themselves;

- if direct-io is to be used, exactly one of the fds must have the
  O_DIRECT flag set;

- the file must be opened with WRONLY on the migration source side;

- the file must be opened with RDONLY on the migration destination
  side.

Signed-off-by: Fabiano Rosas 
---
 docs/devel/migration/main.rst   | 24 +++-
 docs/devel/migration/mapped-ram.rst |  6 +-
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 495cdcb112..784c899dca 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -47,11 +47,25 @@ over any transport.
   QEMU interference. Note that QEMU does not flush cached file
   data/metadata at the end of migration.
 
-In addition, support is included for migration using RDMA, which
-transports the page data using ``RDMA``, where the hardware takes care of
-transporting the pages, and the load on the CPU is much lower.  While the
-internals of RDMA migration are a bit different, this isn't really visible
-outside the RAM migration code.
+  The file migration also supports using a file that has already been
+  opened. A set of file descriptors is passed to QEMU via an "fdset"
+  (see add-fd QMP command documentation). This method allows a
+  management application to have control over the migration file
+  opening operation. There are, however, strict requirements to this
+  interface if the multifd capability is enabled:
+
+- the fdset must contain two file descriptors that are not
+  duplicates between themselves;
+- if the direct-io capability is to be used, exactly one of the
+  file descriptors must have the O_DIRECT flag set;
+- the file must be opened with WRONLY on the migration source side
+  and RDONLY on the migration destination side.
+
+- rdma migration: support is included for migration using RDMA, which
+  transports the page data using ``RDMA``, where the hardware takes
+  care of transporting the pages, and the load on the CPU is much
+  lower.  While the internals of RDMA migration are a bit different,
+  this isn't really visible outside the RAM migration code.
 
 All these migration protocols use the same infrastructure to
 save/restore state devices.  This infrastructure is shared with the
diff --git a/docs/devel/migration/mapped-ram.rst 
b/docs/devel/migration/mapped-ram.rst
index fa4cefd9fc..e6505511f0 100644
--- a/docs/devel/migration/mapped-ram.rst
+++ b/docs/devel/migration/mapped-ram.rst
@@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
 sequential stream. Having the pages at fixed offsets also allows the
 usage of O_DIRECT for save/restore of the migration stream as the
 pages are ensured to be written respecting O_DIRECT alignment
-restrictions (direct-io support not yet implemented).
+restrictions.
 
 Usage
 -
@@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
 Mapped-ram migration is best done non-live, i.e. by stopping the VM on
 the source side before migrating.
 
+For best performance enable the ``direct-io`` capability as well:
+
+``migrate_set_capability direct-io on``
+
 Use-cases
 -
 
-- 
2.35.3




[PATCH v2 13/18] monitor: fdset: Match against O_DIRECT

2024-05-23 Thread Fabiano Rosas
We're about to enable the use of O_DIRECT in the migration code and
due to the alignment restrictions imposed by filesystems we need to
make sure the flag is only used when doing aligned IO.

The migration will do parallel IO to different regions of a file, so
we need to use more than one file descriptor. Those cannot be obtained
by duplicating (dup()) since duplicated file descriptors share the
file status flags, including O_DIRECT. If one migration channel does
unaligned IO while another sets O_DIRECT to do aligned IO, the
filesystem would fail the unaligned operation.

The add-fd QMP command along with the fdset code are specifically
designed to allow the user to pass a set of file descriptors with
different access flags into QEMU to be later fetched by code that
needs to alternate between those flags when doing IO.

Extend the fdset matching to behave the same with the O_DIRECT flag.

Signed-off-by: Fabiano Rosas 
---
 monitor/fds.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index d93d2e695b..a6580e215e 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -419,6 +419,11 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags, 
Error **errp)
 int fd = -1;
 int dup_fd;
 int mon_fd_flags;
+int mask = O_ACCMODE;
+
+#ifdef O_DIRECT
+mask |= O_DIRECT;
+#endif
 
 if (mon_fdset->id != fdset_id) {
 continue;
@@ -432,7 +437,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags, 
Error **errp)
 return -1;
 }
 
-if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
+if ((flags & mask) == (mon_fd_flags & mask)) {
 fd = mon_fdset_fd->fd;
 break;
 }
-- 
2.35.3




[PATCH v2 01/18] migration: Fix file migration with fdset

2024-05-23 Thread Fabiano Rosas
When the "file:" migration support was added we missed the special
case in the qemu_open_old implementation that allows for a particular
file name format to be used to refer to a set of file descriptors that
have been previously provided to QEMU via the add-fd QMP command.

When using this fdset feature, we should not truncate the migration
file because being given an fd means that the management layer is in
control of the file and will likely already have some data written to
it. This is further indicated by the presence of the 'offset'
argument, which indicates the start of the region where QEMU is
allowed to write.

Fix the issue by replacing the O_TRUNC flag on open by an ftruncate
call, which will take the offset into consideration.

Fixes: 385f510df5 ("migration: file URI offset")
Suggested-by: Daniel P. Berrangé 
Signed-off-by: Fabiano Rosas 
---
 migration/file.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index ab18ba505a..ba5b5c44ff 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -84,12 +84,19 @@ void file_start_outgoing_migration(MigrationState *s,
 
 trace_migration_file_outgoing(filename);
 
-fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
- 0600, errp);
+fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY, 0600, errp);
 if (!fioc) {
 return;
 }
 
+if (ftruncate(fioc->fd, offset)) {
+error_setg_errno(errp, errno,
+ "failed to truncate migration file to offset %" 
PRIx64,
+ offset);
+object_unref(OBJECT(fioc));
+return;
+}
+
 outgoing_args.fname = g_strdup(filename);
 
 ioc = QIO_CHANNEL(fioc);
-- 
2.35.3




[PULL 6/9] virtio-gpu: fix v2 migration

2024-05-22 Thread Fabiano Rosas
From: Marc-André Lureau 

Commit dfcf74fa ("virtio-gpu: fix scanout migration post-load") broke
forward/backward version migration. Versioning of nested VMSD structures
is not straightforward, as the wire format doesn't have nested
structures versions. Introduce x-scanout-vmstate-version and a field
test to save/load appropriately according to the machine version.

Fixes: dfcf74fa ("virtio-gpu: fix scanout migration post-load")
Signed-off-by: Marc-André Lureau 
Signed-off-by: Peter Xu 
Reviewed-by: Fiona Ebner 
Tested-by: Fiona Ebner 
[fixed long lines]
Signed-off-by: Fabiano Rosas 
---
 hw/core/machine.c  |  1 +
 hw/display/virtio-gpu.c| 30 ++
 include/hw/virtio/virtio-gpu.h |  1 +
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index c7ceb11501..8d6dc69f0e 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -42,6 +42,7 @@ GlobalProperty hw_compat_8_2[] = {
 { "migration", "zero-page-detection", "legacy"},
 { TYPE_VIRTIO_IOMMU_PCI, "granule", "4k" },
 { TYPE_VIRTIO_IOMMU_PCI, "aw-bits", "64" },
+{ "virtio-gpu-device", "x-scanout-vmstate-version", "1" },
 };
 const size_t hw_compat_8_2_len = G_N_ELEMENTS(hw_compat_8_2);
 
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index ae831b6b3e..d60b1b2973 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1166,10 +1166,17 @@ static void virtio_gpu_cursor_bh(void *opaque)
 virtio_gpu_handle_cursor(>parent_obj.parent_obj, g->cursor_vq);
 }
 
+static bool scanout_vmstate_after_v2(void *opaque, int version)
+{
+struct VirtIOGPUBase *base = container_of(opaque, VirtIOGPUBase, scanout);
+struct VirtIOGPU *gpu = container_of(base, VirtIOGPU, parent_obj);
+
+return gpu->scanout_vmstate_version >= 2;
+}
+
 static const VMStateDescription vmstate_virtio_gpu_scanout = {
 .name = "virtio-gpu-one-scanout",
-.version_id = 2,
-.minimum_version_id = 1,
+.version_id = 1,
 .fields = (const VMStateField[]) {
 VMSTATE_UINT32(resource_id, struct virtio_gpu_scanout),
 VMSTATE_UINT32(width, struct virtio_gpu_scanout),
@@ -1181,12 +1188,18 @@ static const VMStateDescription 
vmstate_virtio_gpu_scanout = {
 VMSTATE_UINT32(cursor.hot_y, struct virtio_gpu_scanout),
 VMSTATE_UINT32(cursor.pos.x, struct virtio_gpu_scanout),
 VMSTATE_UINT32(cursor.pos.y, struct virtio_gpu_scanout),
-VMSTATE_UINT32_V(fb.format, struct virtio_gpu_scanout, 2),
-VMSTATE_UINT32_V(fb.bytes_pp, struct virtio_gpu_scanout, 2),
-VMSTATE_UINT32_V(fb.width, struct virtio_gpu_scanout, 2),
-VMSTATE_UINT32_V(fb.height, struct virtio_gpu_scanout, 2),
-VMSTATE_UINT32_V(fb.stride, struct virtio_gpu_scanout, 2),
-VMSTATE_UINT32_V(fb.offset, struct virtio_gpu_scanout, 2),
+VMSTATE_UINT32_TEST(fb.format, struct virtio_gpu_scanout,
+scanout_vmstate_after_v2),
+VMSTATE_UINT32_TEST(fb.bytes_pp, struct virtio_gpu_scanout,
+scanout_vmstate_after_v2),
+VMSTATE_UINT32_TEST(fb.width, struct virtio_gpu_scanout,
+scanout_vmstate_after_v2),
+VMSTATE_UINT32_TEST(fb.height, struct virtio_gpu_scanout,
+scanout_vmstate_after_v2),
+VMSTATE_UINT32_TEST(fb.stride, struct virtio_gpu_scanout,
+scanout_vmstate_after_v2),
+VMSTATE_UINT32_TEST(fb.offset, struct virtio_gpu_scanout,
+scanout_vmstate_after_v2),
 VMSTATE_END_OF_LIST()
 },
 };
@@ -1659,6 +1672,7 @@ static Property virtio_gpu_properties[] = {
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
 DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
+DEFINE_PROP_UINT8("x-scanout-vmstate-version", VirtIOGPU, 
scanout_vmstate_version, 2),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 56d6e821bf..7a59379f5a 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -177,6 +177,7 @@ typedef struct VGPUDMABuf {
 struct VirtIOGPU {
 VirtIOGPUBase parent_obj;
 
+uint8_t scanout_vmstate_version;
 uint64_t conf_max_hostmem;
 
 VirtQueue *ctrl_vq;
-- 
2.35.3




[PULL 7/9] hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1

2024-05-22 Thread Fabiano Rosas
From: Fiona Ebner 

Migration from an 8.2 or 9.0 binary to an 8.1 binary with machine
version 8.1 can fail with:

> kvm: Features 0x1c0010130afffa7 unsupported. Allowed features: 0x10179bfffe7
> kvm: Failed to load virtio-net:virtio
> kvm: error while loading state for instance 0x0 of device 
> ':00:12.0/virtio-net'
> kvm: load of migration failed: Operation not permitted

The series

53da8b5a99 virtio-net: Add support for USO features
9da1684954 virtio-net: Add USO flags to vhost support.
f03e0cf63b tap: Add check for USO features
2ab0ec3121 tap: Add USO support to tap device.

only landed in QEMU 8.2, so the compatibility flags should be part of
machine version 8.1.

Moving the flags unfortunately breaks forward migration with machine
version 8.1 from a binary without this patch to a binary with this
patch.

Fixes: 53da8b5a99 ("virtio-net: Add support for USO features")
Signed-off-by: Fiona Ebner 
Reviewed-by: Fabiano Rosas 
Acked-by: Jason Wang 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 hw/core/machine.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 8d6dc69f0e..f5dffea33d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -51,15 +51,15 @@ GlobalProperty hw_compat_8_1[] = {
 { "ramfb", "x-migrate", "off" },
 { "vfio-pci-nohotplug", "x-ramfb-migrate", "off" },
 { "igb", "x-pcie-flr-init", "off" },
+{ TYPE_VIRTIO_NET, "host_uso", "off"},
+{ TYPE_VIRTIO_NET, "guest_uso4", "off"},
+{ TYPE_VIRTIO_NET, "guest_uso6", "off"},
 };
 const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
 
 GlobalProperty hw_compat_8_0[] = {
 { "migration", "multifd-flush-after-each-section", "on"},
 { TYPE_PCI_DEVICE, "x-pcie-ari-nextfn-1", "on" },
-{ TYPE_VIRTIO_NET, "host_uso", "off"},
-{ TYPE_VIRTIO_NET, "guest_uso4", "off"},
-{ TYPE_VIRTIO_NET, "guest_uso6", "off"},
 };
 const size_t hw_compat_8_0_len = G_N_ELEMENTS(hw_compat_8_0);
 
-- 
2.35.3




[PULL 5/9] migration: fix a typo

2024-05-22 Thread Fabiano Rosas
From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
Reviewed-by: Peter Xu 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Fiona Ebner 
Tested-by: Fiona Ebner 
Signed-off-by: Fabiano Rosas 
---
 migration/vmstate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/vmstate.c b/migration/vmstate.c
index b51212a75b..ff5d589a6d 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -479,7 +479,7 @@ static int vmstate_subsection_load(QEMUFile *f, const 
VMStateDescription *vmsd,
 
 len = qemu_peek_byte(f, 1);
 if (len < strlen(vmsd->name) + 1) {
-/* subsection name has be be "section_name/a" */
+/* subsection name has to be "section_name/a" */
 trace_vmstate_subsection_load_bad(vmsd->name, "(short)", "");
 return 0;
 }
-- 
2.35.3




[PULL 9/9] tests/qtest/migration-test: Fix the check for a successful run of analyze-migration.py

2024-05-22 Thread Fabiano Rosas
From: Thomas Huth 

If analyze-migration.py cannot be run or crashes, the error is currently
ignored since the code only checks for nonzero values in case the child
exited properly. For example, if you run the test with a non-existing
Python interpreter, it still succeeds:

 $ PYTHON=wrongpython QTEST_QEMU_BINARY=./qemu-system-x86_64 
tests/qtest/migration-test
 ...
 # Running /x86_64/migration/analyze-script
 # Using machine type: pc-q35-9.1
 # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-417639.sock 
-qtest-log /dev/null -chardev socket,path=/tmp/qtest-417639.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg 
-machine pc-q35-9.1, -name source,debug-threads=on -m 150M -serial 
file:/tmp/migration-test-XPLUN2/src_serial -drive 
if=none,id=d0,file=/tmp/migration-test-XPLUN2/bootsect,format=raw -device 
ide-hd,drive=d0,secs=1,cyls=1,heads=1   -uuid 
----  -accel qtest
 # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-417639.sock 
-qtest-log /dev/null -chardev socket,path=/tmp/qtest-417639.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg 
-machine pc-q35-9.1, -name target,debug-threads=on -m 150M -serial 
file:/tmp/migration-test-XPLUN2/dest_serial -incoming tcp:127.0.0.1:0 -drive 
if=none,id=d0,file=/tmp/migration-test-XPLUN2/bootsect,format=raw -device 
ide-hd,drive=d0,secs=1,cyls=1,heads=1 -accel qtest
 **
 ERROR:../../devel/qemu/tests/qtest/migration-test.c:1603:test_analyze_script: 
code should not be reached
 migration-test: ../../devel/qemu/tests/qtest/libqtest.c:240: qtest_wait_qemu: 
Assertion `pid == s->qemu_pid' failed.
 migration-test: ../../devel/qemu/tests/qtest/libqtest.c:240: qtest_wait_qemu: 
Assertion `pid == s->qemu_pid' failed.
 ok 2 /x86_64/migration/analyze-script
 ...

Let's better fail the test in case the child did not exit properly, too.

Signed-off-by: Thomas Huth 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 5b4eca2b20..b7e3406471 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1604,7 +1604,7 @@ static void test_analyze_script(void)
 }
 
 g_assert(waitpid(pid, , 0) == pid);
-if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != 0) {
+if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0) {
 g_test_message("Failed to analyze the migration stream");
 g_test_fail();
 }
-- 
2.35.3




[PULL 1/9] migration/colo: Minor fix for colo error message

2024-05-22 Thread Fabiano Rosas
From: Li Zhijian 

- Explicitly show the missing module name: replication
- Fix capability name to x-colo

Reviewed-by: Peter Xu 
Reviewed-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Suggested-by: Michael Tokarev 
[fixed mangled author email address]
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index e88b24f1e6..995f0ca923 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -513,13 +513,13 @@ void migration_incoming_disable_colo(void)
 int migration_incoming_enable_colo(void)
 {
 #ifndef CONFIG_REPLICATION
-error_report("ENABLE_COLO command come in migration stream, but COLO "
- "module is not built in");
+error_report("ENABLE_COLO command come in migration stream, but the "
+ "replication module is not built in");
 return -ENOTSUP;
 #endif
 
 if (!migrate_colo()) {
-error_report("ENABLE_COLO command come in migration stream, but c-colo 
"
+error_report("ENABLE_COLO command come in migration stream, but x-colo 
"
  "capability is not set");
 return -EINVAL;
 }
-- 
2.35.3




[PULL 2/9] migration/colo: make colo_incoming_co() return void

2024-05-22 Thread Fabiano Rosas
From: Li Zhijian 

Currently, it always returns 0, no need to check the return value at all.
In addition, enter colo coroutine only if migration_incoming_colo_enabled()
is true.
Once the destination side enters the COLO* state, the COLO process will
take over the remaining processes until COLO exits.

Cc: Fabiano Rosas 
Reviewed-by: Peter Xu 
Reviewed-by: Zhang Chen 
Signed-off-by: Li Zhijian 
[fixed mangled author email address]
Signed-off-by: Fabiano Rosas 
---
 include/migration/colo.h | 2 +-
 migration/colo-stubs.c   | 3 +--
 migration/colo.c | 9 ++---
 migration/migration.c| 6 +++---
 4 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index eaac07f26d..43222ef5ae 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -49,7 +49,7 @@ void colo_checkpoint_delay_set(void);
  *
  * Called with BQL locked, may temporary release BQL.
  */
-int coroutine_fn colo_incoming_co(void);
+void coroutine_fn colo_incoming_co(void);
 
 void colo_shutdown(void);
 #endif
diff --git a/migration/colo-stubs.c b/migration/colo-stubs.c
index f8c069b739..e22ce65234 100644
--- a/migration/colo-stubs.c
+++ b/migration/colo-stubs.c
@@ -9,9 +9,8 @@ void colo_shutdown(void)
 {
 }
 
-int coroutine_fn colo_incoming_co(void)
+void coroutine_fn colo_incoming_co(void)
 {
-return 0;
 }
 
 void colo_checkpoint_delay_set(void)
diff --git a/migration/colo.c b/migration/colo.c
index e2b450c132..ca37b932ac 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -928,16 +928,13 @@ out:
 return NULL;
 }
 
-int coroutine_fn colo_incoming_co(void)
+void coroutine_fn colo_incoming_co(void)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
 QemuThread th;
 
 assert(bql_locked());
-
-if (!migration_incoming_colo_enabled()) {
-return 0;
-}
+assert(migration_incoming_colo_enabled());
 
 qemu_thread_create(, "COLO incoming", colo_process_incoming_thread,
mis, QEMU_THREAD_JOINABLE);
@@ -953,6 +950,4 @@ int coroutine_fn colo_incoming_co(void)
 
 /* We hold the global BQL, so it is safe here */
 colo_release_ram_cache();
-
-return 0;
 }
diff --git a/migration/migration.c b/migration/migration.c
index 995f0ca923..c004637d29 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -776,9 +776,9 @@ process_incoming_migration_co(void *opaque)
 goto fail;
 }
 
-if (colo_incoming_co() < 0) {
-error_setg(_err, "colo incoming failed");
-goto fail;
+if (migration_incoming_colo_enabled()) {
+/* yield until COLO exit */
+colo_incoming_co();
 }
 
 migration_bh_schedule(process_incoming_migration_bh, mis);
-- 
2.35.3




[PULL 8/9] tests/qtest/migration-test: Run some basic tests on s390x and ppc64 with TCG, too

2024-05-22 Thread Fabiano Rosas
From: Thomas Huth 

On s390x, we recently had a regression that broke migration / savevm
(see commit bebe9603fc ("hw/intc/s390_flic: Fix crash that occurs when
saving the machine state"). The problem was merged without being noticed
since we currently do not run any migration / savevm related tests on
x86 hosts.
While we currently cannot run all migration tests for the s390x target
on x86 hosts yet (due to some unresolved issues with TCG), we can at
least run some of the non-live tests to avoid such problems in the future.
Thus enable the "analyze-script" and the "bad_dest" tests before checking
for KVM on s390x or ppc64 (this also fixes the problem that the
"analyze-script" test was not run on s390x at all anymore since it got
disabled again by accident in a previous refactoring of the code).

Signed-off-by: Thomas Huth 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 39 ++--
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e8d3555f56..5b4eca2b20 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -3437,6 +3437,20 @@ int main(int argc, char **argv)
 arch = qtest_get_arch();
 is_x86 = !strcmp(arch, "i386") || !strcmp(arch, "x86_64");
 
+tmpfs = g_dir_make_tmp("migration-test-XX", );
+if (!tmpfs) {
+g_test_message("Can't create temporary directory in %s: %s",
+   g_get_tmp_dir(), err->message);
+}
+g_assert(tmpfs);
+
+module_call_init(MODULE_INIT_QOM);
+
+migration_test_add("/migration/bad_dest", test_baddest);
+#ifndef _WIN32
+migration_test_add("/migration/analyze-script", test_analyze_script);
+#endif
+
 /*
  * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
  * is touchy due to race conditions on dirty bits (especially on PPC for
@@ -3444,8 +3458,8 @@ int main(int argc, char **argv)
  */
 if (g_str_equal(arch, "ppc64") &&
 (!has_kvm || access("/sys/module/kvm_hv", F_OK))) {
-g_test_message("Skipping test: kvm_hv not available");
-return g_test_run();
+g_test_message("Skipping tests: kvm_hv not available");
+goto test_add_done;
 }
 
 /*
@@ -3453,19 +3467,10 @@ int main(int argc, char **argv)
  * there until the problems are resolved
  */
 if (g_str_equal(arch, "s390x") && !has_kvm) {
-g_test_message("Skipping test: s390x host with KVM is required");
-return g_test_run();
+g_test_message("Skipping tests: s390x host with KVM is required");
+goto test_add_done;
 }
 
-tmpfs = g_dir_make_tmp("migration-test-XX", );
-if (!tmpfs) {
-g_test_message("Can't create temporary directory in %s: %s",
-   g_get_tmp_dir(), err->message);
-}
-g_assert(tmpfs);
-
-module_call_init(MODULE_INIT_QOM);
-
 if (is_x86) {
 migration_test_add("/migration/precopy/unix/suspend/live",
test_precopy_unix_suspend_live);
@@ -3491,12 +3496,6 @@ int main(int argc, char **argv)
 }
 }
 
-migration_test_add("/migration/bad_dest", test_baddest);
-#ifndef _WIN32
-if (!g_str_equal(arch, "s390x")) {
-migration_test_add("/migration/analyze-script", test_analyze_script);
-}
-#endif
 migration_test_add("/migration/precopy/unix/plain",
test_precopy_unix_plain);
 migration_test_add("/migration/precopy/unix/xbzrle",
@@ -3653,6 +3652,8 @@ int main(int argc, char **argv)
test_vcpu_dirty_limit);
 }
 
+test_add_done:
+
 ret = g_test_run();
 
 g_assert_cmpint(ret, ==, 0);
-- 
2.35.3




[PULL 3/9] migration/colo: Tidy up bql_unlock() around bdrv_activate_all()

2024-05-22 Thread Fabiano Rosas
From: Li Zhijian 

Make the code more tight.

Suggested-by: Michael Tokarev 
Reviewed-by: Peter Xu 
Reviewed-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Michael Tokarev 
[fixed mangled author email address]
Signed-off-by: Fabiano Rosas 
---
 migration/colo.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index ca37b932ac..f96c2ee069 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -837,12 +837,11 @@ static void *colo_process_incoming_thread(void *opaque)
 /* Make sure all file formats throw away their mutable metadata */
 bql_lock();
 bdrv_activate_all(_err);
+bql_unlock();
 if (local_err) {
-bql_unlock();
 error_report_err(local_err);
 return NULL;
 }
-bql_unlock();
 
 failover_init_state();
 
-- 
2.35.3




[PULL 0/9] Migration patches for 2024-05-22

2024-05-22 Thread Fabiano Rosas
The following changes since commit 01782d6b294f95bcde334386f0aaac593cd28c0d:

  Merge tag 'hw-misc-20240517' of https://github.com/philmd/qemu into staging 
(2024-05-18 11:49:01 +0200)

are available in the Git repository at:

  https://gitlab.com/farosas/qemu.git tags/migration-20240522-pull-request

for you to fetch changes up to 8f023a0bd946bb0c122543c64fe2b34bad0dd048:

  tests/qtest/migration-test: Fix the check for a successful run of 
analyze-migration.py (2024-05-22 17:34:41 -0300)


Migration pull request

- Li Zhijian's COLO minor fixes
- Marc-André's virtio-gpu fix
- Fiona's virtio-net USO fix
- A couple of migration-test fixes from Thomas



Fiona Ebner (1):
  hw/core/machine: move compatibility flags for VirtIO-net USO to
machine 8.1

Li Zhijian (3):
  migration/colo: Minor fix for colo error message
  migration/colo: make colo_incoming_co() return void
  migration/colo: Tidy up bql_unlock() around bdrv_activate_all()

Marc-André Lureau (3):
  migration: add "exists" info to load-state-field trace
  migration: fix a typo
  virtio-gpu: fix v2 migration

Thomas Huth (2):
  tests/qtest/migration-test: Run some basic tests on s390x and ppc64
with TCG, too
  tests/qtest/migration-test: Fix the check for a successful run of
analyze-migration.py

 hw/core/machine.c  |  7 +++---
 hw/display/virtio-gpu.c| 30 ++---
 include/hw/virtio/virtio-gpu.h |  1 +
 include/migration/colo.h   |  2 +-
 migration/colo-stubs.c |  3 +--
 migration/colo.c   | 12 +++---
 migration/migration.c  | 12 +-
 migration/trace-events |  2 +-
 migration/vmstate.c|  7 +++---
 tests/qtest/migration-test.c   | 41 +-
 10 files changed, 64 insertions(+), 53 deletions(-)

-- 
2.35.3




[PULL 4/9] migration: add "exists" info to load-state-field trace

2024-05-22 Thread Fabiano Rosas
From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
Reviewed-by: Peter Xu 
Reviewed-by: Fiona Ebner 
Tested-by: Fiona Ebner 
Signed-off-by: Fabiano Rosas 
---
 migration/trace-events | 2 +-
 migration/vmstate.c| 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/trace-events b/migration/trace-events
index d0c44c3853..0b7c3324fb 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -58,7 +58,7 @@ postcopy_page_req_sync(void *host_addr) "sync page req %p"
 vmstate_load_field_error(const char *field, int ret) "field \"%s\" load 
failed, ret = %d"
 vmstate_load_state(const char *name, int version_id) "%s v%d"
 vmstate_load_state_end(const char *name, const char *reason, int val) "%s 
%s/%d"
-vmstate_load_state_field(const char *name, const char *field) "%s:%s"
+vmstate_load_state_field(const char *name, const char *field, bool exists) 
"%s:%s exists=%d"
 vmstate_n_elems(const char *name, int n_elems) "%s: %d"
 vmstate_subsection_load(const char *parent) "%s"
 vmstate_subsection_load_bad(const char *parent,  const char *sub, const char 
*sub2) "%s: %s/%s"
diff --git a/migration/vmstate.c b/migration/vmstate.c
index ef26f26ccd..b51212a75b 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -128,8 +128,9 @@ int vmstate_load_state(QEMUFile *f, const 
VMStateDescription *vmsd,
 }
 }
 while (field->name) {
-trace_vmstate_load_state_field(vmsd->name, field->name);
-if (vmstate_field_exists(vmsd, field, opaque, version_id)) {
+bool exists = vmstate_field_exists(vmsd, field, opaque, version_id);
+trace_vmstate_load_state_field(vmsd->name, field->name, exists);
+if (exists) {
 void *first_elem = opaque + field->offset;
 int i, n_elems = vmstate_n_elems(opaque, field);
 int size = vmstate_size(opaque, field);
-- 
2.35.3




Re: [PATCH] tests/qtest/migration-test: Fix the check for a successful run of analyze-migration.py

2024-05-22 Thread Fabiano Rosas
On Wed, 22 May 2024 11:23:01 +0200, Thomas Huth wrote:
> If analyze-migration.py cannot be run or crashes, the error is currently
> ignored since the code only checks for nonzero values in case the child
> exited properly. For example, if you run the test with a non-existing
> Python interpreter, it still succeeds:
> 
>  $ PYTHON=wrongpython QTEST_QEMU_BINARY=./qemu-system-x86_64 
> tests/qtest/migration-test
>  ...
>  # Running /x86_64/migration/analyze-script
>  # Using machine type: pc-q35-9.1
>  # starting QEMU: exec ./qemu-system-x86_64 -qtest 
> unix:/tmp/qtest-417639.sock -qtest-log /dev/null -chardev 
> socket,path=/tmp/qtest-417639.qmp,id=char0 -mon chardev=char0,mode=control 
> -display none -audio none -accel kvm -accel tcg -machine pc-q35-9.1, -name 
> source,debug-threads=on -m 150M -serial 
> file:/tmp/migration-test-XPLUN2/src_serial -drive 
> if=none,id=d0,file=/tmp/migration-test-XPLUN2/bootsect,format=raw -device 
> ide-hd,drive=d0,secs=1,cyls=1,heads=1   -uuid 
> ----  -accel qtest
>  # starting QEMU: exec ./qemu-system-x86_64 -qtest 
> unix:/tmp/qtest-417639.sock -qtest-log /dev/null -chardev 
> socket,path=/tmp/qtest-417639.qmp,id=char0 -mon chardev=char0,mode=control 
> -display none -audio none -accel kvm -accel tcg -machine pc-q35-9.1, -name 
> target,debug-threads=on -m 150M -serial 
> file:/tmp/migration-test-XPLUN2/dest_serial -incoming tcp:127.0.0.1:0 -drive 
> if=none,id=d0,file=/tmp/migration-test-XPLUN2/bootsect,format=raw -device 
> ide-hd,drive=d0,secs=1,cyls=1,heads=1 -accel qtest
>  **
>  
> ERROR:../../devel/qemu/tests/qtest/migration-test.c:1603:test_analyze_script: 
> code should not be reached
>  migration-test: ../../devel/qemu/tests/qtest/libqtest.c:240: 
> qtest_wait_qemu: Assertion `pid == s->qemu_pid' failed.
>  migration-test: ../../devel/qemu/tests/qtest/libqtest.c:240: 
> qtest_wait_qemu: Assertion `pid == s->qemu_pid' failed.
>  ok 2 /x86_64/migration/analyze-script
>  ...
> 
> [...]

Queued, thanks!



Re: [PATCH] hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1

2024-05-22 Thread Fabiano Rosas
On Fri, 17 May 2024 09:53:36 +0200, Fiona Ebner wrote:
> Migration from an 8.2 or 9.0 binary to an 8.1 binary with machine
> version 8.1 can fail with:
> 
> > kvm: Features 0x1c0010130afffa7 unsupported. Allowed features: 0x10179bfffe7
> > kvm: Failed to load virtio-net:virtio
> > kvm: error while loading state for instance 0x0 of device 
> > ':00:12.0/virtio-net'
> > kvm: load of migration failed: Operation not permitted
> 
> [...]

Queued, thanks!



Re: [PATCH v4 0/3] Fix "virtio-gpu: fix scanout migration post-load"

2024-05-22 Thread Fabiano Rosas
On Thu, 16 May 2024 12:40:19 +0400, marcandre.lur...@redhat.com wrote:
> The aforementioned patch breaks virtio-gpu device migrations for versions
> pre-9.0/9.0, both forwards and backwards. Versioning of `VMS_STRUCT` is more
> complex than it may initially appear, as evidenced in the problematic commit
> dfcf74fa68c ("virtio-gpu: fix scanout migration post-load").
> 
> v2:
>  - use a manual version field test (instead of the more complex struct 
> variant)
> 
> [...]

Queued, thanks!



Re: [PATCH] tests/qtest/migration-test: Run some basic tests on s390x and ppc64 with TCG, too

2024-05-22 Thread Fabiano Rosas
On Wed, 22 May 2024 11:12:55 +0200, Thomas Huth wrote:
> On s390x, we recently had a regression that broke migration / savevm
> (see commit bebe9603fc ("hw/intc/s390_flic: Fix crash that occurs when
> saving the machine state"). The problem was merged without being noticed
> since we currently do not run any migration / savevm related tests on
> x86 hosts.
> While we currently cannot run all migration tests for the s390x target
> on x86 hosts yet (due to some unresolved issues with TCG), we can at
> least run some of the non-live tests to avoid such problems in the future.
> Thus enable the "analyze-script" and the "bad_dest" tests before checking
> for KVM on s390x or ppc64 (this also fixes the problem that the
> "analyze-script" test was not run on s390x at all anymore since it got
> disabled again by accident in a previous refactoring of the code).
> 
> [...]

Queued, thanks!



Re: [PATCH v2 1/3] migration/colo: Minor fix for colo error message

2024-05-22 Thread Fabiano Rosas
On Thu, 16 May 2024 11:45:15 +0800, Li Zhijian via wrote:
> - Explicitly show the missing module name: replication
> - Fix capability name to x-colo
> 
> 

Queued, thanks!



Re: [PATCH] tests/qtest/migration-test: Fix the check for a successful run of analyze-migration.py

2024-05-22 Thread Fabiano Rosas
Thomas Huth  writes:

> If analyze-migration.py cannot be run or crashes, the error is currently
> ignored since the code only checks for nonzero values in case the child
> exited properly. For example, if you run the test with a non-existing
> Python interpreter, it still succeeds:
>
>  $ PYTHON=wrongpython QTEST_QEMU_BINARY=./qemu-system-x86_64 
> tests/qtest/migration-test
>  ...
>  # Running /x86_64/migration/analyze-script
>  # Using machine type: pc-q35-9.1
>  # starting QEMU: exec ./qemu-system-x86_64 -qtest 
> unix:/tmp/qtest-417639.sock -qtest-log /dev/null -chardev 
> socket,path=/tmp/qtest-417639.qmp,id=char0 -mon chardev=char0,mode=control 
> -display none -audio none -accel kvm -accel tcg -machine pc-q35-9.1, -name 
> source,debug-threads=on -m 150M -serial 
> file:/tmp/migration-test-XPLUN2/src_serial -drive 
> if=none,id=d0,file=/tmp/migration-test-XPLUN2/bootsect,format=raw -device 
> ide-hd,drive=d0,secs=1,cyls=1,heads=1   -uuid 
> ----  -accel qtest
>  # starting QEMU: exec ./qemu-system-x86_64 -qtest 
> unix:/tmp/qtest-417639.sock -qtest-log /dev/null -chardev 
> socket,path=/tmp/qtest-417639.qmp,id=char0 -mon chardev=char0,mode=control 
> -display none -audio none -accel kvm -accel tcg -machine pc-q35-9.1, -name 
> target,debug-threads=on -m 150M -serial 
> file:/tmp/migration-test-XPLUN2/dest_serial -incoming tcp:127.0.0.1:0 -drive 
> if=none,id=d0,file=/tmp/migration-test-XPLUN2/bootsect,format=raw -device 
> ide-hd,drive=d0,secs=1,cyls=1,heads=1 -accel qtest
>  **
>  
> ERROR:../../devel/qemu/tests/qtest/migration-test.c:1603:test_analyze_script: 
> code should not be reached
>  migration-test: ../../devel/qemu/tests/qtest/libqtest.c:240: 
> qtest_wait_qemu: Assertion `pid == s->qemu_pid' failed.
>  migration-test: ../../devel/qemu/tests/qtest/libqtest.c:240: 
> qtest_wait_qemu: Assertion `pid == s->qemu_pid' failed.
>  ok 2 /x86_64/migration/analyze-script
>  ...
>
> Let's better fail the test in case the child did not exit properly, too.
>
> Signed-off-by: Thomas Huth 
> ---
>  tests/qtest/migration-test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 5b4eca2b20..b7e3406471 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1604,7 +1604,7 @@ static void test_analyze_script(void)
>  }
>  
>  g_assert(waitpid(pid, , 0) == pid);
> -if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != 0) {
> +if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0) {
>  g_test_message("Failed to analyze the migration stream");
>  g_test_fail();
>  }

Reviewed-by: Fabiano Rosas 



Re: [PATCH] tests/qtest/migration-test: Run some basic tests on s390x and ppc64 with TCG, too

2024-05-22 Thread Fabiano Rosas
Thomas Huth  writes:

> On s390x, we recently had a regression that broke migration / savevm
> (see commit bebe9603fc ("hw/intc/s390_flic: Fix crash that occurs when
> saving the machine state"). The problem was merged without being noticed
> since we currently do not run any migration / savevm related tests on
> x86 hosts.
> While we currently cannot run all migration tests for the s390x target
> on x86 hosts yet (due to some unresolved issues with TCG), we can at
> least run some of the non-live tests to avoid such problems in the future.
> Thus enable the "analyze-script" and the "bad_dest" tests before checking
> for KVM on s390x or ppc64 (this also fixes the problem that the
> "analyze-script" test was not run on s390x at all anymore since it got
> disabled again by accident in a previous refactoring of the code).
>
> Signed-off-by: Thomas Huth 

Reviewed-by: Fabiano Rosas 

And thanks for fixing my mistake.




Re: [PULL 10/38] tests/qtest/migration: Add a test for the analyze-migration script

2024-05-22 Thread Fabiano Rosas
Thomas Huth  writes:

> On 21/05/2024 14.46, Fabiano Rosas wrote:
>> Alex Bennée  writes:
>> 
>>> Juan Quintela  writes:
>>>
>>>> From: Fabiano Rosas 
>>>>
>>>> Add a smoke test that migrates to a file and gives it to the
>>>> script. It should catch the most annoying errors such as changes in
>>>> the ram flags.
>>>>
>>>> After code has been merged it becomes way harder to figure out what is
>>>> causing the script to fail, the person making the change is the most
>>>> likely to know right away what the problem is.
>>>>
>>>> Signed-off-by: Fabiano Rosas 
>>>> Acked-by: Thomas Huth 
>>>> Reviewed-by: Juan Quintela 
>>>> Signed-off-by: Juan Quintela 
>>>> Message-ID: <20231009184326.15777-7-faro...@suse.de>
>>>
>>> I bisected the failures I'm seeing on s390x to the introduction of this
>>> script. I don't know if its simply a timeout on a relatively slow VM:
>> 
>> What's the range of your bisect? That test has been disabled and then
>> reenabled on s390x. It could be tripping the bisect.
>> 
>> 04131e0009 ("tests/qtest/migration-test: Disable the analyze-migration.py 
>> test on s390x")
>> 81c2c9dd5d ("tests/qtest/migration-test: Fix analyze-migration.py for s390x")
>> 
>> I don't think that test itself could be timing out. It's a very simple
>> test. It runs a migration and then uses the output to validate the
>> script.
>
> Agreed, the analyze-migration.py is unlikely to be the issue - especially 
> since it seems to have been disabled again in commit 6f0771de903b ...
> Fabiano, why did you disable it here again? The reason is not mentioned in 
> the commit description.

Your patch 81c2c9dd5d was merged between my v1 and v2 on the list and I
didn't notice so I messed up the rebase. I'll send a patch soon to fix
that.



Re: [PATCH] hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1

2024-05-21 Thread Fabiano Rosas
Fiona Ebner  writes:

> Am 21.05.24 um 00:22 schrieb Fabiano Rosas:
>> Fiona Ebner  writes:
>> 
>>> Migration from an 8.2 or 9.0 binary to an 8.1 binary with machine
>>> version 8.1 can fail with:
>>>
>>>> kvm: Features 0x1c0010130afffa7 unsupported. Allowed features: 
>>>> 0x10179bfffe7
>>>> kvm: Failed to load virtio-net:virtio
>>>> kvm: error while loading state for instance 0x0 of device 
>>>> ':00:12.0/virtio-net'
>>>> kvm: load of migration failed: Operation not permitted
>>>
>>> The series
>>>
>>> 53da8b5a99 virtio-net: Add support for USO features
>>> 9da1684954 virtio-net: Add USO flags to vhost support.
>>> f03e0cf63b tap: Add check for USO features
>>> 2ab0ec3121 tap: Add USO support to tap device.
>>>
>>> only landed in QEMU 8.2, so the compatibility flags should be part of
>>> machine version 8.1.
>>>
>>> Moving the flags unfortunately breaks forward migration with machine
>>> version 8.1 from a binary without this patch to a binary with this
>>> patch.
>>>
>>> Fixes: 53da8b5a99 ("virtio-net: Add support for USO features")
>>> Signed-off-by: Fiona Ebner 
>> 
>> Reviewed-by: Fabiano Rosas 
>> 
>> I'll get to it eventually, but is this another one where just having
>> -device virtio-net in the command line when testing cross-version
>> migration would already have caught the issue?
>> 
> AFAIU, the guest kernel needs to be recent enough to support the feature
> too. I don't seem to run into the issue with a Debian 11 (kernel 5.10)
> guest, but I do run into the issue with an Ubuntu 23.10 (kernel 6.5)
> guest. Seems like it got added in kernel 6.2 with 418044e1de30
> ("drivers/net/virtio_net.c: Added USO support.")

Ah ok, so this is more complex, the tests wouldn't have caught it even
with the device options addition. A test for this will have to come at a
second moment once we figure out how to deal with guest-code dependent
issues.



Re: [PULL 10/38] tests/qtest/migration: Add a test for the analyze-migration script

2024-05-21 Thread Fabiano Rosas
Alex Bennée  writes:

> Juan Quintela  writes:
>
>> From: Fabiano Rosas 
>>
>> Add a smoke test that migrates to a file and gives it to the
>> script. It should catch the most annoying errors such as changes in
>> the ram flags.
>>
>> After code has been merged it becomes way harder to figure out what is
>> causing the script to fail, the person making the change is the most
>> likely to know right away what the problem is.
>>
>> Signed-off-by: Fabiano Rosas 
>> Acked-by: Thomas Huth 
>> Reviewed-by: Juan Quintela 
>> Signed-off-by: Juan Quintela 
>> Message-ID: <20231009184326.15777-7-faro...@suse.de>
>
> I bisected the failures I'm seeing on s390x to the introduction of this
> script. I don't know if its simply a timeout on a relatively slow VM:

What's the range of your bisect? That test has been disabled and then
reenabled on s390x. It could be tripping the bisect.

04131e0009 ("tests/qtest/migration-test: Disable the analyze-migration.py test 
on s390x")
81c2c9dd5d ("tests/qtest/migration-test: Fix analyze-migration.py for s390x")

I don't think that test itself could be timing out. It's a very simple
test. It runs a migration and then uses the output to validate the
script.

I don't have a Z machine at hand and the migration tests only run with
KVM for s390x, so it would be useful to take a look at meson's
testlog.txt so we can see which test is failing and hopefully in what
way it is failing.

If you're up for it, running this in a loop is usually the best way to
catch any intermittent issues:

QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test

And once you figure out which test, there's this monstrosity:

QTEST_QEMU_BINARY='gdb -q --ex "set pagination off"  \
  --ex "set print thread-events off" \
  --ex "handle SIGUSR1 noprint"  \
  --ex "handle SIGPIPE noprint"  \
  --ex "run" --ex "quit \$_exitcode" \
  --args ./qemu-system-x86_64'   \
  gdb -q --ex "set prompt (qtest) "  \
  --ex "handle SIGPIPE noprint"  \
  --args ./tests/qtest/migration-test -p 
/x86_64/migration//

> Summary of Failures:
>
>  36/546 qemu:qtest+qtest-s390x / qtest-s390x/migration-test   
> ERROR  93.51s   killed by signal 6 SIGABRT
>
> It seems to be unstable as we pass sometimes:
>
> 11:26:42 [ajb@qemu01:~/l/q/b/system] master|… + ./pyvenv/bin/meson test 
> --repeat 100 qtest-s390x/migration-test
> ninja: Entering directory `/home/ajb/lsrc/qemu.git/builds/system'
> [1/9] Generating qemu-version.h with a custom command (wrapped by meson to 
> capture output)
>   1/100 qemu:qtest+qtest-s390x / qtest-s390x/migration-test  ERROR
>   251.98s   killed by signal 6 SIGABRT
>>>> MALLOC_PERTURB_=9 
>>>> PYTHON=/home/ajb/lsrc/qemu.git/builds/system/pyvenv/bin/python3 
>>>> G_TEST_DBUS_DAEMON=/home/ajb/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh 
>>>> QTEST_QEMU_BINARY=./qemu-system-s390x QTEST_QEMU_IMG=./qemu-img 
>>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
>>>> /home/ajb/lsrc/qemu.git/builds/system/tests/qtest/migration-test --tap -k
>
>   2/100 qemu:qtest+qtest-s390x / qtest-s390x/migration-test  ERROR
>   258.71s   killed by signal 6 SIGABRT
>>>> PYTHON=/home/ajb/lsrc/qemu.git/builds/system/pyvenv/bin/python3 
>>>> MALLOC_PERTURB_=205 
>>>> G_TEST_DBUS_DAEMON=/home/ajb/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh 
>>>> QTEST_QEMU_BINARY=./qemu-system-s390x QTEST_QEMU_IMG=./qemu-img 
>>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
>>>> /home/ajb/lsrc/qemu.git/builds/system/tests/qtest/migration-test --tap -k
>
>   3/100 qemu:qtest+qtest-s390x / qtest-s390x/migration-test  OK   
>   302.53s   46 subtests passed
>   4/100 qemu:qtest+qtest-s390x / qtest-s390x/migration-test  OK   
>   319.56s   46 subtests passed
>   5/100 qemu:qtest+qtest-s390x / qtest-s390x/migration-test  OK   
>   320.11s   46 subtests passed
>   6/100 qemu:qtest+qtest-s390x / qtest-s390x/migration-test  OK   
>   328.40s   46 subtests passed
>
> Ok: 4   
> Expected Fail:  0   
> Fail:   2   
> Unexpected Pass:0   
> Skipped:0   
> Timeout:0   
>
>> ---
>>  tests/qtest/migration-test.c | 60 
>> 

Re: [PATCH V1 00/26] Live update: cpr-exec

2024-05-20 Thread Fabiano Rosas
Steven Sistare  writes:

> Hi Peter, Hi Fabiano,
>Will you have time to review the migration guts of this series any time 
> soon?
> In particular:
>
> [PATCH V1 05/26] migration: precreate vmstate
> [PATCH V1 06/26] migration: precreate vmstate for exec
> [PATCH V1 12/26] migration: vmstate factory object
> [PATCH V1 18/26] migration: cpr-exec-args parameter
> [PATCH V1 20/26] migration: cpr-exec mode
>

I'll get to them this week. I'm trying to make some progress with my own
code before I forget how to program. I'm also trying to find some time
to implement the device options in the migration tests so we can stop
these virtio-* breakages that have been popping up.



Re: [PATCH] hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1

2024-05-20 Thread Fabiano Rosas
Fiona Ebner  writes:

> Migration from an 8.2 or 9.0 binary to an 8.1 binary with machine
> version 8.1 can fail with:
>
>> kvm: Features 0x1c0010130afffa7 unsupported. Allowed features: 0x10179bfffe7
>> kvm: Failed to load virtio-net:virtio
>> kvm: error while loading state for instance 0x0 of device 
>> ':00:12.0/virtio-net'
>> kvm: load of migration failed: Operation not permitted
>
> The series
>
> 53da8b5a99 virtio-net: Add support for USO features
> 9da1684954 virtio-net: Add USO flags to vhost support.
> f03e0cf63b tap: Add check for USO features
> 2ab0ec3121 tap: Add USO support to tap device.
>
> only landed in QEMU 8.2, so the compatibility flags should be part of
> machine version 8.1.
>
> Moving the flags unfortunately breaks forward migration with machine
> version 8.1 from a binary without this patch to a binary with this
> patch.
>
> Fixes: 53da8b5a99 ("virtio-net: Add support for USO features")
> Signed-off-by: Fiona Ebner 

Reviewed-by: Fabiano Rosas 

I'll get to it eventually, but is this another one where just having
-device virtio-net in the command line when testing cross-version
migration would already have caught the issue?




Re: [PATCH 8/9] migration: Add support for fdset with multifd + file

2024-05-17 Thread Fabiano Rosas
Daniel P. Berrangé  writes:

> On Wed, May 08, 2024 at 05:39:53PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote:
>> >> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
>> >> > Allow multifd to use an fdset when migrating to a file. This is useful
>> >> > for the scenario where the management layer wants to have control over
>> >> > the migration file.
>> >> > 
>> >> > By receiving the file descriptors directly, QEMU can delegate some
>> >> > high level operating system operations to the management layer (such
>> >> > as mandatory access control). The management layer might also want to
>> >> > add its own headers before the migration stream.
>> >> > 
>> >> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with
>> >> > mapped-ram. The requirements for the fdset mechanism are:
>> >> > 
>> >> > On the migration source side:
>> >> > 
>> >> > - the fdset must contain two fds that are not duplicates between
>> >> >   themselves;
>> >> > - if direct-io is to be used, exactly one of the fds must have the
>> >> >   O_DIRECT flag set;
>> >> > - the file must be opened with WRONLY both times.
>> >> > 
>> >> > On the migration destination side:
>> >> > 
>> >> > - the fdset must contain one fd;
>> >> > - the file must be opened with RDONLY.
>> >> > 
>> >> > Signed-off-by: Fabiano Rosas 
>> >> > ---
>> >> >  docs/devel/migration/main.rst   | 18 ++
>> >> >  docs/devel/migration/mapped-ram.rst |  6 -
>> >> >  migration/file.c| 38 -
>> >> >  3 files changed, 60 insertions(+), 2 deletions(-)
>> >> > 
>> >> > diff --git a/docs/devel/migration/main.rst 
>> >> > b/docs/devel/migration/main.rst
>> >> > index 54385a23e5..50f6096470 100644
>> >> > --- a/docs/devel/migration/main.rst
>> >> > +++ b/docs/devel/migration/main.rst
>> >> > @@ -47,6 +47,24 @@ over any transport.
>> >> >QEMU interference. Note that QEMU does not flush cached file
>> >> >data/metadata at the end of migration.
>> >> >  
>> >> > +  The file migration also supports using a file that has already been
>> >> > +  opened. A set of file descriptors is passed to QEMU via an "fdset"
>> >> > +  (see add-fd QMP command documentation). This method allows a
>> >> > +  management application to have control over the migration file
>> >> > +  opening operation. There are, however, strict requirements to this
>> >> > +  interface:
>> >> > +
>> >> > +  On the migration source side:
>> >> > +- if the multifd capability is to be used, the fdset must contain
>> >> > +  two file descriptors that are not duplicates between themselves;
>> >> > +- if the direct-io capability is to be used, exactly one of the
>> >> > +  file descriptors must have the O_DIRECT flag set;
>> >> > +- the file must be opened with WRONLY.
>> >> > +
>> >> > +  On the migration destination side:
>> >> > +- the fdset must contain one file descriptor;
>> >> > +- the file must be opened with RDONLY.
>> >> > +
>> >> >  In addition, support is included for migration using RDMA, which
>> >> >  transports the page data using ``RDMA``, where the hardware takes care 
>> >> > of
>> >> >  transporting the pages, and the load on the CPU is much lower.  While 
>> >> > the
>> >> > diff --git a/docs/devel/migration/mapped-ram.rst 
>> >> > b/docs/devel/migration/mapped-ram.rst
>> >> > index fa4cefd9fc..e6505511f0 100644
>> >> > --- a/docs/devel/migration/mapped-ram.rst
>> >> > +++ b/docs/devel/migration/mapped-ram.rst
>> >> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being 
>> >> > added to a
>> >> >  sequential stream. Having the pages at fixed offsets also allows the
>> >> >  usage of O_DIRECT for save/restore of the migration stream as the
>> >>

Re: [PATCH 1/9] monitor: Honor QMP request for fd removal immediately

2024-05-16 Thread Fabiano Rosas
Daniel P. Berrangé  writes:

> On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
>> We're enabling using the fdset interface to pass file descriptors for
>> use in the migration code. Since migrations can happen more than once
>> during the VMs lifetime, we need a way to remove an fd from the fdset
>> at the end of migration.
>> 
>> The current code only removes an fd from the fdset if the VM is
>> running. This causes a QMP call to "remove-fd" to not actually remove
>> the fd if the VM happens to be stopped.
>> 
>> While the fd would eventually be removed when monitor_fdset_cleanup()
>> is called again, the user request should be honored and the fd
>> actually removed. Calling remove-fd + query-fdset shows a recently
>> removed fd still present.
>> 
>> The runstate_is_running() check was introduced by commit ebe52b592d
>> ("monitor: Prevent removing fd from set during init"), which by the
>> shortlog indicates that they were trying to avoid removing an
>> yet-unduplicated fd too early.
>
> IMHO that should be reverted. The justification says
>
>   "If an fd is added to an fd set via the command line, and it is not
>referenced by another command line option (ie. -drive), then clean
>it up after QEMU initialization is complete"
>
> which I think is pretty weak. Why should QEMU forceably stop an app
> from passing in an FD to be used by a QMP command issued just after
> the VM starts running ?  While it could just use QMP to pass in the
> FD set, the mgmt app might have its own reason for wanting QEMU to
> own the passed FD from the very start of the process execve().

I don't think that's what that patch does. That description is
misleading. I read it as:

   "If an fd is added to an fd set via the command line, and it is not
referenced by another command line option (ie. -drive), then clean
it up ONLY after QEMU initialization is complete"
  ^

By the subject ("monitor: Prevent removing fd from set during init") and
the fact that this function is only called when the monitor connection
closes, I believe the idea was to *save* the fds until after the VM
starts running, i.e. some fd was being lost because
monitor_fdset_cleanup() was being called before the dup().

See my reply to Peter in this same patch (PATCH 1/9).

>
> Implicitly this cleanup is attempting to "fix" a bug where the mgmt
> app passes in an FD that it never needed. If any such bug were ever
> found, then the mgmt app should just be fixed to not pass it in. I
> don't think QEMU needs to be trying to fix mgmt app bugs.
>
> IOW, this commit is imposing an arbitrary & unecessary usage policy
> on passed in FD sets, and as your commit explains has further
> unhelpful (& undocumented) side effects on the 'remove-fd' QMP command.
>
> Just revert it IMHO.
>
>> 
>> I don't see why an fd explicitly removed with qmp_remove_fd() should
>> be under runstate_is_running(). I'm assuming this was a mistake when
>> adding the parenthesis around the expression.
>> 
>> Move the runstate_is_running() check to apply only to the
>> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
>> mon_fdset_fd->removed has been explicitly set.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  monitor/fds.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> diff --git a/monitor/fds.c b/monitor/fds.c
>> index d86c2c674c..4ec3b7eea9 100644
>> --- a/monitor/fds.c
>> +++ b/monitor/fds.c
>> @@ -173,9 +173,9 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
>>  MonFdsetFd *mon_fdset_fd_next;
>>  
>>  QLIST_FOREACH_SAFE(mon_fdset_fd, _fdset->fds, next, 
>> mon_fdset_fd_next) {
>> -if ((mon_fdset_fd->removed ||
>> -(QLIST_EMPTY(_fdset->dup_fds) && mon_refcount == 0)) &&
>> -runstate_is_running()) {
>> +if (mon_fdset_fd->removed ||
>> +(QLIST_EMPTY(_fdset->dup_fds) && mon_refcount == 0 &&
>> + runstate_is_running())) {
>>  close(mon_fdset_fd->fd);
>>  g_free(mon_fdset_fd->opaque);
>>  QLIST_REMOVE(mon_fdset_fd, next);
>> -- 
>> 2.35.3
>> 
>
> With regards,
> Daniel



Re: [PATCH 1/9] monitor: Honor QMP request for fd removal immediately

2024-05-16 Thread Fabiano Rosas
Hi all, sorry to have been away from this thread for a while, I was
trying to catch up on my reviews queue.

Peter Xu  writes:

> On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
>> We're enabling using the fdset interface to pass file descriptors for
>> use in the migration code. Since migrations can happen more than once
>> during the VMs lifetime, we need a way to remove an fd from the fdset
>> at the end of migration.
>> 
>> The current code only removes an fd from the fdset if the VM is
>> running. This causes a QMP call to "remove-fd" to not actually remove
>> the fd if the VM happens to be stopped.
>> 
>> While the fd would eventually be removed when monitor_fdset_cleanup()
>> is called again, the user request should be honored and the fd
>> actually removed. Calling remove-fd + query-fdset shows a recently
>> removed fd still present.
>> 
>> The runstate_is_running() check was introduced by commit ebe52b592d
>> ("monitor: Prevent removing fd from set during init"), which by the
>> shortlog indicates that they were trying to avoid removing an
>> yet-unduplicated fd too early.
>> 
>> I don't see why an fd explicitly removed with qmp_remove_fd() should
>> be under runstate_is_running(). I'm assuming this was a mistake when
>> adding the parenthesis around the expression.
>> 
>> Move the runstate_is_running() check to apply only to the
>> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
>> mon_fdset_fd->removed has been explicitly set.
>
> I am confused on why the fdset removal is as complicated.  I'm also
> wondering here whether it's dropped because we checked against
> "mon_refcount == 0", and maybe monitor_fdset_cleanup() is simply called
> _before_ a monitor is created?  Why do we need such check on the first
> place?
>

It seems the intention was to reuse monitor_fdset_cleanup() to do
cleanup when all monitors connections are closed:

efb87c1697 ("monitor: Clean up fd sets on monitor disconnect")
Author: Corey Bryant 
Date:   Tue Aug 14 16:43:48 2012 -0400

monitor: Clean up fd sets on monitor disconnect

Fd sets are shared by all monitor connections.  Fd sets are considered
to be in use while at least one monitor is connected.  When the last
monitor disconnects, all fds that are members of an fd set with no
outstanding dup references are closed.  This prevents any fd leakage
associated with a client disconnect prior to using a passed fd.

Signed-off-by: Corey Bryant 
Signed-off-by: Kevin Wolf 

This could have been done differently at monitor_qmp_event():

case CHR_EVENT_CLOSED:
...
mon_refcount--;
if (mon_refcount == 0) {
monitor_fdsets_cleanup();
}

But maybe there was a concern about making sure the empty fdsets (last
block in monitor_fdset_cleanup) were removed at every refcount decrement
and not only when mon_refcount == 0 for some reason.

> I'm thinking one case where the only QMP monitor got (for some reason)
> disconnected, and reconnected again during VM running.  Won't current code
> already lead to unwanted removal of mostly all fds due to mon_refcount==0?

I think that's the case that the patch in question was trying to
avoid. If the last monitor connects and disconnects while fds have not
been dup'ed yet, the mon_fdset->dup_fds list will be empty and what you
say will happen. There seems to be an assumption that after the guest
starts running all fds that need to be dup'ed will already have been
dup'ed.

So I think we cannot simply revert the patch as Daniel suggested,
because that might regress the original block layer use-case if a
monitor open->close causes the removal of all the yet undup'ed fds[1].

For the migration use-case, the dup() only happens after the migrate
command has been issued, so the runstate_is_running() check doesn't help
us. But it also doesn't hurt. However, we're still exposed to a monitor
disconnection removing the undup'ed fds. So AFAICS, we either stop
calling monitor_fdset_cleanup() at monitor close or declare that this
issue is unlikely to occur (because it is) and leave a comment in the
code.

===
1- I ran a quick test:

connect() // monitor opened: refcnt: 1

{"execute": "add-fd", "arguments": {"fdset-id": 1}}
{"return": {"fd": 9, "fdset-id": 1}}

{"execute": "add-fd", "arguments": {"fdset-id": 1}}
{"return": {"fd": 21, "fdset-id": 1}}

close()   // monitor closed: refcnt: 0

connect() // monitor opened: refcnt: 1

{"execute": "migrate", "arguments": {"uri": "file:/dev/fdset/1,offset=40

Re: [PATCH 2/3] migration/colo: make colo_incoming_co() return void

2024-05-15 Thread Fabiano Rosas
Li Zhijian via  writes:

> Currently, it always returns 0, no need to check the return value at all.
> In addition, enter colo coroutine only if migration_incoming_colo_enabled()
> is true.
> Once the destination side enters the COLO* state, the COLO process will
> take over the remaining processes until COLO exits.
>
> Signed-off-by: Li Zhijian 
> ---
>  migration/colo.c  | 9 ++---
>  migration/migration.c | 6 +++---
>  2 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 5600a43d78..991806c06a 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -929,16 +929,13 @@ out:
>  return NULL;
>  }
>  
> -int coroutine_fn colo_incoming_co(void)
> +void coroutine_fn colo_incoming_co(void)
>  {
>  MigrationIncomingState *mis = migration_incoming_get_current();
>  QemuThread th;
>  
>  assert(bql_locked());
> -
> -if (!migration_incoming_colo_enabled()) {
> -return 0;
> -}
> +assert(migration_incoming_colo_enabled());

FAILED: libcommon.fa.p/migration_colo.c.o
/usr/bin/gcc-13 ... ../migration/colo.c
../migration/colo.c:930:19: error: conflicting types for ‘colo_incoming_co’; 
have ‘void(void)’
  930 | void coroutine_fn colo_incoming_co(void)
  |   ^~~~
In file included from ../migration/colo.c:20:
... qemu/include/migration/colo.h:52:18: note: previous declaration of 
‘colo_incoming_co’ with type ‘int(void)’
   52 | int coroutine_fn colo_incoming_co(void);
  |  ^~~~



Re: [PATCH 4/9] migration: Add direct-io parameter

2024-05-15 Thread Fabiano Rosas
Markus Armbruster  writes:

> Fabiano Rosas  writes:
>
>> Markus Armbruster  writes:
>>
>>> Peter Xu  writes:
>>>
>>>> On Fri, May 03, 2024 at 05:49:32PM -0300, Fabiano Rosas wrote:
>>>>> Peter Xu  writes:
>>>>> 
>>>>> > On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
>>>>> >> Add the direct-io migration parameter that tells the migration code to
>>>>> >> use O_DIRECT when opening the migration stream file whenever possible.
>>>>> >> 
>>>>> >> This is currently only used with the mapped-ram migration that has a
>>>>> >> clear window guaranteed to perform aligned writes.
>>>>> >> 
>>>>> >> Acked-by: Markus Armbruster 
>>>>> >> Signed-off-by: Fabiano Rosas 
>>>>> >> ---
>>>>> >>  include/qemu/osdep.h   |  2 ++
>>>>> >>  migration/migration-hmp-cmds.c | 11 +++
>>>>> >>  migration/options.c| 30 ++
>>>>> >>  migration/options.h|  1 +
>>>>> >>  qapi/migration.json| 18 +++---
>>>>> >>  util/osdep.c   |  9 +
>>>>> >>  6 files changed, 68 insertions(+), 3 deletions(-)
>>>>> >> 
>>>>> >> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>>>>> >> index c7053cdc2b..645c14a65d 100644
>>>>> >> --- a/include/qemu/osdep.h
>>>>> >> +++ b/include/qemu/osdep.h
>>>>> >> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, 
>>>>> >> int64_t len, bool exclusive);
>>>>> >>  bool qemu_has_ofd_lock(void);
>>>>> >>  #endif
>>>>> >>  
>>>>> >> +bool qemu_has_direct_io(void);
>>>>> >> +
>>>>> >>  #if defined(__HAIKU__) && defined(__i386__)
>>>>> >>  #define FMT_pid "%ld"
>>>>> >>  #elif defined(WIN64)
>>>>> >> diff --git a/migration/migration-hmp-cmds.c 
>>>>> >> b/migration/migration-hmp-cmds.c
>>>>> >> index 7e96ae6ffd..8496a2b34e 100644
>>>>> >> --- a/migration/migration-hmp-cmds.c
>>>>> >> +++ b/migration/migration-hmp-cmds.c
>>>>> >> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, 
>>>>> >> const QDict *qdict)
>>>>> >>  monitor_printf(mon, "%s: %s\n",
>>>>> >>  MigrationParameter_str(MIGRATION_PARAMETER_MODE),
>>>>> >>  qapi_enum_lookup(_lookup, params->mode));
>>>>> >> +
>>>>> >> +if (params->has_direct_io) {
>>>>> >> +monitor_printf(mon, "%s: %s\n",
>>>>> >> +   MigrationParameter_str(
>>>>> >> +   MIGRATION_PARAMETER_DIRECT_IO),
>>>>> >> +   params->direct_io ? "on" : "off");
>>>>> >> +}
>>>>> >
>>>>> > This will be the first parameter to optionally display here.  I think 
>>>>> > it's
>>>>> > a sign of misuse of has_direct_io field..
>>>
>>> Yes.  For similar existing parameters, we do
>>>
>>>assert(params->has_FOO);
>>>monitor_printf(mon, "%s: '%s'\n",
>>>   
>>> MigrationParameter_str(MIGRATION_PARAMETER_FOO),
>>>   ... params->FOO as string ...)
>>>
>>>>> > IMHO has_direct_io should be best to be kept as "whether direct_io 
>>>>> > field is
>>>>> > valid" and that's all of it.  It hopefully shouldn't contain more
>>>>> > information than that, or otherwise it'll be another small challenge we
>>>>> > need to overcome when we can remove all these has_* fields, and can 
>>>>> > also be
>>>>> > easily overlooked.
>>>>> 
>>>>> I don't think I understand why we have those has_* fields. I thought my
>>>>> usage of 'params->has_direct_io 

Re: [PATCH 4/9] migration: Add direct-io parameter

2024-05-14 Thread Fabiano Rosas
Markus Armbruster  writes:

> Peter Xu  writes:
>
>> On Fri, May 03, 2024 at 05:49:32PM -0300, Fabiano Rosas wrote:
>>> Peter Xu  writes:
>>> 
>>> > On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
>>> >> Add the direct-io migration parameter that tells the migration code to
>>> >> use O_DIRECT when opening the migration stream file whenever possible.
>>> >> 
>>> >> This is currently only used with the mapped-ram migration that has a
>>> >> clear window guaranteed to perform aligned writes.
>>> >> 
>>> >> Acked-by: Markus Armbruster 
>>> >> Signed-off-by: Fabiano Rosas 
>>> >> ---
>>> >>  include/qemu/osdep.h   |  2 ++
>>> >>  migration/migration-hmp-cmds.c | 11 +++
>>> >>  migration/options.c| 30 ++
>>> >>  migration/options.h|  1 +
>>> >>  qapi/migration.json| 18 +++---
>>> >>  util/osdep.c   |  9 +
>>> >>  6 files changed, 68 insertions(+), 3 deletions(-)
>>> >> 
>>> >> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>>> >> index c7053cdc2b..645c14a65d 100644
>>> >> --- a/include/qemu/osdep.h
>>> >> +++ b/include/qemu/osdep.h
>>> >> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t 
>>> >> len, bool exclusive);
>>> >>  bool qemu_has_ofd_lock(void);
>>> >>  #endif
>>> >>  
>>> >> +bool qemu_has_direct_io(void);
>>> >> +
>>> >>  #if defined(__HAIKU__) && defined(__i386__)
>>> >>  #define FMT_pid "%ld"
>>> >>  #elif defined(WIN64)
>>> >> diff --git a/migration/migration-hmp-cmds.c 
>>> >> b/migration/migration-hmp-cmds.c
>>> >> index 7e96ae6ffd..8496a2b34e 100644
>>> >> --- a/migration/migration-hmp-cmds.c
>>> >> +++ b/migration/migration-hmp-cmds.c
>>> >> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, 
>>> >> const QDict *qdict)
>>> >>  monitor_printf(mon, "%s: %s\n",
>>> >>  MigrationParameter_str(MIGRATION_PARAMETER_MODE),
>>> >>  qapi_enum_lookup(_lookup, params->mode));
>>> >> +
>>> >> +if (params->has_direct_io) {
>>> >> +monitor_printf(mon, "%s: %s\n",
>>> >> +   MigrationParameter_str(
>>> >> +   MIGRATION_PARAMETER_DIRECT_IO),
>>> >> +   params->direct_io ? "on" : "off");
>>> >> +}
>>> >
>>> > This will be the first parameter to optionally display here.  I think it's
>>> > a sign of misuse of has_direct_io field..
>
> Yes.  For similar existing parameters, we do
>
>assert(params->has_FOO);
>monitor_printf(mon, "%s: '%s'\n",
>   MigrationParameter_str(MIGRATION_PARAMETER_FOO),
>   ... params->FOO as string ...)
>
>>> > IMHO has_direct_io should be best to be kept as "whether direct_io field 
>>> > is
>>> > valid" and that's all of it.  It hopefully shouldn't contain more
>>> > information than that, or otherwise it'll be another small challenge we
>>> > need to overcome when we can remove all these has_* fields, and can also 
>>> > be
>>> > easily overlooked.
>>> 
>>> I don't think I understand why we have those has_* fields. I thought my
>>> usage of 'params->has_direct_io = qemu_has_direct_io()' was the correct
>>> one, i.e. checking whether QEMU has any support for that parameter. Can
>>> you help me out here?
>>
>> Here params is the pointer to "struct MigrationParameters", which is
>> defined in qapi/migration.json.  And we have had "has_*" only because we
>> allow optional fields with asterisks:
>>
>>   { 'struct': 'MigrationParameters',
>> 'data': { '*announce-initial': 'size',
>>   ...
>>   } }
>>
>> So that's why it better only means "whether this field existed", because
>> it's how it is defined.
>>
>> IIRC we (or say, Markus) used to have some attempts deduplicates those
>> *MigrationParameter* things, and if success we have chance to drop has_*
>> fields (in which case we simply always have them; that "has_" makes more
>> sense only if in a QMP session to allow user only specify one or more
>> things if not all).
>
> qapi/migration.json:
>
> ##
> # @MigrationParameters:
> #
> --> # The optional members aren't actually optional.
> #
>
> In other words, the has_FOO generated for the members of
> MigrationParameters must all be true.
>
> These members became optional when we attempted to de-duplicate
> MigrationParameters and MigrateSetParameters, but failed (see commit
> de63ab61241 "migrate: Share common MigrationParameters struct" and
> commit 1bda8b3c695 "migration: Unshare MigrationParameters struct for
> now").

So what's the blocker for merging MigrationSetParameters and
MigrationParameters? Just the tls_* options?



RE: [PATCH v6 6/7] migration/multifd: implement qpl compression and decompression

2024-05-14 Thread Fabiano Rosas
"Liu, Yuan1"  writes:

>> -Original Message-
>> From: Fabiano Rosas 
>> Sent: Monday, May 13, 2024 11:14 PM
>> To: Liu, Yuan1 ; pet...@redhat.com
>> Cc: qemu-devel@nongnu.org; Liu, Yuan1 ; Zou, Nanhai
>> 
>> Subject: Re: [PATCH v6 6/7] migration/multifd: implement qpl compression
>> and decompression
>> 
>> Yuan Liu  writes:
>> 
>> > each qpl job is used to (de)compress a normal page and it can
>> > be processed independently by the IAA hardware. All qpl jobs
>> > are submitted to the hardware at once, and wait for all jobs
>> > completion. If hardware path(IAA) is not available, use software
>> > for compression and decompression.
>> >
>> > Signed-off-by: Yuan Liu 
>> > Reviewed-by: Nanhai Zou 
>> > ---
>> >  migration/multifd-qpl.c | 284 +++-
>> >  1 file changed, 280 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
>> > index 89fa51091a..9a1fddbdd0 100644
>> > --- a/migration/multifd-qpl.c
>> > +++ b/migration/multifd-qpl.c
>> > @@ -13,6 +13,7 @@
>> >  #include "qemu/osdep.h"
>> >  #include "qemu/module.h"
>> >  #include "qapi/error.h"
>> > +#include "exec/ramblock.h"
>> >  #include "migration.h"
>> >  #include "multifd.h"
>> >  #include "qpl/qpl.h"
>> > @@ -204,6 +205,139 @@ static void
>> multifd_qpl_send_cleanup(MultiFDSendParams *p, Error **errp)
>> >  p->iov = NULL;
>> >  }
>> >
>> > +/**
>> > + * multifd_qpl_prepare_job: prepare a compression or decompression job
>> > + *
>> > + * Prepare a compression or decompression job and configure job
>> attributes
>> > + * including job compression level and flags.
>> > + *
>> > + * @job: pointer to the QplData structure
>> 
>> qpl_job structure
>
> Thanks for the comment, I will fix this next version.
>
>> > + * @is_compression: compression or decompression indication
>> > + * @input: pointer to the input data buffer
>> > + * @input_len: the length of the input data
>> > + * @output: pointer to the output data buffer
>> > + * @output_len: the size of the output data buffer
>> > + */
>> > +static void multifd_qpl_prepare_job(qpl_job *job, bool is_compression,
>> > +uint8_t *input, uint32_t input_len,
>> > +uint8_t *output, uint32_t
>> output_len)
>> > +{
>> > +job->op = is_compression ? qpl_op_compress : qpl_op_decompress;
>> > +job->next_in_ptr = input;
>> > +job->next_out_ptr = output;
>> > +job->available_in = input_len;
>> > +job->available_out = output_len;
>> > +job->flags = QPL_FLAG_FIRST | QPL_FLAG_LAST | QPL_FLAG_OMIT_VERIFY;
>> > +/* only supports one compression level */
>> > +job->level = 1;
>> > +}
>> > +
>> > +/**
>> > + * multifd_qpl_build_packet: build a qpl compressed data packet
>> > + *
>> > + * The qpl compressed data packet consists of two parts, one part
>> stores
>> > + * the compressed length of each page, and the other part is the
>> compressed
>> > + * data of each page. The zbuf_hdr stores the compressed length of all
>> pages,
>> > + * and use a separate IOV to store the compressed data of each page.
>> > + *
>> > + * @qpl: pointer to the QplData structure
>> > + * @p: Params for the channel that we are using
>> > + * @idx: The index of the compressed length array
>> > + * @addr: pointer to the compressed data
>> > + * @len: The length of the compressed data
>> > + */
>> > +static void multifd_qpl_build_packet(QplData *qpl, MultiFDSendParams
>> *p,
>> > + uint32_t idx, uint8_t *addr,
>> uint32_t len)
>> > +{
>> > +qpl->zbuf_hdr[idx] = cpu_to_be32(len);
>> > +p->iov[p->iovs_num].iov_base = addr;
>> > +p->iov[p->iovs_num].iov_len = len;
>> > +p->iovs_num++;
>> > +p->next_packet_size += len;
>> > +}
>> > +
>> > +/**
>> > + * multifd_qpl_compress_pages: compress normal pages
>> > + *
>> > + * Each normal page will be compressed independently, and the
>> compression jobs
>>

Re: [PATCH v2 2/4] migration: fix a typo

2024-05-13 Thread Fabiano Rosas
marcandre.lur...@redhat.com writes:

> From: Marc-André Lureau 
>
> Signed-off-by: Marc-André Lureau 
> ---
>  migration/vmstate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index b51212a75b..ff5d589a6d 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -479,7 +479,7 @@ static int vmstate_subsection_load(QEMUFile *f, const 
> VMStateDescription *vmsd,
>  
>  len = qemu_peek_byte(f, 1);
>  if (len < strlen(vmsd->name) + 1) {
> -/* subsection name has be be "section_name/a" */
> +/* subsection name has to be "section_name/a" */
>  trace_vmstate_subsection_load_bad(vmsd->name, "(short)", "");
>  return 0;
>  }

Reviewed-by: Fabiano Rosas 



Re: [PATCH v5 03/10] vfio: Extend migration_file_set_error() with Error** argument

2024-05-13 Thread Fabiano Rosas
Cédric Le Goater  writes:

> Use it to update the current error of the migration stream if
> available and if not, simply print out the error. Next changes will
> update with an error to report.
>
> Signed-off-by: Cédric Le Goater 

Acked-by: Fabiano Rosas 



Re: [PATCH V1 26/26] migration: only-migratable-modes

2024-05-13 Thread Fabiano Rosas
Steven Sistare  writes:

> On 5/9/2024 3:14 PM, Fabiano Rosas wrote:
>> Steve Sistare  writes:
>> 
>>> Add the only-migratable-modes option as a generalization of only-migratable.
>>> Only devices that support all requested modes are allowed.
>>>
>>> Signed-off-by: Steve Sistare 
>>> ---
>>>   include/migration/misc.h   |  3 +++
>>>   include/sysemu/sysemu.h|  1 -
>>>   migration/migration-hmp-cmds.c | 26 +-
>>>   migration/migration.c  | 22 +-
>>>   migration/savevm.c |  2 +-
>>>   qemu-options.hx| 16 ++--
>>>   system/globals.c   |  1 -
>>>   system/vl.c| 13 -
>>>   target/s390x/cpu_models.c  |  4 +++-
>>>   9 files changed, 75 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>>> index 5b963ba..3ad2cd9 100644
>>> --- a/include/migration/misc.h
>>> +++ b/include/migration/misc.h
>>> @@ -119,6 +119,9 @@ bool migration_incoming_postcopy_advised(void);
>>>   /* True if background snapshot is active */
>>>   bool migration_in_bg_snapshot(void);
>>>   
>>> +void migration_set_required_mode(MigMode mode);
>>> +bool migration_mode_required(MigMode mode);
>>> +
>>>   /* migration/block-dirty-bitmap.c */
>>>   void dirty_bitmap_mig_init(void);
>>>   
>>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>>> index 5b4397e..0a9c4b4 100644
>>> --- a/include/sysemu/sysemu.h
>>> +++ b/include/sysemu/sysemu.h
>>> @@ -8,7 +8,6 @@
>>>   
>>>   /* vl.c */
>>>   
>>> -extern int only_migratable;
>>>   extern const char *qemu_name;
>>>   extern QemuUUID qemu_uuid;
>>>   extern bool qemu_uuid_set;
>>> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
>>> index 414c7e8..ca913b7 100644
>>> --- a/migration/migration-hmp-cmds.c
>>> +++ b/migration/migration-hmp-cmds.c
>>> @@ -16,6 +16,7 @@
>>>   #include "qemu/osdep.h"
>>>   #include "block/qapi.h"
>>>   #include "migration/snapshot.h"
>>> +#include "migration/misc.h"
>>>   #include "monitor/hmp.h"
>>>   #include "monitor/monitor.h"
>>>   #include "qapi/error.h"
>>> @@ -33,6 +34,28 @@
>>>   #include "options.h"
>>>   #include "migration.h"
>>>   
>>> +static void migration_dump_modes(Monitor *mon)
>>> +{
>>> +int mode, n = 0;
>>> +
>>> +monitor_printf(mon, "only-migratable-modes: ");
>>> +
>>> +for (mode = 0; mode < MIG_MODE__MAX; mode++) {
>>> +if (migration_mode_required(mode)) {
>>> +if (n++) {
>>> +monitor_printf(mon, ",");
>>> +}
>>> +monitor_printf(mon, "%s", MigMode_str(mode));
>>> +}
>>> +}
>>> +
>>> +if (!n) {
>>> +monitor_printf(mon, "none\n");
>>> +} else {
>>> +monitor_printf(mon, "\n");
>>> +}
>>> +}
>>> +
>>>   static void migration_global_dump(Monitor *mon)
>>>   {
>>>   MigrationState *ms = migrate_get_current();
>>> @@ -41,7 +64,7 @@ static void migration_global_dump(Monitor *mon)
>>>   monitor_printf(mon, "store-global-state: %s\n",
>>>  ms->store_global_state ? "on" : "off");
>>>   monitor_printf(mon, "only-migratable: %s\n",
>>> -   only_migratable ? "on" : "off");
>>> +   migration_mode_required(MIG_MODE_NORMAL) ? "on" : 
>>> "off");
>>>   monitor_printf(mon, "send-configuration: %s\n",
>>>  ms->send_configuration ? "on" : "off");
>>>   monitor_printf(mon, "send-section-footer: %s\n",
>>> @@ -50,6 +73,7 @@ static void migration_global_dump(Monitor *mon)
>>>  ms->decompress_error_check ? "on" : "off");
>>>   monitor_printf(mon, "clear-bitmap-shift: %u\n",
>>>  ms->clea

Re: [PATCH V1 06/26] migration: precreate vmstate for exec

2024-05-13 Thread Fabiano Rosas
Steven Sistare  writes:

> On 5/6/2024 7:34 PM, Fabiano Rosas wrote:
>> Steve Sistare  writes:
>> 
>>> Provide migration_precreate_save for saving precreate vmstate across exec.
>>> Create a memfd, save its value in the environment, and serialize state
>>> to it.  Reverse the process in migration_precreate_load.
>>>
>>> Signed-off-by: Steve Sistare 
>>> ---
>>>   include/migration/misc.h |   5 ++
>>>   migration/meson.build|   1 +
>>>   migration/precreate.c| 139 
>>> +++
>>>   3 files changed, 145 insertions(+)
>>>   create mode 100644 migration/precreate.c
>>>
>>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>>> index c9e200f..cf30351 100644
>>> --- a/include/migration/misc.h
>>> +++ b/include/migration/misc.h
>>> @@ -56,6 +56,11 @@ AnnounceParameters *migrate_announce_params(void);
>>>   
>>>   void dump_vmstate_json_to_file(FILE *out_fp);
>>>   
>>> +/* migration/precreate.c */
>>> +int migration_precreate_save(Error **errp);
>>> +void migration_precreate_unsave(void);
>>> +int migration_precreate_load(Error **errp);
>>> +
>>>   /* migration/migration.c */
>>>   void migration_object_init(void);
>>>   void migration_shutdown(void);
>>> diff --git a/migration/meson.build b/migration/meson.build
>>> index f76b1ba..50e7cb2 100644
>>> --- a/migration/meson.build
>>> +++ b/migration/meson.build
>>> @@ -26,6 +26,7 @@ system_ss.add(files(
>>> 'ram-compress.c',
>>> 'options.c',
>>> 'postcopy-ram.c',
>>> +  'precreate.c',
>>> 'savevm.c',
>>> 'socket.c',
>>> 'tls.c',
>>> diff --git a/migration/precreate.c b/migration/precreate.c
>>> new file mode 100644
>>> index 000..0bf5e1f
>>> --- /dev/null
>>> +++ b/migration/precreate.c
>>> @@ -0,0 +1,139 @@
>>> +/*
>>> + * Copyright (c) 2022, 2024 Oracle and/or its affiliates.
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or 
>>> later.
>>> + * See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qemu/cutils.h"
>>> +#include "qemu/memfd.h"
>>> +#include "qapi/error.h"
>>> +#include "io/channel-file.h"
>>> +#include "migration/misc.h"
>>> +#include "migration/qemu-file.h"
>>> +#include "migration/savevm.h"
>>> +
>>> +#define PRECREATE_STATE_NAME "QEMU_PRECREATE_STATE"
>>> +
>>> +static QEMUFile *qemu_file_new_fd_input(int fd, const char *name)
>>> +{
>>> +g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd);
>>> +QIOChannel *ioc = QIO_CHANNEL(fioc);
>>> +qio_channel_set_name(ioc, name);
>>> +return qemu_file_new_input(ioc);
>>> +}
>>> +
>>> +static QEMUFile *qemu_file_new_fd_output(int fd, const char *name)
>>> +{
>>> +g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd);
>>> +QIOChannel *ioc = QIO_CHANNEL(fioc);
>>> +qio_channel_set_name(ioc, name);
>>> +return qemu_file_new_output(ioc);
>>> +}
>>> +
>>> +static int memfd_create_named(const char *name, Error **errp)
>>> +{
>>> +int mfd;
>>> +char val[16];
>>> +
>>> +mfd = memfd_create(name, 0);
>>> +if (mfd < 0) {
>>> +error_setg_errno(errp, errno, "memfd_create failed");
>>> +return -1;
>>> +}
>>> +
>>> +/* Remember mfd in environment for post-exec load */
>>> +qemu_clear_cloexec(mfd);
>>> +snprintf(val, sizeof(val), "%d", mfd);
>>> +g_setenv(name, val, 1);
>>> +
>>> +return mfd;
>>> +}
>>> +
>>> +static int memfd_find_named(const char *name, int *mfd_p, Error **errp)
>>> +{
>>> +const char *val = g_getenv(name);
>>> +
>>> +if (!val) {
>>> +*mfd_p = -1;
>>> +return 0;   /* No memfd was created, not an error */
>>> +}
>>> +g_unsetenv(name);
>>> +if (qemu_strtoi(val, NULL, 10, mfd_p)) {
>>> +error_setg(errp, "Bad %s env value %s", PRECREATE_STATE_

Re: [PATCH V1 13/26] physmem: ram_block_create

2024-05-13 Thread Fabiano Rosas
Steve Sistare  writes:

> Create a common subroutine to allocate a RAMBlock, de-duping the code to
> populate its common fields.  Add a trace point for good measure.
> No functional change.
>
> Signed-off-by: Steve Sistare 
> ---
>  system/physmem.c| 47 ++-
>  system/trace-events |  3 +++
>  2 files changed, 29 insertions(+), 21 deletions(-)
>
> diff --git a/system/physmem.c b/system/physmem.c
> index c3d04ca..6216b14 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -52,6 +52,7 @@
>  #include "sysemu/hw_accel.h"
>  #include "sysemu/xen-mapcache.h"
>  #include "trace/trace-root.h"
> +#include "trace.h"
>  
>  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>  #include 
> @@ -1918,11 +1919,29 @@ out_free:
>  }
>  }
>  
> +static RAMBlock *ram_block_create(MemoryRegion *mr, ram_addr_t size,
> +  ram_addr_t max_size, uint32_t ram_flags)
> +{
> +RAMBlock *rb = g_malloc0(sizeof(*rb));
> +
> +rb->used_length = size;
> +rb->max_length = max_size;
> +rb->fd = -1;
> +rb->flags = ram_flags;
> +rb->page_size = qemu_real_host_page_size();
> +rb->mr = mr;
> +rb->guest_memfd = -1;
> +trace_ram_block_create(rb->idstr, rb->flags, rb->fd, rb->used_length,

There's no idstr at this point, is there? I think this needs to be
memory_region_name(mr).

> +   rb->max_length, mr->align);
> +return rb;
> +}
> +
>  #ifdef CONFIG_POSIX
>  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>   uint32_t ram_flags, int fd, off_t offset,
>   Error **errp)
>  {
> +void *host;
>  RAMBlock *new_block;
>  Error *local_err = NULL;
>  int64_t file_size, file_align;
> @@ -1962,19 +1981,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
> MemoryRegion *mr,
>  return NULL;
>  }
>  
> -new_block = g_malloc0(sizeof(*new_block));
> -new_block->mr = mr;
> -new_block->used_length = size;
> -new_block->max_length = size;
> -new_block->flags = ram_flags;
> -new_block->guest_memfd = -1;
> -new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
> - errp);
> -if (!new_block->host) {
> +new_block = ram_block_create(mr, size, size, ram_flags);
> +host = file_ram_alloc(new_block, size, fd, !file_size, offset, errp);
> +if (!host) {
>  g_free(new_block);
>  return NULL;
>  }
>  
> +new_block->host = host;
>  ram_block_add(new_block, _err);
>  if (local_err) {
>  g_free(new_block);
> @@ -1982,7 +1996,6 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
> MemoryRegion *mr,
>  return NULL;
>  }
>  return new_block;
> -
>  }
>  
>  
> @@ -2054,18 +2067,10 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, 
> ram_addr_t max_size,
>  align = MAX(align, TARGET_PAGE_SIZE);
>  size = ROUND_UP(size, align);
>  max_size = ROUND_UP(max_size, align);
> -
> -new_block = g_malloc0(sizeof(*new_block));
> -new_block->mr = mr;
> -new_block->resized = resized;
> -new_block->used_length = size;
> -new_block->max_length = max_size;
>  assert(max_size >= size);
> -new_block->fd = -1;
> -new_block->guest_memfd = -1;
> -new_block->page_size = qemu_real_host_page_size();
> -new_block->host = host;
> -new_block->flags = ram_flags;
> +new_block = ram_block_create(mr, size, max_size, ram_flags);
> +new_block->resized = resized;
> +
>  ram_block_add(new_block, _err);
>  if (local_err) {
>  g_free(new_block);
> diff --git a/system/trace-events b/system/trace-events
> index 69c9044..f0a80ba 100644
> --- a/system/trace-events
> +++ b/system/trace-events
> @@ -38,3 +38,6 @@ dirtylimit_state_finalize(void)
>  dirtylimit_throttle_pct(int cpu_index, uint64_t pct, int64_t time_us) 
> "CPU[%d] throttle percent: %" PRIu64 ", throttle adjust time %"PRIi64 " us"
>  dirtylimit_set_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set dirty page 
> rate limit %"PRIu64
>  dirtylimit_vcpu_execute(int cpu_index, int64_t sleep_time_us) "CPU[%d] sleep 
> %"PRIi64 " us"
> +
> +# physmem.c
> +ram_block_create(const char *name, uint32_t flags, int fd, size_t 
> used_length, size_t max_length, size_t align) "%s, flags %u, fd %d, len %lu, 
> maxlen %lu, align %lu"



Re: [PATCH 5/6] migration: Rephrase message on failure to save / load Xen device state

2024-05-13 Thread Fabiano Rosas
Markus Armbruster  writes:

> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
>
> qmp_xen_save_devices_state() and qmp_xen_load_devices_state() violate
> this principle: they call qemu_save_device_state() and
> qemu_loadvm_state(), which call error_report_err().
>
> I wish I could clean this up now, but migration's error reporting is
> too complicated (confused?) for me to mess with it.
>
> Instead, I'm merely improving the error reported by
> qmp_xen_load_devices_state() and qmp_xen_load_devices_state() to the
> QMP core from
>
> An IO error has occurred
>
> to
> saving Xen device state failed
>
> and
>
> loading Xen device state failed
>
> respectively.
>
> Signed-off-by: Markus Armbruster 

Acked-by: Fabiano Rosas 



Re: [PATCH v6 6/7] migration/multifd: implement qpl compression and decompression

2024-05-13 Thread Fabiano Rosas
Yuan Liu  writes:

> each qpl job is used to (de)compress a normal page and it can
> be processed independently by the IAA hardware. All qpl jobs
> are submitted to the hardware at once, and wait for all jobs
> completion. If hardware path(IAA) is not available, use software
> for compression and decompression.
>
> Signed-off-by: Yuan Liu 
> Reviewed-by: Nanhai Zou 
> ---
>  migration/multifd-qpl.c | 284 +++-
>  1 file changed, 280 insertions(+), 4 deletions(-)
>
> diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
> index 89fa51091a..9a1fddbdd0 100644
> --- a/migration/multifd-qpl.c
> +++ b/migration/multifd-qpl.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/module.h"
>  #include "qapi/error.h"
> +#include "exec/ramblock.h"
>  #include "migration.h"
>  #include "multifd.h"
>  #include "qpl/qpl.h"
> @@ -204,6 +205,139 @@ static void multifd_qpl_send_cleanup(MultiFDSendParams 
> *p, Error **errp)
>  p->iov = NULL;
>  }
>  
> +/**
> + * multifd_qpl_prepare_job: prepare a compression or decompression job
> + *
> + * Prepare a compression or decompression job and configure job attributes
> + * including job compression level and flags.
> + *
> + * @job: pointer to the QplData structure

qpl_job structure

> + * @is_compression: compression or decompression indication
> + * @input: pointer to the input data buffer
> + * @input_len: the length of the input data
> + * @output: pointer to the output data buffer
> + * @output_len: the size of the output data buffer
> + */
> +static void multifd_qpl_prepare_job(qpl_job *job, bool is_compression,
> +uint8_t *input, uint32_t input_len,
> +uint8_t *output, uint32_t output_len)
> +{
> +job->op = is_compression ? qpl_op_compress : qpl_op_decompress;
> +job->next_in_ptr = input;
> +job->next_out_ptr = output;
> +job->available_in = input_len;
> +job->available_out = output_len;
> +job->flags = QPL_FLAG_FIRST | QPL_FLAG_LAST | QPL_FLAG_OMIT_VERIFY;
> +/* only supports one compression level */
> +job->level = 1;
> +}
> +
> +/**
> + * multifd_qpl_build_packet: build a qpl compressed data packet
> + *
> + * The qpl compressed data packet consists of two parts, one part stores
> + * the compressed length of each page, and the other part is the compressed
> + * data of each page. The zbuf_hdr stores the compressed length of all pages,
> + * and use a separate IOV to store the compressed data of each page.
> + *
> + * @qpl: pointer to the QplData structure
> + * @p: Params for the channel that we are using
> + * @idx: The index of the compressed length array
> + * @addr: pointer to the compressed data
> + * @len: The length of the compressed data
> + */
> +static void multifd_qpl_build_packet(QplData *qpl, MultiFDSendParams *p,
> + uint32_t idx, uint8_t *addr, uint32_t 
> len)
> +{
> +qpl->zbuf_hdr[idx] = cpu_to_be32(len);
> +p->iov[p->iovs_num].iov_base = addr;
> +p->iov[p->iovs_num].iov_len = len;
> +p->iovs_num++;
> +p->next_packet_size += len;
> +}
> +
> +/**
> + * multifd_qpl_compress_pages: compress normal pages
> + *
> + * Each normal page will be compressed independently, and the compression 
> jobs
> + * will be submitted to the IAA hardware in non-blocking mode, waiting for 
> all
> + * jobs to be completed and filling the compressed length and data into the
> + * sending IOVs. If IAA device is not available, the software path is used.
> + *
> + * Returns 0 for success or -1 for error
> + *
> + * @p: Params for the channel that we are using
> + * @errp: pointer to an error
> + */
> +static int multifd_qpl_compress_pages(MultiFDSendParams *p, Error **errp)
> +{
> +qpl_status status;
> +QplData *qpl = p->compress_data;
> +MultiFDPages_t *pages = p->pages;
> +uint8_t *zbuf = qpl->zbuf;
> +uint8_t *host = pages->block->host;
> +uint32_t job_num = pages->normal_num;

A bit misleading because job_num is used in the previous patch as a
synonym for page_count. We could change the previous patch to:
multifd_qpl_init(uint32_t page_count, ...

> +qpl_job *job = NULL;
> +
> +assert(job_num <= qpl->total_job_num);
> +/* submit all compression jobs */
> +for (int i = 0; i < job_num; i++) {
> +job = qpl->job_array[i];
> +multifd_qpl_prepare_job(job, true, host + pages->offset[i],
> +p->page_size, zbuf, p->page_size - 1);

Isn't the output buffer size == page size, why the -1?

> +/* if hardware path(IAA) is unavailable, call the software path */

If we're doing the fallback automatically, isn't that what qpl_path_auto
does already? What's the difference betweeen the two approaches?

> +if (!qpl->iaa_avail) {

This function got a bit convoluted, it's probably worth a check at the
start and a branch to different multifd_qpl_compress_pages_slow()
routine 

Re: [PATCH v6 5/7] migration/multifd: implement initialization of qpl compression

2024-05-10 Thread Fabiano Rosas
Yuan Liu  writes:

> the qpl initialization includes memory allocation for compressed
> data and the qpl job initialization.
>
> the qpl job initialization will check if the In-Memory Analytics
> Accelerator(IAA) device is available and use the IAA device first.
> If the platform does not have IAA device or the IAA device is not
> available, the qpl compression will fallback to the software path.
>
> Signed-off-by: Yuan Liu 
> Reviewed-by: Nanhai Zou 
> ---
>  migration/multifd-qpl.c | 272 +++-
>  1 file changed, 271 insertions(+), 1 deletion(-)
>
> diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
> index 056a68a060..89fa51091a 100644
> --- a/migration/multifd-qpl.c
> +++ b/migration/multifd-qpl.c
> @@ -9,12 +9,282 @@
>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>   * See the COPYING file in the top-level directory.
>   */
> +
>  #include "qemu/osdep.h"
>  #include "qemu/module.h"
> +#include "qapi/error.h"
> +#include "migration.h"

Where is this used? I think you only need qapi/qapi-types-migration.h

> +#include "multifd.h"
> +#include "qpl/qpl.h"
> +
> +typedef struct {
> +qpl_job **job_array;
> +/* the number of allocated jobs */
> +uint32_t total_job_num;
> +/* compressed data buffer */
> +uint8_t *zbuf;
> +/* the length of compressed data */
> +uint32_t *zbuf_hdr;
> +/* the status of IAA device */
> +bool iaa_avail;
> +} QplData;
> +
> +/**
> + * check_iaa_avail: check if IAA device is available
> + *
> + * If the system does not have an IAA device, the IAA device is
> + * not enabled or the IAA work queue is not configured as a shared
> + * mode, the QPL hardware path initialization will fail.
> + *
> + * Returns true if IAA device is available, otherwise false.
> + */
> +static bool check_iaa_avail(void)
> +{
> +qpl_job *job = NULL;
> +uint32_t job_size = 0;
> +qpl_path_t path = qpl_path_hardware;
> +
> +if (qpl_get_job_size(path, _size) != QPL_STS_OK) {
> +return false;
> +}
> +job = g_malloc0(job_size);
> +if (qpl_init_job(path, job) != QPL_STS_OK) {
> +g_free(job);
> +return false;
> +}
> +g_free(job);
> +return true;
> +}
> +
> +/**
> + * multifd_qpl_free_jobs: cleanup jobs
> + *
> + * Free all job resources.
> + *
> + * @qpl: pointer to the QplData structure
> + */
> +static void multifd_qpl_free_jobs(QplData *qpl)
> +{
> +assert(qpl != NULL);
> +for (int i = 0; i < qpl->total_job_num; i++) {
> +qpl_fini_job(qpl->job_array[i]);
> +g_free(qpl->job_array[i]);
> +qpl->job_array[i] = NULL;
> +}
> +g_free(qpl->job_array);
> +qpl->job_array = NULL;
> +}
> +
> +/**
> + * multifd_qpl_init_jobs: initialize jobs
> + *
> + * Initialize all jobs
> + *
> + * @qpl: pointer to the QplData structure
> + * @chan_id: multifd channel number
> + * @errp: pointer to an error
> + */
> +static int multifd_qpl_init_jobs(QplData *qpl, uint8_t chan_id, Error **errp)
> +{
> +qpl_path_t path;
> +qpl_status status;
> +uint32_t job_size = 0;
> +qpl_job *job = NULL;
> +
> +path = qpl->iaa_avail ? qpl_path_hardware : qpl_path_software;
> +status = qpl_get_job_size(path, _size);
> +if (status != QPL_STS_OK) {
> +error_setg(errp, "multifd: %u: qpl_get_job_size failed with error 
> %d",
> +   chan_id, status);
> +return -1;
> +}
> +qpl->job_array = g_new0(qpl_job *, qpl->total_job_num);
> +for (int i = 0; i < qpl->total_job_num; i++) {
> +job = g_malloc0(job_size);
> +status = qpl_init_job(path, job);
> +if (status != QPL_STS_OK) {
> +error_setg(errp, "multifd: %u: qpl_init_job failed with error 
> %d",
> +   chan_id, status);
> +multifd_qpl_free_jobs(qpl);
> +return -1;
> +}
> +qpl->job_array[i] = job;
> +}
> +return 0;
> +}
> +
> +/**
> + * multifd_qpl_init: initialize QplData structure
> + *
> + * Allocate and initialize a QplData structure
> + *
> + * Returns QplData pointer for success or NULL for error
> + *
> + * @job_num: pointer to the QplData structure
> + * @job_size: the buffer size of the job
> + * @chan_id: multifd channel number
> + * @errp: pointer to an error
> + */
> +static QplData *multifd_qpl_init(uint32_t job_num, uint32_t job_size,
> + uint8_t chan_id, Error **errp)
> +{
> +QplData *qpl;
> +
> +qpl = g_new0(QplData, 1);
> +qpl->total_job_num = job_num;
> +qpl->iaa_avail = check_iaa_avail();
> +if (multifd_qpl_init_jobs(qpl, chan_id, errp) != 0) {
> +g_free(qpl);
> +return NULL;
> +}
> +qpl->zbuf = g_malloc0(job_size * job_num);
> +qpl->zbuf_hdr = g_new0(uint32_t, job_num);
> +return qpl;
> +}
> +
> +/**
> + * multifd_qpl_deinit: cleanup QplData structure
> + *
> + * Free jobs, comprssed buffers and QplData structure
> + *
> 

Re: [PATCH v6 5/7] migration/multifd: implement initialization of qpl compression

2024-05-10 Thread Fabiano Rosas
Yuan Liu  writes:

> the qpl initialization includes memory allocation for compressed
> data and the qpl job initialization.
>
> the qpl job initialization will check if the In-Memory Analytics
> Accelerator(IAA) device is available and use the IAA device first.
> If the platform does not have IAA device or the IAA device is not
> available, the qpl compression will fallback to the software path.
>
> Signed-off-by: Yuan Liu 
> Reviewed-by: Nanhai Zou 

Looks good, just some nits below.

> ---
>  migration/multifd-qpl.c | 272 +++-
>  1 file changed, 271 insertions(+), 1 deletion(-)
>
> diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
> index 056a68a060..89fa51091a 100644
> --- a/migration/multifd-qpl.c
> +++ b/migration/multifd-qpl.c
> @@ -9,12 +9,282 @@
>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>   * See the COPYING file in the top-level directory.
>   */
> +
>  #include "qemu/osdep.h"
>  #include "qemu/module.h"
> +#include "qapi/error.h"
> +#include "migration.h"
> +#include "multifd.h"
> +#include "qpl/qpl.h"
> +
> +typedef struct {
> +qpl_job **job_array;
> +/* the number of allocated jobs */
> +uint32_t total_job_num;
> +/* compressed data buffer */
> +uint8_t *zbuf;
> +/* the length of compressed data */

array of lenghts

> +uint32_t *zbuf_hdr;

Why the _hdr suffix if the lengths are the only data stored here?

> +/* the status of IAA device */
> +bool iaa_avail;
> +} QplData;
> +
> +/**
> + * check_iaa_avail: check if IAA device is available
> + *
> + * If the system does not have an IAA device, the IAA device is
> + * not enabled or the IAA work queue is not configured as a shared
> + * mode, the QPL hardware path initialization will fail.
> + *
> + * Returns true if IAA device is available, otherwise false.
> + */
> +static bool check_iaa_avail(void)
> +{
> +qpl_job *job = NULL;
> +uint32_t job_size = 0;
> +qpl_path_t path = qpl_path_hardware;
> +
> +if (qpl_get_job_size(path, _size) != QPL_STS_OK) {
> +return false;
> +}
> +job = g_malloc0(job_size);
> +if (qpl_init_job(path, job) != QPL_STS_OK) {
> +g_free(job);
> +return false;
> +}
> +g_free(job);
> +return true;
> +}
> +
> +/**
> + * multifd_qpl_free_jobs: cleanup jobs
> + *
> + * Free all job resources.
> + *
> + * @qpl: pointer to the QplData structure
> + */
> +static void multifd_qpl_free_jobs(QplData *qpl)
> +{
> +assert(qpl != NULL);
> +for (int i = 0; i < qpl->total_job_num; i++) {
> +qpl_fini_job(qpl->job_array[i]);
> +g_free(qpl->job_array[i]);
> +qpl->job_array[i] = NULL;
> +}
> +g_free(qpl->job_array);
> +qpl->job_array = NULL;
> +}
> +
> +/**
> + * multifd_qpl_init_jobs: initialize jobs
> + *
> + * Initialize all jobs
> + *
> + * @qpl: pointer to the QplData structure
> + * @chan_id: multifd channel number
> + * @errp: pointer to an error
> + */
> +static int multifd_qpl_init_jobs(QplData *qpl, uint8_t chan_id, Error **errp)
> +{
> +qpl_path_t path;
> +qpl_status status;
> +uint32_t job_size = 0;
> +qpl_job *job = NULL;
> +
> +path = qpl->iaa_avail ? qpl_path_hardware : qpl_path_software;
> +status = qpl_get_job_size(path, _size);
> +if (status != QPL_STS_OK) {
> +error_setg(errp, "multifd: %u: qpl_get_job_size failed with error 
> %d",
> +   chan_id, status);
> +return -1;
> +}
> +qpl->job_array = g_new0(qpl_job *, qpl->total_job_num);
> +for (int i = 0; i < qpl->total_job_num; i++) {
> +job = g_malloc0(job_size);
> +status = qpl_init_job(path, job);
> +if (status != QPL_STS_OK) {
> +error_setg(errp, "multifd: %u: qpl_init_job failed with error 
> %d",
> +   chan_id, status);
> +multifd_qpl_free_jobs(qpl);
> +return -1;
> +}
> +qpl->job_array[i] = job;
> +}
> +return 0;
> +}
> +
> +/**
> + * multifd_qpl_init: initialize QplData structure
> + *
> + * Allocate and initialize a QplData structure
> + *
> + * Returns QplData pointer for success or NULL for error
> + *
> + * @job_num: pointer to the QplData structure

This needs updating^

> + * @job_size: the buffer size of the job
> + * @chan_id: multifd channel number
> + * @errp: pointer to an error
> + */
> +static QplData *multifd_qpl_init(uint32_t job_num, uint32_t job_size,
> + uint8_t chan_id, Error **errp)
> +{
> +QplData *qpl;
> +
> +qpl = g_new0(QplData, 1);
> +qpl->total_job_num = job_num;
> +qpl->iaa_avail = check_iaa_avail();
> +if (multifd_qpl_init_jobs(qpl, chan_id, errp) != 0) {
> +g_free(qpl);
> +return NULL;
> +}
> +qpl->zbuf = g_malloc0(job_size * job_num);
> +qpl->zbuf_hdr = g_new0(uint32_t, job_num);
> +return qpl;
> +}
> +
> +/**
> + * multifd_qpl_deinit: cleanup QplData 

Re: [PATCH v6 2/7] migration/multifd: put IOV initialization into compression method

2024-05-10 Thread Fabiano Rosas
Yuan Liu  writes:

> Different compression methods may require different numbers of IOVs.
> Based on streaming compression of zlib and zstd, all pages will be
> compressed to a data block, so two IOVs are needed for packet header
> and compressed data block.
>
> Signed-off-by: Yuan Liu 
> Reviewed-by: Nanhai Zou 

Reviewed-by: Fabiano Rosas 



Re: [PATCH v6 4/7] migration/multifd: add qpl compression method

2024-05-10 Thread Fabiano Rosas
Yuan Liu  writes:

> add the Query Processing Library (QPL) compression method
>
> Introduce the qpl as a new multifd migration compression method, it can
> use In-Memory Analytics Accelerator(IAA) to accelerate compression and
> decompression, which can not only reduce network bandwidth requirement
> but also reduce host compression and decompression CPU overhead.
>
> How to enable qpl compression during migration:
> migrate_set_parameter multifd-compression qpl
>
> The qpl method only supports one compression level, there is no qpl
> compression level parameter added, users do not need to specify the
> qpl compression level.
>
> Signed-off-by: Yuan Liu 
> Reviewed-by: Nanhai Zou 

There's an r-b from Peter that you forgot to bring along in this version
of the series.




Re: [PATCH V1 26/26] migration: only-migratable-modes

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> Add the only-migratable-modes option as a generalization of only-migratable.
> Only devices that support all requested modes are allowed.
>
> Signed-off-by: Steve Sistare 
> ---
>  include/migration/misc.h   |  3 +++
>  include/sysemu/sysemu.h|  1 -
>  migration/migration-hmp-cmds.c | 26 +-
>  migration/migration.c  | 22 +-
>  migration/savevm.c |  2 +-
>  qemu-options.hx| 16 ++--
>  system/globals.c   |  1 -
>  system/vl.c| 13 -
>  target/s390x/cpu_models.c  |  4 +++-
>  9 files changed, 75 insertions(+), 13 deletions(-)
>
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index 5b963ba..3ad2cd9 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -119,6 +119,9 @@ bool migration_incoming_postcopy_advised(void);
>  /* True if background snapshot is active */
>  bool migration_in_bg_snapshot(void);
>  
> +void migration_set_required_mode(MigMode mode);
> +bool migration_mode_required(MigMode mode);
> +
>  /* migration/block-dirty-bitmap.c */
>  void dirty_bitmap_mig_init(void);
>  
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 5b4397e..0a9c4b4 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -8,7 +8,6 @@
>  
>  /* vl.c */
>  
> -extern int only_migratable;
>  extern const char *qemu_name;
>  extern QemuUUID qemu_uuid;
>  extern bool qemu_uuid_set;
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 414c7e8..ca913b7 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -16,6 +16,7 @@
>  #include "qemu/osdep.h"
>  #include "block/qapi.h"
>  #include "migration/snapshot.h"
> +#include "migration/misc.h"
>  #include "monitor/hmp.h"
>  #include "monitor/monitor.h"
>  #include "qapi/error.h"
> @@ -33,6 +34,28 @@
>  #include "options.h"
>  #include "migration.h"
>  
> +static void migration_dump_modes(Monitor *mon)
> +{
> +int mode, n = 0;
> +
> +monitor_printf(mon, "only-migratable-modes: ");
> +
> +for (mode = 0; mode < MIG_MODE__MAX; mode++) {
> +if (migration_mode_required(mode)) {
> +if (n++) {
> +monitor_printf(mon, ",");
> +}
> +monitor_printf(mon, "%s", MigMode_str(mode));
> +}
> +}
> +
> +if (!n) {
> +monitor_printf(mon, "none\n");
> +} else {
> +monitor_printf(mon, "\n");
> +}
> +}
> +
>  static void migration_global_dump(Monitor *mon)
>  {
>  MigrationState *ms = migrate_get_current();
> @@ -41,7 +64,7 @@ static void migration_global_dump(Monitor *mon)
>  monitor_printf(mon, "store-global-state: %s\n",
> ms->store_global_state ? "on" : "off");
>  monitor_printf(mon, "only-migratable: %s\n",
> -   only_migratable ? "on" : "off");
> +   migration_mode_required(MIG_MODE_NORMAL) ? "on" : "off");
>  monitor_printf(mon, "send-configuration: %s\n",
> ms->send_configuration ? "on" : "off");
>  monitor_printf(mon, "send-section-footer: %s\n",
> @@ -50,6 +73,7 @@ static void migration_global_dump(Monitor *mon)
> ms->decompress_error_check ? "on" : "off");
>  monitor_printf(mon, "clear-bitmap-shift: %u\n",
> ms->clear_bitmap_shift);
> +migration_dump_modes(mon);
>  }
>  
>  void hmp_info_migrate(Monitor *mon, const QDict *qdict)
> diff --git a/migration/migration.c b/migration/migration.c
> index 4984dee..5535b84 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1719,17 +1719,29 @@ static bool is_busy(Error **reasonp, Error **errp)
>  return false;
>  }
>  
> -static bool is_only_migratable(Error **reasonp, Error **errp, int modes)
> +static int migration_modes_required;
> +
> +void migration_set_required_mode(MigMode mode)
> +{
> +migration_modes_required |= BIT(mode);
> +}
> +
> +bool migration_mode_required(MigMode mode)
> +{
> +return !!(migration_modes_required & BIT(mode));
> +}
> +
> +static bool modes_are_required(Error **reasonp, Error **errp, int modes)
>  {
>  ERRP_GUARD();
>  
> -if (only_migratable && (modes & BIT(MIG_MODE_NORMAL))) {
> +if (migration_modes_required & modes) {
>  error_propagate_prepend(errp, *reasonp,
> -"disallowing migration blocker "
> -"(--only-migratable) for: ");
> +"-only-migratable{-modes}  specified, but: 
> ");

extra space before 'specified'

>  *reasonp = NULL;
>  return true;
>  }
> +
>  return false;
>  }
>  
> @@ -1783,7 +1795,7 @@ int migrate_add_blocker_modes(Error **reasonp, Error 
> **errp, MigMode mode, ...)
>  modes = get_modes(mode, ap);
>  va_end(ap);
>  
> -if (is_only_migratable(reasonp, 

Re: [PATCH V1 25/26] migration: fix mismatched GPAs during cpr-exec

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> For cpr-exec mode, ramblock_is_ignored is always true, and the address of
> each migrated memory region must match the address of the statically
> initialized region on the target.  However, for a PCI rom block, the region
> address is set when the guest writes to a BAR on the source, which does not
> occur on the target, causing a "Mismatched GPAs" error during cpr-exec
> migration.
>
> To fix, unconditionally set the target's address to the source's address
> if the region does not have an address yet.
>
> Signed-off-by: Steve Sistare 

Just a detail below.

Reviewed-by: Fabiano Rosas 

> ---
>  include/exec/memory.h | 12 
>  migration/ram.c   | 15 +--
>  system/memory.c   | 10 --
>  3 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index d337737..4f654b0 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -801,6 +801,7 @@ struct MemoryRegion {
>  bool unmergeable;
>  uint8_t dirty_log_mask;
>  bool is_iommu;
> +bool has_addr;

This field is not used during memory access, maybe move it down below to
preserve the hole for future usage.

>  RAMBlock *ram_block;
>  Object *owner;
>  /* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access 
> hotpath */
> @@ -2402,6 +2403,17 @@ void memory_region_set_enabled(MemoryRegion *mr, bool 
> enabled);
>  void memory_region_set_address(MemoryRegion *mr, hwaddr addr);
>  
>  /*
> + * memory_region_set_address_only: set the address of a region.
> + *
> + * Same as memory_region_set_address, but without causing transaction side
> + * effects.
> + *
> + * @mr: the region to be updated
> + * @addr: new address, relative to container region
> + */
> +void memory_region_set_address_only(MemoryRegion *mr, hwaddr addr);
> +
> +/*
>   * memory_region_set_size: dynamically update the size of a region.
>   *
>   * Dynamically updates the size of a region.
> diff --git a/migration/ram.c b/migration/ram.c
> index add285b..7b8d7f6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -4196,12 +4196,15 @@ static int parse_ramblock(QEMUFile *f, RAMBlock 
> *block, ram_addr_t length)
>  }
>  if (migrate_ignore_shared()) {
>  hwaddr addr = qemu_get_be64(f);
> -if (migrate_ram_is_ignored(block) &&
> -block->mr->addr != addr) {
> -error_report("Mismatched GPAs for block %s "
> - "%" PRId64 "!= %" PRId64, block->idstr,
> - (uint64_t)addr, (uint64_t)block->mr->addr);
> -return -EINVAL;
> +if (migrate_ram_is_ignored(block)) {
> +if (!block->mr->has_addr) {
> +memory_region_set_address_only(block->mr, addr);
> +} else if (block->mr->addr != addr) {
> +error_report("Mismatched GPAs for block %s "
> + "%" PRId64 "!= %" PRId64, block->idstr,
> + (uint64_t)addr, (uint64_t)block->mr->addr);
> +return -EINVAL;
> +}
>  }
>  }
>  ret = rdma_block_notification_handle(f, block->idstr);
> diff --git a/system/memory.c b/system/memory.c
> index ca04a0e..3c72504 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2665,7 +2665,7 @@ static void 
> memory_region_add_subregion_common(MemoryRegion *mr,
>  for (alias = subregion->alias; alias; alias = alias->alias) {
>  alias->mapped_via_alias++;
>  }
> -subregion->addr = offset;
> +memory_region_set_address_only(subregion, offset);
>  memory_region_update_container_subregions(subregion);
>  }
>  
> @@ -2745,10 +2745,16 @@ static void 
> memory_region_readd_subregion(MemoryRegion *mr)
>  }
>  }
>  
> +void memory_region_set_address_only(MemoryRegion *mr, hwaddr addr)
> +{
> +mr->addr = addr;
> +mr->has_addr = true;
> +}
> +
>  void memory_region_set_address(MemoryRegion *mr, hwaddr addr)
>  {
>  if (addr != mr->addr) {
> -mr->addr = addr;
> +memory_region_set_address_only(mr, addr);
>  memory_region_readd_subregion(mr);
>  }
>  }



Re: [PATCH V1 24/26] seccomp: cpr-exec blocker

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> cpr-exec mode needs permission to exec.  Block it if permission is denied.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 



Re: [PATCH V1 23/26] migration: misc cpr-exec blockers

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> Add blockers for cpr-exec migration mode for devices and options that do
> not support it.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 



Re: [PATCH V1 22/26] migration: ram block cpr-exec blockers

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> Unlike cpr-reboot mode, cpr-exec mode cannot save volatile ram blocks in the
> migration stream file and recreate them later, because the physical memory for
> the blocks is pinned and registered for vfio.  Add an exec-mode blocker for
> volatile ram blocks.
>
> Also add a blocker for RAM_GUEST_MEMFD.  Preserving guest_memfd may be
> sufficient for cpr-exec, but it has not been tested yet.
>
> - Steve

extra text here

>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 




Re: [PATCH V1 21/26] migration: migrate_add_blocker_mode

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> Define a convenience function to add a migration blocker for a single mode.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 



Re: [PATCH V1 09/26] migration: vmstate_register_named

2024-05-09 Thread Fabiano Rosas
Fabiano Rosas  writes:

> Steve Sistare  writes:
>
>> Define vmstate_register_named which takes the instance name as its first
>> parameter, instead of generating the name from VMStateIf of the Object.
>> This will be needed to register objects that are not Objects.  Pass the
>> new name parameter to vmstate_register_with_alias_id.
>>
>> Signed-off-by: Steve Sistare 
>
> Reviewed-by: Fabiano Rosas 

Actually, can't we define a wrapper type just for this purpose? For
example, looking at dbus-vmstate.c:

static void dbus_vmstate_class_init(ObjectClass *oc, void *data)
{
...
VMStateIfClass *vc = VMSTATE_IF_CLASS(oc);

vc->get_id = dbus_vmstate_get_id;
...
}

static const TypeInfo dbus_vmstate_info = {
.name = TYPE_DBUS_VMSTATE,
.parent = TYPE_OBJECT,
.instance_size = sizeof(DBusVMState),
.instance_finalize = dbus_vmstate_finalize,
.class_init = dbus_vmstate_class_init,
.interfaces = (InterfaceInfo[]) {
{ TYPE_USER_CREATABLE },   // without this one
{ TYPE_VMSTATE_IF },
{ }
}
};

static void register_types(void)
{
type_register_static(_vmstate_info);
}
type_init(register_types);



Re: [PATCH V1 09/26] migration: vmstate_register_named

2024-05-09 Thread Fabiano Rosas
Steve Sistare  writes:

> Define vmstate_register_named which takes the instance name as its first
> parameter, instead of generating the name from VMStateIf of the Object.
> This will be needed to register objects that are not Objects.  Pass the
> new name parameter to vmstate_register_with_alias_id.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 



[PULL v2 10/13] migration: Remove block migration

2024-05-08 Thread Fabiano Rosas
The block migration has been considered obsolete since QEMU 8.2 in
favor of the more flexible storage migration provided by the
blockdev-mirror driver. Two releases have passed so now it's time to
remove it.

Deprecation commit 66db46ca83 ("migration: Deprecate block
migration").

Reviewed-by: Markus Armbruster 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 MAINTAINERS |1 -
 docs/about/deprecated.rst   |   10 -
 docs/about/removed-features.rst |   14 +
 docs/devel/migration/main.rst   |2 +-
 include/migration/misc.h|6 -
 meson.build |2 -
 meson_options.txt   |2 -
 migration/block.c   | 1018 ---
 migration/block.h   |   52 --
 migration/colo.c|1 -
 migration/meson.build   |3 -
 migration/migration-hmp-cmds.c  |   25 -
 migration/migration.c   |   15 -
 migration/options.c |   25 -
 migration/options.h |1 -
 migration/ram.c |   15 -
 migration/savevm.c  |5 -
 qapi/migration.json |   61 +-
 scripts/meson-buildoptions.sh   |4 -
 19 files changed, 26 insertions(+), 1236 deletions(-)
 delete mode 100644 migration/block.c
 delete mode 100644 migration/block.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 84391777db..e4990ea42e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2866,7 +2866,6 @@ F: util/aio-*.h
 F: util/defer-call.c
 F: util/fdmon-*.c
 F: block/io.c
-F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
 F: include/qemu/defer-call.h
diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 08d8ef37a7..3d324930f3 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -477,16 +477,6 @@ option).
 Migration
 -
 
-block migration (since 8.2)
-'''
-
-Block migration is too inflexible.  It needs to migrate all block
-devices or none.
-
-Please see "QMP invocation for live storage migration with
-``blockdev-mirror`` + NBD" in docs/interop/live-block-operations.rst
-for a detailed explanation.
-
 old compression method (since 8.2)
 ''
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index a491c0..9e7d8ee4ff 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -634,6 +634,13 @@ Use blockdev-mirror with NBD instead. See "QMP invocation 
for live
 storage migration with ``blockdev-mirror`` + NBD" in
 docs/interop/live-block-operations.rst for a detailed explanation.
 
+``migrate-set-capabilities`` ``block`` option (removed in 9.1)
+''
+
+Block migration has been removed. For a replacement, see "QMP
+invocation for live storage migration with ``blockdev-mirror`` + NBD"
+in docs/interop/live-block-operations.rst.
+
 Human Monitor Protocol (HMP) commands
 -
 
@@ -708,6 +715,13 @@ Use blockdev-mirror with NBD instead. See "QMP invocation 
for live
 storage migration with ``blockdev-mirror`` + NBD" in
 docs/interop/live-block-operations.rst for a detailed explanation.
 
+``migrate_set_capability`` ``block`` option (removed in 9.1)
+
+
+Block migration has been removed. For a replacement, see "QMP
+invocation for live storage migration with ``blockdev-mirror`` + NBD"
+in docs/interop/live-block-operations.rst.
+
 Host Architectures
 --
 
diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 54385a23e5..495cdcb112 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -454,7 +454,7 @@ Examples of such API functions are:
 Iterative device migration
 --
 
-Some devices, such as RAM, Block storage or certain platform devices,
+Some devices, such as RAM or certain platform devices,
 have large amounts of data that would mean that the CPUs would be
 paused for too long if they were sent in one section.  For these
 devices an *iterative* approach is taken.
diff --git a/include/migration/misc.h b/include/migration/misc.h
index c9e200f4eb..bf7339cc1e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -45,12 +45,6 @@ bool migrate_ram_is_ignored(RAMBlock *block);
 
 /* migration/block.c */
 
-#ifdef CONFIG_LIVE_BLOCK_MIGRATION
-void blk_mig_init(void);
-#else
-static inline void blk_mig_init(void) {}
-#endif
-
 AnnounceParameters *migrate_announce_params(void);
 /* migration/savevm.c */
 
diff --git a/meson.build b/meson.build
index 43da492372..83ae4347c7 100644
--- a/meson.build
+++ b/meson.build
@@ -2351,7 +2351,6 @@ config_host_data.set('CONFIG_DEBUG_MUTEX', 
get_option('debug_mutex'))
 config_host_data.set('CONFIG_DEBUG_STACK_USAGE', 
get_option('

[PULL v2 02/13] migration: move trace-point from migrate_fd_error to migrate_set_error

2024-05-08 Thread Fabiano Rosas
From: Vladimir Sementsov-Ogievskiy 

Cover more cases by trace-point.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c  | 4 +++-
 migration/trace-events | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b5af6b5105..2dc6a063e9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1421,6 +1421,9 @@ static void migrate_fd_cleanup_bh(void *opaque)
 void migrate_set_error(MigrationState *s, const Error *error)
 {
 QEMU_LOCK_GUARD(>error_mutex);
+
+trace_migrate_error(error_get_pretty(error));
+
 if (!s->error) {
 s->error = error_copy(error);
 }
@@ -1444,7 +1447,6 @@ static void migrate_error_free(MigrationState *s)
 
 static void migrate_fd_error(MigrationState *s, const Error *error)
 {
-trace_migrate_fd_error(error_get_pretty(error));
 assert(s->to_dst_file == NULL);
 migrate_set_state(>state, MIGRATION_STATUS_SETUP,
   MIGRATION_STATUS_FAILED);
diff --git a/migration/trace-events b/migration/trace-events
index f0e1cb80c7..d0c44c3853 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -152,7 +152,7 @@ multifd_set_outgoing_channel(void *ioc, const char 
*ioctype, const char *hostnam
 # migration.c
 migrate_set_state(const char *new_state) "new state %s"
 migrate_fd_cleanup(void) ""
-migrate_fd_error(const char *error_desc) "error=%s"
+migrate_error(const char *error_desc) "error=%s"
 migrate_fd_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in 
%s at 0x%zx len 0x%zx"
 migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t post) "exact 
pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
-- 
2.35.3




[PULL v2 05/13] migration: process_incoming_migration_co(): rework error reporting

2024-05-08 Thread Fabiano Rosas
From: Vladimir Sementsov-Ogievskiy 

Unify error reporting in the function. This simplifies the following
commit, which will not-exit-on-error behavior variant to the function.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b307a4bc59..a9599838e6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -735,14 +735,16 @@ static void process_incoming_migration_bh(void *opaque)
 static void coroutine_fn
 process_incoming_migration_co(void *opaque)
 {
+MigrationState *s = migrate_get_current();
 MigrationIncomingState *mis = migration_incoming_get_current();
 PostcopyState ps;
 int ret;
+Error *local_err = NULL;
 
 assert(mis->from_src_file);
 
 if (compress_threads_load_setup(mis->from_src_file)) {
-error_report("Failed to setup decompress threads");
+error_setg(_err, "Failed to setup decompress threads");
 goto fail;
 }
 
@@ -779,19 +781,12 @@ process_incoming_migration_co(void *opaque)
 }
 
 if (ret < 0) {
-MigrationState *s = migrate_get_current();
-
-if (migrate_has_error(s)) {
-WITH_QEMU_LOCK_GUARD(>error_mutex) {
-error_report_err(s->error);
-s->error = NULL;
-}
-}
-error_report("load of migration failed: %s", strerror(-ret));
+error_setg(_err, "load of migration failed: %s", strerror(-ret));
 goto fail;
 }
 
 if (colo_incoming_co() < 0) {
+error_setg(_err, "colo incoming failed");
 goto fail;
 }
 
@@ -800,8 +795,16 @@ process_incoming_migration_co(void *opaque)
 fail:
 migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_FAILED);
+migrate_set_error(s, local_err);
+error_free(local_err);
+
 migration_incoming_state_destroy();
 
+WITH_QEMU_LOCK_GUARD(>error_mutex) {
+error_report_err(s->error);
+s->error = NULL;
+}
+
 exit(EXIT_FAILURE);
 }
 
-- 
2.35.3




[PULL v2 07/13] migration: Remove 'skipped' field from MigrationStats

2024-05-08 Thread Fabiano Rosas
The 'skipped' field of the MigrationStats struct has been deprecated
in 8.1. Time to remove it.

Deprecation commit 7b24d32634 ("migration: skipped field is really
obsolete.").

Reviewed-by: Markus Armbruster 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 docs/about/deprecated.rst   | 6 --
 docs/about/removed-features.rst | 6 ++
 migration/migration-hmp-cmds.c  | 2 --
 migration/migration.c   | 2 --
 qapi/migration.json | 8 
 5 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 03f8b1b655..94d3e53513 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -477,12 +477,6 @@ option).
 Migration
 -
 
-``skipped`` MigrationStats field (since 8.1)
-
-
-``skipped`` field in Migration stats has been deprecated.  It hasn't
-been used for more than 10 years.
-
 ``inc`` migrate command option (since 8.2)
 ''
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index 53ca08aba9..c4cb2692d0 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -614,6 +614,12 @@ was superseded by ``sections``.
 Member ``section-size`` in the return value of ``query-sgx-capabilities``
 was superseded by ``sections``.
 
+``query-migrate`` return value member ``skipped`` (removed in 9.1)
+''
+
+Member ``skipped`` of the ``MigrationStats`` struct hasn't been used
+for more than 10 years. Removed with no replacement.
+
 Human Monitor Protocol (HMP) commands
 -
 
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 23181bbee1..b6b2035f64 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -105,8 +105,6 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
info->ram->total >> 10);
 monitor_printf(mon, "duplicate: %" PRIu64 " pages\n",
info->ram->duplicate);
-monitor_printf(mon, "skipped: %" PRIu64 " pages\n",
-   info->ram->skipped);
 monitor_printf(mon, "normal: %" PRIu64 " pages\n",
info->ram->normal);
 monitor_printf(mon, "normal bytes: %" PRIu64 " kbytes\n",
diff --git a/migration/migration.c b/migration/migration.c
index 289afa8d85..a4be929e40 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1156,8 +1156,6 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
 info->ram->transferred = migration_transferred_bytes();
 info->ram->total = ram_bytes_total();
 info->ram->duplicate = stat64_get(_stats.zero_pages);
-/* legacy value.  It is not used anymore */
-info->ram->skipped = 0;
 info->ram->normal = stat64_get(_stats.normal_pages);
 info->ram->normal_bytes = info->ram->normal * page_size;
 info->ram->mbps = s->mbps;
diff --git a/qapi/migration.json b/qapi/migration.json
index 9feed413b5..2dd70f1c0e 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -23,9 +23,6 @@
 #
 # @duplicate: number of duplicate (zero) pages (since 1.2)
 #
-# @skipped: number of skipped zero pages.  Always zero, only provided
-# for compatibility (since 1.5)
-#
 # @normal: number of normal pages (since 1.2)
 #
 # @normal-bytes: number of normal bytes sent (since 1.2)
@@ -63,16 +60,11 @@
 # between 0 and @dirty-sync-count * @multifd-channels.  (since
 # 7.1)
 #
-# Features:
-#
-# @deprecated: Member @skipped is always zero since 1.5.3
-#
 # Since: 0.14
 ##
 { 'struct': 'MigrationStats',
   'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' ,
'duplicate': 'int',
-   'skipped': { 'type': 'int', 'features': [ 'deprecated' ] },
'normal': 'int',
'normal-bytes': 'int', 'dirty-pages-rate': 'int',
'mbps': 'number', 'dirty-sync-count': 'int',
-- 
2.35.3




[PULL v2 01/13] migration/ram.c: API Conversion qemu_mutex_lock(), and qemu_mutex_unlock() to WITH_QEMU_LOCK_GUARD macro

2024-05-08 Thread Fabiano Rosas
From: Will Gyda 

migration/ram.c: API Conversion qemu_mutex_lock(),
and qemu_mutex_unlock() to WITH_QEMU_LOCK_GUARD macro

Signed-off-by: Will Gyda 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/ram.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index a975c5af16..50df1e9cd2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1066,14 +1066,14 @@ static void migration_bitmap_sync(RAMState *rs, bool 
last_stage)
 trace_migration_bitmap_sync_start();
 memory_global_dirty_log_sync(last_stage);
 
-qemu_mutex_lock(>bitmap_mutex);
-WITH_RCU_READ_LOCK_GUARD() {
-RAMBLOCK_FOREACH_NOT_IGNORED(block) {
-ramblock_sync_dirty_bitmap(rs, block);
+WITH_QEMU_LOCK_GUARD(>bitmap_mutex) {
+WITH_RCU_READ_LOCK_GUARD() {
+RAMBLOCK_FOREACH_NOT_IGNORED(block) {
+ramblock_sync_dirty_bitmap(rs, block);
+}
+stat64_set(_stats.dirty_bytes_last_sync, 
ram_bytes_remaining());
 }
-stat64_set(_stats.dirty_bytes_last_sync, ram_bytes_remaining());
 }
-qemu_mutex_unlock(>bitmap_mutex);
 
 memory_global_after_dirty_log_sync();
 trace_migration_bitmap_sync_end(rs->num_dirty_pages_period);
-- 
2.35.3




[PULL v2 11/13] migration: Remove non-multifd compression

2024-05-08 Thread Fabiano Rosas
The 'compress' migration capability enables the old compression code
which has shown issues over the years and is thought to be less stable
and tested than the more recent multifd-based compression. The old
compression code has been deprecated in 8.2 and now is time to remove
it.

Deprecation commit 864128df46 ("migration: Deprecate old compression
method").

Acked-by: Markus Armbruster 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 docs/about/deprecated.rst   |  11 -
 docs/about/removed-features.rst |  55 
 hw/core/machine.c   |   1 -
 migration/meson.build   |   1 -
 migration/migration-hmp-cmds.c  |  47 ---
 migration/migration.c   |  13 -
 migration/migration.h   |   7 -
 migration/options.c | 164 --
 migration/options.h |   5 -
 migration/qemu-file.c   |  78 -
 migration/qemu-file.h   |   4 -
 migration/ram-compress.c| 564 
 migration/ram-compress.h|  77 -
 migration/ram.c | 154 +
 qapi/migration.json | 112 ---
 tests/qtest/migration-test.c| 139 
 16 files changed, 64 insertions(+), 1368 deletions(-)
 delete mode 100644 migration/ram-compress.c
 delete mode 100644 migration/ram-compress.h

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 3d324930f3..64b8f838be 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -473,14 +473,3 @@ both, older and future versions of QEMU.
 The ``blacklist`` config file option has been renamed to ``block-rpcs``
 (to be in sync with the renaming of the corresponding command line
 option).
-
-Migration
--
-
-old compression method (since 8.2)
-''
-
-Compression method fails too much.  Too many races.  We are going to
-remove it if nobody fixes it.  For starters, migration-test
-compression tests are disabled because they fail randomly.  If you need
-compression, use multifd compression methods.
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index 9e7d8ee4ff..fba0cfb0b0 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -505,6 +505,11 @@ configurations (e.g. -smp 8,sockets=0) is removed since 
9.0, users have
 to ensure that all the topology members described with -smp are greater
 than zero.
 
+``-global migration.decompress-error-check`` (removed in 9.1)
+'
+
+Removed along with the ``compression`` migration capability.
+
 User-mode emulator command line arguments
 -
 
@@ -641,6 +646,31 @@ Block migration has been removed. For a replacement, see 
"QMP
 invocation for live storage migration with ``blockdev-mirror`` + NBD"
 in docs/interop/live-block-operations.rst.
 
+``migrate-set-parameter`` ``compress-level`` option (removed in 9.1)
+
+
+Use ``multifd-zlib-level`` or ``multifd-zstd-level`` instead.
+
+``migrate-set-parameter`` ``compress-threads`` option (removed in 9.1)
+''
+
+Use ``multifd-channels`` instead.
+
+``migrate-set-parameter`` ``compress-wait-thread`` option (removed in 9.1)
+''
+
+Removed with no replacement.
+
+``migrate-set-parameter`` ``decompress-threads`` option (removed in 9.1)
+
+
+Use ``multifd-channels`` instead.
+
+``migrate-set-capability`` ``compress`` option (removed in 9.1)
+'''
+
+Use ``multifd-compression`` instead.
+
 Human Monitor Protocol (HMP) commands
 -
 
@@ -722,6 +752,31 @@ Block migration has been removed. For a replacement, see 
"QMP
 invocation for live storage migration with ``blockdev-mirror`` + NBD"
 in docs/interop/live-block-operations.rst.
 
+``migrate_set_parameter`` ``compress-level`` option (removed in 9.1)
+
+
+Use ``multifd-zlib-level`` or ``multifd-zstd-level`` instead.
+
+``migrate_set_parameter`` ``compress-threads`` option (removed in 9.1)
+''
+
+Use ``multifd-channels`` instead.
+
+``migrate_set_parameter`` ``compress-wait-thread`` option (removed in 9.1)
+''
+
+Removed with no replacement.
+
+``migrate_set_parameter`` ``decompress-threads`` option (removed in 9.1)
+
+
+Use ``multifd-channels`` instead.
+
+``migrate_set_capability`` ``com

[PULL v2 12/13] migration: Deprecate fd: for file migration

2024-05-08 Thread Fabiano Rosas
The fd: URI can currently trigger two different types of migration, a
TCP migration using sockets and a file migration using a plain
file. This is in conflict with the recently introduced (8.2) QMP
migrate API that takes structured data as JSON-like format. We cannot
keep the same backend for both types of migration because with the new
API the code is more tightly coupled to the type of transport. This
means a TCP migration must use the 'socket' transport and a file
migration must use the 'file' transport.

If we keep allowing fd: when using a file, this creates an issue when
the user converts the old-style (fd:) to the new style ("transport":
"socket") invocation because the file descriptor in question has
previously been allowed to be either a plain file or a socket.

To avoid creating too much confusion, we can simply deprecate the fd:
+ file usage, which is thought to be rarely used currently and instead
establish a 1:1 correspondence between fd: URI and socket transport,
and file: URI and file transport.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 docs/about/deprecated.rst | 14 ++
 migration/fd.c| 12 
 2 files changed, 26 insertions(+)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 64b8f838be..f5f111495b 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -473,3 +473,17 @@ both, older and future versions of QEMU.
 The ``blacklist`` config file option has been renamed to ``block-rpcs``
 (to be in sync with the renaming of the corresponding command line
 option).
+
+Migration
+-
+
+``fd:`` URI when used for file migration (since 9.1)
+
+
+The ``fd:`` URI can currently provide a file descriptor that
+references either a socket or a plain file. These are two different
+types of migration. In order to reduce ambiguity, the ``fd:`` URI
+usage of providing a file descriptor to a plain file has been
+deprecated in favor of explicitly using the ``file:`` URI with the
+file descriptor being passed as an ``fdset``. Refer to the ``add-fd``
+command documentation for details on the ``fdset`` usage.
diff --git a/migration/fd.c b/migration/fd.c
index 449adaa2de..aab5189eac 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -20,6 +20,8 @@
 #include "file.h"
 #include "migration.h"
 #include "monitor/monitor.h"
+#include "qemu/error-report.h"
+#include "qemu/sockets.h"
 #include "io/channel-util.h"
 #include "trace.h"
 
@@ -32,6 +34,11 @@ void fd_start_outgoing_migration(MigrationState *s, const 
char *fdname, Error **
 return;
 }
 
+if (!fd_is_socket(fd)) {
+warn_report("fd: migration to a file is deprecated."
+" Use file: instead.");
+}
+
 trace_migration_fd_outgoing(fd);
 ioc = qio_channel_new_fd(fd, errp);
 if (!ioc) {
@@ -61,6 +68,11 @@ void fd_start_incoming_migration(const char *fdname, Error 
**errp)
 return;
 }
 
+if (!fd_is_socket(fd)) {
+warn_report("fd: migration to a file is deprecated."
+" Use file: instead.");
+}
+
 trace_migration_fd_incoming(fd);
 
 ioc = qio_channel_new_fd(fd, errp);
-- 
2.35.3




[PULL v2 13/13] hmp/migration: Fix "migrate" command's documentation

2024-05-08 Thread Fabiano Rosas
From: Peter Xu 

Peter missed the Sphinx HMP document for the "resume/-r" flag in commit
7a4da28b26 ("qmp: hmp: add migrate "resume" option").  Add it.

When at it, slightly cleanup the lines around:

  - Move "detach/-d" to a separate section rather than appending it at the
  end of the command description. Add a hint for how to query the migration
  results in detached mode.

  - Add "postcopy" keyword to "resume/-r" help messages, as it only applies
  to postcopy.

Cc: Dr. David Alan Gilbert 
Cc: Fabiano Rosas 
Fixes: 7a4da28b26 ("qmp: hmp: add migrate "resume" option")
Reported-by: Markus Armbruster 
Reviewed-by: Markus Armbruster 
Signed-off-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 hmp-commands.hx | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index ebca2cdced..06746f0afc 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -912,14 +912,20 @@ ERST
 .args_type  = "detach:-d,resume:-r,uri:s",
 .params = "[-d] [-r] uri",
 .help   = "migrate to URI (using -d to not wait for completion)"
- "\n\t\t\t -r to resume a paused migration",
+ "\n\t\t\t -r to resume a paused postcopy migration",
 .cmd= hmp_migrate,
 },
 
 
 SRST
-``migrate [-d]`` *uri*
-  Migrate to *uri* (using -d to not wait for completion).
+``migrate [-d] [-r]`` *uri*
+  Migrate the VM to *uri*.
+
+  ``-d``
+Start the migration process, but do not wait for its completion.  To
+query an ongoing migration process, use "info migrate".
+  ``-r``
+Resume a paused postcopy migration.
 ERST
 
 {
-- 
2.35.3




[PULL v2 08/13] migration: Remove 'inc' option from migrate command

2024-05-08 Thread Fabiano Rosas
The block incremental option for block migration has been deprecated
in 8.2 in favor of using the block-mirror feature. Remove it now.

Deprecation commit 40101f320d ("migration: migrate 'inc' command
option is deprecated.").

Reviewed-by: Markus Armbruster 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 docs/about/deprecated.rst   |  9 --
 docs/about/removed-features.rst | 14 +
 hmp-commands.hx | 13 +++-
 migration/block.c   |  1 -
 migration/migration-hmp-cmds.c  | 18 ++-
 migration/migration.c   | 24 +--
 migration/options.c | 30 +-
 migration/options.h |  5 ---
 qapi/migration.json | 54 +++--
 9 files changed, 39 insertions(+), 129 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 94d3e53513..1dfbd5fad4 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -477,15 +477,6 @@ option).
 Migration
 -
 
-``inc`` migrate command option (since 8.2)
-''
-
-Use blockdev-mirror with NBD instead.
-
-As an intermediate step the ``inc`` functionality can be achieved by
-setting the ``block-incremental`` migration parameter to ``true``.
-But this parameter is also deprecated.
-
 ``blk`` migrate command option (since 8.2)
 ''
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index c4cb2692d0..7da4b3df14 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -620,6 +620,13 @@ was superseded by ``sections``.
 Member ``skipped`` of the ``MigrationStats`` struct hasn't been used
 for more than 10 years. Removed with no replacement.
 
+``migrate`` command option ``inc`` (removed in 9.1)
+'''
+
+Use blockdev-mirror with NBD instead. See "QMP invocation for live
+storage migration with ``blockdev-mirror`` + NBD" in
+docs/interop/live-block-operations.rst for a detailed explanation.
+
 Human Monitor Protocol (HMP) commands
 -
 
@@ -680,6 +687,13 @@ This command didn't produce any output already. Removed 
with no replacement.
 The ``singlestep`` command has been replaced by the ``one-insn-per-tb``
 command, which has the same behaviour but a less misleading name.
 
+``migrate`` command ``-i`` option (removed in 9.1)
+''
+
+Use blockdev-mirror with NBD instead. See "QMP invocation for live
+storage migration with ``blockdev-mirror`` + NBD" in
+docs/interop/live-block-operations.rst for a detailed explanation.
+
 Host Architectures
 --
 
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 2e2a3bcf98..7978302949 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -909,26 +909,21 @@ ERST
 
 {
 .name   = "migrate",
-.args_type  = "detach:-d,blk:-b,inc:-i,resume:-r,uri:s",
-.params = "[-d] [-b] [-i] [-r] uri",
+.args_type  = "detach:-d,blk:-b,resume:-r,uri:s",
+.params = "[-d] [-b] [-r] uri",
 .help   = "migrate to URI (using -d to not wait for completion)"
  "\n\t\t\t -b for migration without shared storage with"
- " full copy of disk\n\t\t\t -i for migration without "
- "shared storage with incremental copy of disk "
- "(base image shared between src and destination)"
-  "\n\t\t\t -r to resume a paused migration",
+ " full copy of disk\n\t\t\t -r to resume a paused 
migration",
 .cmd= hmp_migrate,
 },
 
 
 SRST
-``migrate [-d] [-b] [-i]`` *uri*
+``migrate [-d] [-b]`` *uri*
   Migrate to *uri* (using -d to not wait for completion).
 
   ``-b``
 for migration with full copy of disk
-  ``-i``
-for migration with incremental copy of disk (base image is shared)
 ERST
 
 {
diff --git a/migration/block.c b/migration/block.c
index bae6e94891..87ec1a7e68 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -419,7 +419,6 @@ static int init_blk_migration(QEMUFile *f, Error **errp)
 bmds->bulk_completed = 0;
 bmds->total_sectors = sectors;
 bmds->completed_sectors = 0;
-bmds->shared_base = migrate_block_incremental();
 
 assert(i < num_bs);
 bmds_bs[i].bmds = bmds;
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index b6b2035f64..8446c0721a 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -332,10 +332,6 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "%s: %u m

[PULL v2 03/13] migration: process_incoming_migration_co(): complete cleanup on failure

2024-05-08 Thread Fabiano Rosas
From: Vladimir Sementsov-Ogievskiy 

Make call to migration_incoming_state_destroy(), instead of doing only
partial of it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 2dc6a063e9..0d26db47f7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -799,10 +799,7 @@ process_incoming_migration_co(void *opaque)
 fail:
 migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_FAILED);
-qemu_fclose(mis->from_src_file);
-
-multifd_recv_cleanup();
-compress_threads_load_cleanup();
+migration_incoming_state_destroy();
 
 exit(EXIT_FAILURE);
 }
-- 
2.35.3




  1   2   3   4   5   6   7   8   9   10   >