On Tue Feb 20, 2024 at 12:10 AM AEST, Thomas Huth wrote:
> On 19/02/2024 07.17, Nicholas Piggin wrote:
> > The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
> > aligned ranges forgot to bring the TCG TLB up to date after clearing
> > some of the dirty memory bitmap bits. This can result in stores through
> > the TCG TLB not setting the dirty memory bitmap and ultimately causes
> > memory corruption / lost updates during migration from a TCG host.
> > 
> > Fix this by exporting an abstracted function to call when dirty bits
> > have been cleared.
> > 
> > Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
> > Signed-off-by: Nicholas Piggin <npig...@gmail.com>
> > ---
>
> Sounds promising! ... but it doesn't seem to fix the migration-test qtest 
> with s390x when it gets enabled again:

Did it fix kvm-unit-tests for you?

> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -3385,15 +3385,6 @@ int main(int argc, char **argv)
>           return g_test_run();
>       }
>
> -    /*
> -     * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
> -     * there until the problems are resolved
> -     */
> -    if (g_str_equal(arch, "s390x") && !has_kvm) {
> -        g_test_message("Skipping test: s390x host with KVM is required");
> -        return g_test_run();
> -    }
> -
>       tmpfs = g_dir_make_tmp("migration-test-XXXXXX", &err);
>       if (!tmpfs) {
>           g_test_message("Can't create temporary directory in %s: %s",
>
> I wonder whether there is more stuff like this necessary somewhere?

Possibly. That's what the commit logs for the TCG disable indicate. I
have found another dirty bitmap TCG race too. I'll send it out after
some more testing.
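
For anyone following along, here is a rough sketch of the kind of stale-TLB
problem the commit message above describes. It is a toy model under loud
assumptions, not QEMU code: every name in it (toy_mem, toy_store, toy_sync,
the reset_tlb flag) is hypothetical, and it collapses the real TLB into a
per-page "stores may skip dirty tracking" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: none of these names exist in QEMU. */
#define TOY_NPAGES 64 /* hypothetical: one 64-bit word covers every page */

struct toy_mem {
    uint64_t dirty;                  /* one dirty bit per page */
    bool tlb_fast_write[TOY_NPAGES]; /* true: store skips dirty tracking */
};

void toy_init(struct toy_mem *m)
{
    memset(m, 0, sizeof(*m));
}

/*
 * Guest store to a page. The first store takes the slow path, sets the
 * dirty bit and then lets later stores to that page take the fast path.
 */
void toy_store(struct toy_mem *m, unsigned page)
{
    assert(page < TOY_NPAGES);
    if (m->tlb_fast_write[page]) {
        return; /* fast path: the page is assumed to be dirty already */
    }
    m->dirty |= UINT64_C(1) << page;
    m->tlb_fast_write[page] = true;
}

/*
 * Migration-style sync: fetch and clear a whole word of dirty bits.
 * With reset_tlb == false this models the bug: the bits are cleared but
 * the cached fast-path flags are left stale.
 */
uint64_t toy_sync(struct toy_mem *m, bool reset_tlb)
{
    uint64_t bits = m->dirty;

    m->dirty = 0;
    if (reset_tlb) {
        /* Force the next store to each page back onto the slow path. */
        memset(m->tlb_fast_write, 0, sizeof(m->tlb_fast_write));
    }
    return bits;
}

/* Does a store made after one sync show up in the following sync? */
bool toy_second_store_is_tracked(bool reset_tlb)
{
    struct toy_mem m;

    toy_init(&m);
    toy_store(&m, 3);
    (void)toy_sync(&m, reset_tlb);
    toy_store(&m, 3);
    return (toy_sync(&m, reset_tlb) >> 3) & 1;
}
```

In this model toy_second_store_is_tracked(false) reports the second store
as lost, while toy_second_store_is_tracked(true) reports it as tracked. The
real code paths are more involved, so treat this only as an illustration of
why clearing dirty bits has to be paired with bringing the TLB up to date.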

> Did you try to re-enable tests/qtest/migration-test.c for ppc64 with TCG to 
> see whether that works fine now?

Hmm, I did try and so far ppc64 is not failing even with upstream QEMU.
I'll try with s390x. Any additional build or runtime options to make it
break? How long does it take for breakage to be evident?

Thanks,
Nick
