On Mon, Mar 14, 2022 at 06:20:54PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > On Mon, 14 Mar 2022 at 17:55, Dr. David Alan Gilbert
> > <dgilb...@redhat.com> wrote:
> > >
> > > Peter Maydell (peter.mayd...@linaro.org) wrote:
> > > > One thing that makes this bug investigation trickier, incidentally,
> > > > is that the migration-test code seems to depend on userfaultfd.
> > > > That means you can't run it under 'rr'.
> > >
> > > That should only be the postcopy tests; the others shouldn't use that.
> >
> > tests/qtest/migration-test.c:main() exits immediately without adding
> > any of the test cases if ufd_version_check() fails, so no userfaultfd
> > means no tests run at all, currently.
>
> Ouch! I could swear we had a fix for that.
>
> Anyway, it would be really good to see what migrate-query was returning;
> if it's stuck in running or cancelling then it's a problem with multifd
> that needs to learn to let go if someone is trying to cancel.
> If it's failed or similar then the test needs fixing to not lockup.

This patch of mine may well be helpful:

  https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03192.html

When debugging my TLS tests, various mistakes meant I ended up with
a failed session, but the test was spinning forever on 'query-migrate'.
It was waiting for the migration to finish one iteration, and never
bothering to validate that the reported status == active.

If that patch were merged, it might well cause the test to abort on an
assertion rather than spinning forever if status == failed. Of course
someone would still need to find out why it failed, but nonetheless,
I think an assert is nicer than spinning forever.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
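
To make the "assert rather than spin" point concrete, here is a minimal
sketch of the idea. It is written in Python rather than the C used by
migration-test.c, and it is not the content of the patch linked above.
The function name, its parameters and the query_migrate callable are all
hypothetical; the reply is assumed to look roughly like a 'query-migrate'
result, with a "status" string and a "dirty-sync-count" under "ram".

```python
import time


def wait_for_migration_pass(query_migrate, min_passes=1,
                            poll_interval=0.01, max_polls=1000):
    """Poll until the migration reports more than `min_passes` RAM passes.

    `query_migrate` is a hypothetical callable returning a dict shaped
    like a 'query-migrate' reply (assumption, see the text above).
    """
    for _ in range(max_polls):
        info = query_migrate()
        status = info.get("status")

        # Fail fast: without this check, a migration that has already
        # failed would never advance its pass counter and the loop
        # would keep polling with no chance of success.
        assert status == "active", f"unexpected migration status: {status!r}"

        passes = info.get("ram", {}).get("dirty-sync-count", 0)
        if passes > min_passes:
            return info

        time.sleep(poll_interval)

    # Bound the wait as well, so a stuck-but-active migration is reported
    # instead of hanging the test run.
    raise TimeoutError("migration did not complete a pass in time")
```

With a loop like this, a reply of {"status": "failed"} aborts on the
first poll with the status in the message, which at least tells whoever
is debugging where to start looking.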