* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Mon, Aug 08, 2022 at 02:43:49PM +0200, Thomas Huth wrote:
> > On 08/08/2022 14.14, Daniel P. Berrangé wrote:
> > > On Mon, Aug 08, 2022 at 01:57:17PM +0200, Thomas Huth wrote:
> > > >
> > > > Hi!
> > > >
> > > > Seems like we're getting more timeouts in the CI pipelines since commit
> > > > 2649a72555e ("Allow test to run without uffd") enabled the migration tests
> > > > in more scenarios.
> > > >
> > > > For example:
> > > >
> > > > https://gitlab.com/qemu-project/qemu/-/jobs/2821578332#L49
> > > >
> > > > You can see that the migration-test ran for more than 20 minutes for each
> > > > target (x86 and aarch64)! I think that's way too much by default.
> > >
> > > Definitely too much.
> > >
> > > > I had a check whether there is one subtest taking a lot of time, but it
> > > > rather seems like each of the migration test is taking 40 to 50 seconds in
> > > > the CI:
> > > >
> > > > https://gitlab.com/thuth/qemu/-/jobs/2825365836#L44
> > >
> > > Normally with CI we expect a constant slowdown factor, eg x2.
> > >
> > > I expect with migration though, we're triggering behaviour whereby
> > > the guest workload is generating dirty pages quicker than we can
> > > migrate them over localhost. The balance in this can quickly tip
> > > to create an exponential slowdown.
> >
> > If I run the aarch64 migration-test on my otherwise idle x86 laptop, it also
> > takes already ca. 460 seconds to finish, which is IMHO also already too much
> > for a normal "make check" run (without SPEED=slow).
> >
> > > I'm not sure if 'g_test_slow' gives us enough granularity though, as
> > > if we enable that, it'll impact the whole test suite, not just
> > > migration tests.
> >
> > We could also check for the GITLAB_CI environment variable, just like we
> > already do it in some of the avocado-based tests ...
> > but given the fact that
> > the migration test is already very slow on my normal x86 laptop, I think I'd
> > prefer if we added some checks with g_test_slow() in there ...
> >
> > Are there any tests in migration-test.c that are rather redundant and could
> > be easily skipped in quick mode?
>
> The trouble with migration is that there are a lot of subtle permutations
> that interact in weird ways, so we've got a lot of test scenarios, including
> many with TLS:
>
>   /x86_64/migration/bad_dest
>   /x86_64/migration/fd_proto
>   /x86_64/migration/validate_uuid
>   /x86_64/migration/validate_uuid_error
>   /x86_64/migration/validate_uuid_src_not_set
>   /x86_64/migration/validate_uuid_dst_not_set
>   /x86_64/migration/auto_converge
>   /x86_64/migration/dirty_ring
>   /x86_64/migration/vcpu_dirty_limit
>   /x86_64/migration/postcopy/unix
>   /x86_64/migration/postcopy/plain
>   /x86_64/migration/postcopy/recovery/plain
>   /x86_64/migration/postcopy/recovery/tls/psk
>   /x86_64/migration/postcopy/preempt/plain
>   /x86_64/migration/postcopy/preempt/recovery/plain
>   /x86_64/migration/postcopy/preempt/recovery/tls/psk
>   /x86_64/migration/postcopy/preempt/tls/psk
>   /x86_64/migration/postcopy/tls/psk
>   /x86_64/migration/precopy/unix/plain
>   /x86_64/migration/precopy/unix/xbzrle
>   /x86_64/migration/precopy/unix/tls/psk
>   /x86_64/migration/precopy/unix/tls/x509/default-host
>   /x86_64/migration/precopy/unix/tls/x509/override-host
>   /x86_64/migration/precopy/tcp/plain
>   /x86_64/migration/precopy/tcp/tls/psk/match
>   /x86_64/migration/precopy/tcp/tls/psk/mismatch
>   /x86_64/migration/precopy/tcp/tls/x509/default-host
>   /x86_64/migration/precopy/tcp/tls/x509/override-host
>   /x86_64/migration/precopy/tcp/tls/x509/mismatch-host
>   /x86_64/migration/precopy/tcp/tls/x509/friendly-client
>   /x86_64/migration/precopy/tcp/tls/x509/hostile-client
>   /x86_64/migration/precopy/tcp/tls/x509/allow-anon-client
>   /x86_64/migration/precopy/tcp/tls/x509/reject-anon-client
>   /x86_64/migration/multifd/tcp/plain/none
>   /x86_64/migration/multifd/tcp/plain/cancel
>   /x86_64/migration/multifd/tcp/plain/zlib
>   /x86_64/migration/multifd/tcp/plain/zstd
>   /x86_64/migration/multifd/tcp/tls/psk/match
>   /x86_64/migration/multifd/tcp/tls/psk/mismatch
>   /x86_64/migration/multifd/tcp/tls/x509/default-host
>   /x86_64/migration/multifd/tcp/tls/x509/override-host
>   /x86_64/migration/multifd/tcp/tls/x509/mismatch-host
>   /x86_64/migration/multifd/tcp/tls/x509/allow-anon-client
>   /x86_64/migration/multifd/tcp/tls/x509/reject-anon-client
>
> Each takes about 4 seconds, except for the xbzrle, autoconverge and
> vcpu-dirty-rate tests which take 8-12 seconds.
>
> We could short-circuit most of the tls tests, because 90% of what
> they're validating is the initial connection setup phase. We don't
> really need to run the full migration to completion, we can just
> abort once we're running. Just keep 3 doing the full migration
> to completion - one precopy, one postcopy and one multifd.

I'd rather we combined some than cutting stuff off; I was about to
suggest doing zlib with some of the TLS but then that wouldn't have
found the recent zlib one!

Dave

> That'd cut most of the TLS tests from 4 seconds to 0.5 seconds.
>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
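
For illustration only, here is a minimal C sketch of the short-circuit proposal quoted above: TLS tests stop once the migration is running, except for three that still run to completion, and a slow mode restores the full run everywhere. The helper names (run_to_completion, has_suffix) and the three paths kept in full are assumptions made for this example. The thread does not say which tests would be kept, and none of this is taken from migration-test.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical choice: the thread only asks for "one precopy, one postcopy
 * and one multifd" test to keep migrating to completion. These three paths
 * are picked purely for illustration.
 */
static const char *const full_tls_tests[] = {
    "/migration/precopy/tcp/tls/psk/match",
    "/migration/postcopy/tls/psk",
    "/migration/multifd/tcp/tls/psk/match",
};

/* Return true if string s ends with suffix. */
static bool has_suffix(const char *s, const char *suffix)
{
    size_t len_s = strlen(s);
    size_t len_x = strlen(suffix);

    return len_s >= len_x && strcmp(s + len_s - len_x, suffix) == 0;
}

/*
 * Decide whether a test should migrate to completion (true) or may abort
 * once the migration is running (false). slow_mode stands in for whatever
 * switch enables the full run, e.g. a g_test_slow() style check.
 */
static bool run_to_completion(const char *path, bool slow_mode)
{
    size_t i;

    assert(path != NULL);

    /* Non-TLS tests and slow mode always run the full migration. */
    if (slow_mode || strstr(path, "/tls/") == NULL) {
        return true;
    }

    /* Suffix match so the architecture prefix (/x86_64, /aarch64) is ignored. */
    for (i = 0; i < sizeof(full_tls_tests) / sizeof(full_tls_tests[0]); i++) {
        if (has_suffix(path, full_tls_tests[i])) {
            return true;
        }
    }
    return false;
}
```

With this policy, run_to_completion("/x86_64/migration/precopy/tcp/tls/x509/default-host", false) is false, so that test would only cover connection setup in quick mode, while the same call with slow_mode set to true is true. Dave's concern still applies: a test that aborts early cannot catch bugs that only show up later in the data transfer.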