On Mon, Jun 05, 2023 at 04:07:46PM +0200, Thomas Huth wrote: > On 01/06/2023 18.31, Daniel P. Berrangé wrote: > > Perhaps the most painful of all the GitLab CI failures we see are > > the enforced job timeouts: > > > > "ERROR: Job failed: execution took longer than 1h15m0s seconds" > > > > https://gitlab.com/qemu-project/qemu/-/jobs/4387047648 > > > > when that hits the CI log shows what has *already* run, but figuring > > out what was currently running (or rather stuck) is an horrendously > > difficult. > > > > The initial meson port disabled the meson test timeouts, in order to > > limit the scope for introducing side effects from the port that would > > complicate adoption. > > > > Now that the meson port is basically finished we can take advantage of > > more of its improved features. It has the ability to set timeouts for > > test programs, defaulting to 30 seconds, but overridable per test. This > > is further helped by fact that we changed the iotests integration so > > that each iotests was a distinct meson test, instead of having one > > single giant (slow) test. > > > > We already set overrides for a bunch of tests, but they've not been > > kept up2date since we had timeouts disabled. So this series first > > updates the timeout overrides such that all tests pass when run in > > my test gitlab CI pipeline. Then it enables use of meson timeouts. > > > > We might still hit timeouts due to non-deterministic performance of > > gitlab CI runners. So we'll probably have to increase a few more > > timeouts in the short term. Fortunately this is going to be massively > > easier to diagnose. For example this job during my testing: > > > > https://gitlab.com/berrange/qemu/-/jobs/4392029495 > > > > we can immediately see the problem tests > > > > Summary of Failures: > > 6/252 qemu:qtest+qtest-i386 / qtest-i386/bios-tables-test > > TIMEOUT 120.02s killed by signal 15 SIGTERM > > 7/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test > > TIMEOUT 120.03s killed by signal 15 SIGTERM > > 64/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/qom-test > > TIMEOUT 300.03s killed by signal 15 SIGTERM > > > > The full meson testlog.txt will show each individual TAP log output, > > so we can then see exactly which test case we got stuck on. > > > > NB, the artifacts are missing on the job links above, until this > > patch merges: > > > > https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04668.html > > > > NB, this series sets the migration-test timeout to 5 minutes, which > > is only valid if this series is merged to make the migration test > > not suck: > > > > https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg00286.html > > > > without that series, we'll need to set the migration-test timeout to > > 30 minutes instead. > > > > Daniel P. Berrangé (6): > > qtest: bump min meson timeout to 60 seconds > > qtest: bump migration-test timeout to 5 minutes > > qtest: bump qom-test timeout to 7 minutes > > qtest: bump aspeed_smc-test timeout to 2 minutes > > qtest: bump bios-table-test timeout to 6 minutes > > mtest2make: stop disabling meson test timeouts > > > > scripts/mtest2make.py | 3 ++- > > tests/qtest/meson.build | 16 ++++++---------- > > 2 files changed, 8 insertions(+), 11 deletions(-) > > FWIW, I now ran this on my rather old laptop with an --enable-debug > build with "make -j$(nproc) check-qtest" and got these additional > failures (beside the expected migration-test that still needs its > final speedup): > > qtest-aarch64/test-hmp TIMEOUT 120.07s killed by signal 15 SIGTERM > qtest-aarch64/qom-test TIMEOUT 420.09s killed by signal 15 SIGTERM > qtest-arm/qom-test TIMEOUT 420.10s killed by signal 15 SIGTERM > qtest-arm/npcm7xx_pwm-test TIMEOUT 150.04s killed by signal 15 SIGTERM > qtest-ppc64/pxe-test TIMEOUT 60.01s killed by signal 15 SIGTERM > qtest-sparc/prom-env-test TIMEOUT 60.01s killed by signal 15 SIGTERM > qtest-sparc/boot-serial-test TIMEOUT 60.01s killed by signal 15 SIGTERM
Did you see any others in the 45-60 second time window, as those would be candidates for increases too - don't want to have things right below the 60 second cutoff ? > When I run them manually without the timeout patch, I get these > values: > > qtest-aarch64/test-hmp OK 168.66s 95 subtests passed > qtest-aarch64/qom-test OK 646.37s 94 subtests passed > qtest-arm/qom-test OK 621.64s 89 subtests passed > qtest-arm/npcm7xx_pwm-test OK 225.48s 24 subtests passed > qtest-ppc64/pxe-test OK 96.95s 2 subtests passed > qtest-sparc/prom-env-test OK 95.94s 3 subtests passed > qtest-sparc/boot-serial-test OK 92.96s 3 subtests passed > > HTH, > Thomas > With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|