On Tue, Sep 17, 2024 at 04:48:45PM +0100, Peter Maydell wrote:
> I notice that a lot of the CI job flakiness I'm seeing with main
> CI runs involves jobs that are run via the k8s runners. Notably
> cross-i686-tci and cross-i686-system and cross-i686-user are like this.
> These jobs run with no flakiness that I've noticed when they're run
> by an individual gitlab user (in which case they're not running on
> k8s, I believe). So something seems to be up with the environment
> we're using to run the jobs for the main CI. My impression is that
> the time things take to run can be very variable, especially if the
> CI job believes the reported number of CPUs and actually tries to run
> 8 or 9 test cases in parallel.
>
> Any ideas what might be causing issues here, or config tweaks
> we might be able to make to ensure that the environment reports
> to the CI job a number of CPUs/etc that accurately reflects
> the amount of resource it really has?
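
To see how far the reported CPU count is from what a job can really use, something like the following could be run as an early step in an affected job. It is a minimal sketch under stated assumptions: it assumes (the thread does not say so) that the k8s runners enforce CPU limits via a cgroup v2 CFS quota exposed at /sys/fs/cgroup/cpu.max, which os.cpu_count() does not take into account. It also samples /proc/stat twice to estimate steal time, i.e. time the hypervisor spent running something else while this guest's vCPUs wanted to run. The names CPU_MAX_PATH, effective_cpus, steal_fraction and report are hypothetical helpers, not part of any existing QEMU CI tooling.

```python
import os
import time

# Assumed cgroup v2 location inside the job container (hypothetical for these runners).
CPU_MAX_PATH = "/sys/fs/cgroup/cpu.max"


def effective_cpus(cpu_max_text: str, host_cpus: int) -> int:
    """Usable CPUs implied by a cgroup v2 cpu.max line ("QUOTA PERIOD" or "max PERIOD")."""
    quota, period = cpu_max_text.split()
    if quota == "max":
        # No CFS quota configured: every visible CPU may be used.
        return host_cpus
    # Round down (never below 1) so parallel tests stay within the quota
    # instead of being throttled; e.g. a 1.5-CPU limit yields 1.
    return max(1, min(host_cpus, int(quota) // int(period)))


def cpu_counters(stat_text: str) -> list[int]:
    """Aggregate counters from the first ("cpu") line of /proc/stat, in clock ticks."""
    fields = stat_text.splitlines()[0].split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line first")
    return [int(v) for v in fields[1:]]


def steal_fraction(before: str, after: str) -> float:
    """Share of elapsed CPU time stolen by the hypervisor between two /proc/stat samples."""
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest/guest_nice are already included in user/nice, so only the first 8 are summed.
    deltas = [a - b for a, b in zip(cpu_counters(after)[:8], cpu_counters(before)[:8])]
    total = sum(deltas)
    return deltas[7] / total if total else 0.0


def report(interval: float = 5.0) -> None:
    """Print the visible CPU count, the quota-derived allowance, and recent steal time."""
    host = os.cpu_count() or 1
    try:
        with open(CPU_MAX_PATH) as f:
            print(f"usable CPUs: {effective_cpus(f.read(), host)} of {host} visible")
    except FileNotFoundError:
        print(f"no cgroup v2 cpu.max; os.cpu_count() = {host}")
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = f.read()
    print(f"steal over {interval:g}s: {steal_fraction(first, second):.1%}")
```

Calling report() at the start of, say, cross-i686-tci would show whether the job is told about more CPUs than its quota allows, and whether steal time is significant; a job could then size its test parallelism from effective_cpus() rather than the raw count.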

Didn't we change the hosting for our k8s runners recently? They were
running on Azure, but I vaguely recall hearing that it was being switched
again.

Anyway, perhaps the cloud provider is over-committing the environment such
that we have excessive steal time, and thus are not getting the full power
of the CPUs we expect. I know gitlab's own public runners suffer from this
periodically, due to the very cheap VMs they are hosted on.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|