Re: flakiness on CI jobs run via k8s

Daniel P. Berrangé Wed, 18 Sep 2024 02:25:23 -0700

On Tue, Sep 17, 2024 at 04:48:45PM +0100, Peter Maydell wrote:
> I notice that a lot of the CI job flakiness I'm seeing with main
> CI runs involves jobs that are run via the k8s runners. Notably
> cross-i686-tci and cross-i686-system and cross-i686-user are like this.
> These jobs run with no flakiness that I've noticed when they're run
> by an individual gitlab user (in which case they're not running on
> k8s, I believe). So something seems to be up with the environment
> we're using to run the jobs for the main CI. My impression is that
> the time things take to run can be very variable, especially if the
> CI job believes the reported number of CPUs and actually tries to run
> 8 or 9 test cases in parallel.
> 
> Any ideas what might be causing issues here, or config tweaks
> we might be able to make to ensure that the environment reports
> to the CI job a number of CPUs/etc that accurately reflects
> the amount of resource it really has?


Didn't we change the hosting for our k8s runners recently? They were
running on Azure, but I vaguely recall hearing that it was being
switched again.

Anyway, perhaps the cloud provider is over-committing the env such
that we have excessive steal time and thus are not getting the full
power of the CPUs we expect.  I know gitlab's own public runners
will suffer from this periodically, due to the very cheap VMs they
host on.
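
One rough way to check the steal time theory would be to sample
/proc/stat from inside a job on the runner. The following is only a
sketch, assuming a Linux guest that exposes the usual aggregate "cpu"
line; the function names and the 5 second interval are made up for
illustration and nothing like this exists in our CI config today.

```python
import time


def parse_cpu_totals(stat_text):
    """Return (steal, total) jiffies from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal
            # (guest and guest_nice follow, but are already part of user/nice)
            steal = values[7] if len(values) > 7 else 0
            total = sum(values[:8])
            return steal, total
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of CPU time stolen between two /proc/stat snapshots."""
    steal0, total0 = parse_cpu_totals(before)
    steal1, total1 = parse_cpu_totals(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (steal1 - steal0) / elapsed


def sample_steal(interval=5.0, path="/proc/stat"):
    """Take two snapshots 'interval' seconds apart (hypothetical helper)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return steal_percent(before, after)
```

Calling sample_steal() at the start of a job and again while the tests
are running would give two numbers to compare. A persistently high
value would point at the host being over-committed rather than at
anything in the job itself.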

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
