Re: [PATCH v2 2/2] GitLab Gating CI: initial set of jobs, documentation and scripts

Cleber Rosa Fri, 04 Sep 2020 08:10:51 -0700

On Fri, Sep 04, 2020 at 09:18:16AM +0100, Daniel P. Berrangé wrote:
> On Thu, Sep 03, 2020 at 08:11:39PM -0400, Cleber Rosa wrote:
> > On Thu, Jul 09, 2020 at 11:30:29AM +0100, Daniel P. Berrangé wrote:
> > > On Wed, Jul 08, 2020 at 10:46:57PM -0400, Cleber Rosa wrote:
> > > > This is a mapping of Peter's "remake-merge-builds" and
> > > > "pull-buildtest" scripts, gone through some updates, adding some build
> > > > options and removing others.
> > > > 
> > > > The jobs currently cover the machines that the QEMU project owns, and
> > > > that are set up and ready to run jobs:
> > > > 
> > > >  - Ubuntu 18.04 on s390x
> > > >  - Ubuntu 20.04 on aarch64
> > > > 
> > > > During the development of this set of jobs, the GitLab CI was tested
> > > > with many other architectures, including ppc64, s390x and aarch64,
> > > > along with the other OSs (not included here):
> > > > 
> > > >  - Fedora 30
> > > >  - FreeBSD 12.1
> > > > 
> > > > More information can be found in the documentation itself.
> > > > 
> > > > Signed-off-by: Cleber Rosa <cr...@redhat.com>
> > > > ---
> > > >  .gitlab-ci.d/gating.yml                | 146 +++++++++++++++++
> > > 
> > > AFAIK, the jobs in this file just augment what is already defined
> > > in the main .gitlab-ci.yml. Also since we're providing setup info
> > > for other people to configure custom runners, these jobs are usable
> > > for non-gating CI scenarios too.
> > >
> > 
> > If you mean that they introduced new jobs, you're right.
> > 
> > > IOW, the jobs in this file happen to be usable for gating, but they
> > > are not the only gating jobs, and can be used for non-gating reasons.
> > >
> > 
> > Right, I do not doubt these jobs may be useful to other people and on
> > scenarios other than "before merging a patch series".
> > 
> > > This is a complicated way of saying that gating.yml is not a desirable
> > > filename, so I'd suggest splitting it in two and having these files
> > > named based on what their contents are, rather than their use case:
> > > 
> > >    .gitlab-ci.d/runners-s390x.yml
> > >    .gitlab-ci.d/runners-aarch64.yml
> > > 
> > > The existing jobs in .gitlab-ci.yml could possibly be moved into
> > > a .gitlab-ci.d/runners-shared.yml file for consistency.
> > >
> > 
> > Do you imply that every GitLab CI job should be a gating job?  And
> > that the same jobs should be used by other people with their own
> > forks?  I find this problematic because:
> > 
> > * It would trigger pipelines that, unless every user has the same
> >   runners configured, would contain unfulfilled jobs that don't have
> >   matching hardware.
> 
> Jobs that require a custom runner should not be set to run by default,
> but individual contributors must absolutely be able to opt-in to running
> those jobs simply by registering a runner on their account.
>


Agreed, and that's why they have been put into this different "gating"
class here.
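
As a rough illustration only (this is not the job definition from the
patch), an opt-in job for a custom runner could look like the sketch
below.  The job name, the tag values and the CUSTOM_RUNNER_S390X
variable are all hypothetical; only the "tags" and "rules" keywords
are standard GitLab CI syntax.

```yaml
# Hypothetical job: name, tags, variable and script are illustrative only.
build-s390x-custom-runner:
  # Only runners registered with all of these tags will pick up the job.
  tags:
    - s390x
    - ubuntu_18.04
  rules:
    # Opt-in: the job is added to a pipeline only when this (assumed)
    # variable is set to "1", e.g. in the CI/CD settings of a fork.
    - if: '$CUSTOM_RUNNER_S390X == "1"'
  script:
    - ./configure
    - make
    - make check
```

With something along these lines, a contributor who registers a
matching runner and sets the variable gets the job, while everyone
else never sees it in their pipelines.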

> > * It dilutes the idea that those jobs are inherently different with
> >   regards to the management of their infrastructure.
> 
> I don't really know what you mean here, but "Inherently different"
> does not sound like a desirable property.
>

Organizations and individuals will have responsibility over the
infrastructure they choose to add, which is "inherently different"
from the GitLab shared machines.  Not sure there's a way around it.

> > * It destroys the notion of layered testing, for whatever people find
> >   that worth it, where a faster turnaround could/would be possible
> >   with fewer jobs for every push, and many more jobs before a merge.
> 
> The key goal of CI is to reduce the burden on maintainers. The biggest
> cost is if we merge code and failure is noticed after merge. It is
> still a large cost, however, if Peter only finds a CI failure when he
> attempts the pre-merge test. He has to throw out the pull request
> putting more work on the subsystem maintainer. The subsystem maintainer
> may have to throw it back to the original author.
> 
> The ideal scenario that we need to strive towards is that the original
> author has tested their code with 100% coverage of all the CI jobs QEMU
> has defined.
>

I agree... but it's also unrealistic at this point, right?  For
instance, do we have s390x boxes to run all of those?  Avocado has
been using Travis CI for s390x/ppc64/aarch64, and those are quite
unreliable even with a load many orders of magnitude smaller than the
QEMU project.  So, resources are needed to have this flat, 100%
coverage, "ideal scenario" you describe.

> Any time there is a job that is not run by authors, but only by the
> maintainers, we are putting increased burden on the maintainers, so
> must minimize that.
>

I agree.  But if resources are limited, then should the testing scope
be decreased so that it's equalized?

> IOW, layered testing is not desirable as a goal. Rather layered testing
> is just a default setup, but we'd encourage contributors to run the
> full set of CI jobs, especially if they are frequent contributors.
> The more they run themselves, the less burden on subsystem maintainers
> and Peter, and thus the better we all scale.
>

We agree on goals, we don't agree on the strategy though.

> > Finally, I find the split by runner architecture you suggested
> > problematic because different organizations may have jobs for the same
> > architecture.  I believe that files for different organizations may be
> > a better organization instead.  Entries in the MAINTAINERS are one
> > example where the grouping by architecture may not be optimal.
> 
> I don't think we should be structuring jobs around organizations. We
> should be defining a set of desired jobs we wish to be able to run.
> Any organization can bring a runner that is capable of running the
> jobs and donate it to the QEMU project for our formal CI runner.
> The organization is not defining the job though - QEMU is defining
> the jobs we expect to have used for testing.
>

This was discussed previously[1].

> This is key because any contributor needs to be able to spin up an
> identical environment to replicate any build failures. We don't want
> runners for merge testing that are built as a blackbox by someone.
> That is the single biggest painpoint with Peter's current merge
> jobs - we can't easily replicate Peter's merge env even if we had
> the matching hardware available.
>

With the right automation, such as the playbooks introduced here, any
person with the same hardware should have an environment to replicate
a job and debug an issue.

[1] - https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg00231.html

Best regards,
- Cleber.

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
