Let me clarify the problem I want to solve with pipelines.

It is getting *hard* to develop things and move patches to the Happy End (merged):
- Patches wait too long for CI jobs to start. It should be minutes, not hours, of waiting.
- If a patch fails a job without a good reason, the subsequent recheck operation repeats all of that waiting over again.

How may pipelines help solve this?
Pipelines only alleviate, not solve, the problem of waiting. We only want to build pipelines for the main Zuul check process, omitting gating and RDO CI (for now).

There are two cases to consider:
- A patch succeeds all checks
- A patch fails a check job that other jobs depend on

The latter case benefits us the most, when pipelines are designed as proposed here: any job expected to fail because one of its dependencies failed is omitted from execution. This frees a lot of HW resources and Zuul queue slots, making them available for other patches and allowing their CI jobs to start sooner (less waiting!). When we have "recheck storms", say because of some known intermittent side issue, that benefit is multiplied by the, um... level of the storm, and delivers even better and absolutely amazing results :) The Zuul queue will not grow insanely, overwhelmed by multiple clones of rechecked jobs that are highly likely doomed to fail, blocking other patches that might have a chance to pass checks because they are unaffected by that intermittent issue.

And for the first case, when a patch passes all checks, the total time is extended, and that is the price to pay. How much time it takes to finish in a pipeline depends entirely on the implementation.

The effectiveness could only be measured with numbers extracted from Elasticsearch data, such as the average wait time for a job to start, success vs. failure execution-time percentiles per job, the average number of rechecks, recheck-storm history, et al. I don't have that data and don't know how to get it. Any help with that would be much appreciated and could really help move the proposed patches forward, or decline them. And we could then compare "before" and "after" as well.
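To illustrate, the kind of numbers meant here could be computed like this once raw job records are exported from Elasticsearch. This is only a sketch: the record fields (queued, started, finished, result) and the sample data are purely hypothetical, and real Elasticsearch documents would need to be mapped onto this shape first.

```python
# Sketch: compute wait-time and run-time statistics from exported CI job
# records. Field names below are hypothetical, not real ES document fields.
from statistics import quantiles

def wait_and_runtime_stats(records):
    """Return (average queue wait, per-result run-time quartiles), in seconds."""
    waits = [r["started"] - r["queued"] for r in records]
    avg_wait = sum(waits) / len(waits)
    by_result = {}
    for r in records:
        by_result.setdefault(r["result"], []).append(r["finished"] - r["started"])
    pct = {
        result: quantiles(times, n=4)  # quartile cut points of run time
        for result, times in by_result.items()
        if len(times) >= 2  # quantiles() needs at least two data points
    }
    return avg_wait, pct

# Hypothetical sample: two successful runs and two failures.
sample = [
    {"queued": 0, "started": 600, "finished": 4200, "result": "SUCCESS"},
    {"queued": 0, "started": 1200, "finished": 5400, "result": "SUCCESS"},
    {"queued": 0, "started": 1800, "finished": 2400, "result": "FAILURE"},
    {"queued": 0, "started": 300, "finished": 1500, "result": "FAILURE"},
]
avg_wait, pct = wait_and_runtime_stats(sample)
print(avg_wait)  # → 975.0 (average seconds a job waited to start)
```

The success vs. failure percentiles would show whether failing runs finish much faster than passing ones, which is exactly what makes skipping dependent jobs after a failure worthwhile.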

I hope that explains the problem scope and the methodology to address it.

On 5/14/18 6:15 PM, Bogdan Dobrelya wrote:
An update for your review please, folks.

Bogdan Dobrelya <bdobreli at redhat.com> writes:

Hello.
As Zuul documentation [0] explains, the names "check", "gate", and
"post" may be altered for more advanced pipelines. Is it doable to
introduce, for particular OpenStack projects, multiple check
stages/steps as check-1, check-2 and so on? And is it possible to make
the subsequent steps reuse the environments that the previous steps
finished with?

Narrowing down to the tripleo CI scope, the problem I'd want us to solve
with this "virtual RFE", using such multi-staged check pipelines,
is reducing (ideally, de-duplicating) some of the common steps of
existing CI jobs.

What you're describing sounds more like a job graph within a pipeline.
See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
for how to configure a job to run only after another job has completed.
There is also a facility to pass data between such jobs.
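The data-passing facility mentioned here is the zuul_return Ansible module. A minimal sketch of how a parent job's playbook might use it (the "image_url" key is made up for illustration, not an actual tripleo-ci variable):

```yaml
# Sketch: a task in the parent job's playbook returning data that
# dependent jobs can later read from their Zuul variables.
- hosts: localhost
  tasks:
    - name: Pass the built artifact location to dependent jobs
      zuul_return:
        data:
          image_url: "https://logs.example.org/undercloud.qcow2"
```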

... (skipped) ...

Creating a job graph to have one job use the results of the previous job
can make sense in a lot of cases.  It doesn't always save *time*
however.

It's worth noting that in OpenStack's Zuul, we have made an explicit
choice not to have long-running integration jobs depend on shorter pep8
or tox jobs, and that's because we value developer time more than CPU
time.  We would rather run all of the tests and return all of the
results so a developer can fix all of the errors as quickly as possible,
rather than forcing an iterative workflow where they have to fix all the
whitespace issues before the CI system will tell them which actual tests
broke.

-Jim

I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs. upgrades testing (and some more). Given that those undercloud jobs do not have such high failure rates, though, I think Emilien is right in his comments and those would buy us nothing.

On the other hand, what do you think, folks, of making
tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite flaky and long-running, and is non-voting. It deploys (see the featureset configs [3]*) 3 nodes in an HA fashion. And it seems to almost never pass when containers-multinode fails; see the CI stats page [4]. I've found only 2 cases there of the opposite situation, where containers-multinode fails but 3nodes-multinode passes. So cutting off those predictable failures via the added dependency *would* buy us something and allow other jobs to wait less before commencing, at the reasonable price of a somewhat extended main Zuul pipeline time. I think it makes sense, and that extended CI time should not exceed the RDO CI execution times so much as to become a problem. WDYT?
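For reference, the dependency proposed in [2] boils down to something like the following in the Zuul project configuration. This is only a sketch: the project-template name and surrounding layout are assumptions, and only the jobs discussed here are shown.

```yaml
# Sketch: make the non-voting 3nodes job wait for the containers job.
# If tripleo-ci-centos-7-containers-multinode fails, Zuul skips
# tripleo-ci-centos-7-3nodes-multinode entirely, freeing its node set.
- project-template:
    name: tripleo-multinode-checks  # hypothetical template name
    check:
      jobs:
        - tripleo-ci-centos-7-containers-multinode
        - tripleo-ci-centos-7-3nodes-multinode:
            voting: false
            dependencies:
              - tripleo-ci-centos-7-containers-multinode
```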

[0] https://review.openstack.org/#/c/568275/
[1] https://review.openstack.org/#/c/568278/
[2] https://review.openstack.org/#/c/568326/
[3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html
[4] http://tripleo.org/cistatus.html

* ignore column 1, it's obsolete; all CI jobs now use config download AFAICT...



--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev