Bogdan,

I think that before making final decisions we need to know exactly what price we need to pay. Without exact numbers it will be difficult to discuss. If we need to wait 80 minutes for the undercloud-containers job to finish before all other jobs can start, it will take about 4.5 hours to get a result (+ 4.5 hours in the gate), which is too high a price IMHO and isn't worth the effort.
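To make the question about numbers concrete, here is a minimal sketch of how the time to a result could be estimated for a job graph. Only the 80 minutes for undercloud-containers comes from this message. The job names `long-multinode-job` and `short-job`, their durations, and the helper `time_to_result` are hypothetical, chosen only so that the serialized case adds up to about 4.5 hours. Queue waiting time is ignored.

```python
from functools import lru_cache


def time_to_result(durations, depends_on):
    """Return minutes until the last job finishes.

    Assumes every job starts as soon as all of its dependencies have
    finished, with no time spent waiting in the queue.
    """

    @lru_cache(maxsize=None)
    def finish(job):
        # A job can start only when its slowest dependency has finished.
        start = max((finish(dep) for dep in depends_on.get(job, ())), default=0)
        return start + durations[job]

    return max(finish(job) for job in durations)


# 80 minutes is taken from the message; the other values are hypothetical.
durations = {
    "undercloud-containers": 80,
    "long-multinode-job": 190,  # hypothetical
    "short-job": 60,            # hypothetical
}

# All jobs start in parallel (no dependencies).
parallel = time_to_result(durations, {})

# Every other job waits for undercloud-containers to finish first.
serial = time_to_result(durations, {
    "long-multinode-job": ["undercloud-containers"],
    "short-job": ["undercloud-containers"],
})

print(parallel, serial)  # 190 270
```

With these made-up durations the dependency adds the full 80 minutes to the time a passing patch needs, which is the kind of figure that would have to be weighed against the resources saved on failing patches.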
What are the exact numbers we are talking about?

Thanks

On Tue, May 15, 2018 at 3:07 PM, Bogdan Dobrelya <bdobr...@redhat.com> wrote:

> Let me clarify the problem I want to solve with pipelines.
>
> It is getting *hard* to develop things and move patches to the Happy End
> (merged):
> - Patches wait too long for CI jobs to start. It should be minutes, not
>   hours, of waiting.
> - If a patch fails a job without a good reason, the subsequent recheck
>   repeats the waiting all over again.
>
> How may pipelines help solve it?
> Pipelines only alleviate the problem of waiting, they do not solve it. We
> only want to build pipelines for the main zuul check process, omitting
> gating and RDO CI (for now).
>
> There are two cases to consider:
> - A patch passes all checks
> - A patch fails a check that other jobs depend on
>
> The latter case benefits us the most when pipelines are designed as
> proposed here, so that any jobs expected to fail when a dependency fails
> are omitted from execution. This saves a lot of HW resources and zuul
> queue slots, making them available for other patches and allowing those
> to have their CI jobs started faster (less waiting!). When we have
> "recheck storms", for example because of some known intermittent side
> issue, that outcome is multiplied by the recheck storm um... level, and
> delivers even better and absolutely amazing results :) The zuul queue will
> not grow insanely, overwhelmed by multiple clones of rechecked jobs that
> are highly likely doomed to fail and that block other patches which might
> have a chance to pass checks because they are not affected by that
> intermittent issue.
>
> And for the first case, when a patch succeeds, it takes some extended
> time, and that is the price to pay. How much time it takes to finish in a
> pipeline depends entirely on the implementation.
>
> The effectiveness can only be measured with numbers extracted from
> Elasticsearch data, such as the average time a job waits to start, success
> vs. failure execution time percentiles for a job, the average number of
> rechecks, the history of recheck storms, et al. I don't have that data and
> don't know how to get it. Any help with that is much appreciated and could
> really help to move the proposed patches forward or to decline them. We
> could then compare "before" and "after" as well.
>
> I hope that explains the problem scope and the methodology to address it.
>
>
> On 5/14/18 6:15 PM, Bogdan Dobrelya wrote:
>
>> An update for your review please, folks
>>
>> Bogdan Dobrelya <bdobreli at redhat.com> writes:
>>>
>>>> Hello.
>>>> As the Zuul documentation [0] explains, the names "check", "gate", and
>>>> "post" may be altered for more advanced pipelines. Is it doable to
>>>> introduce, for particular openstack projects, multiple check
>>>> stages/steps such as check-1, check-2 and so on? And is it possible to
>>>> make the subsequent steps reuse the environments that the previous
>>>> steps finished with?
>>>>
>>>> Narrowing down to the tripleo CI scope, the problem I'd want us to
>>>> solve with this "virtual RFE", using such multi-staged check pipelines,
>>>> is reducing (ideally, de-duplicating) some of the common steps of
>>>> existing CI jobs.
>>>>
>>>
>>> What you're describing sounds more like a job graph within a pipeline.
>>> See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
>>> for how to configure a job to run only after another job has completed.
>>> There is also a facility to pass data between such jobs.
>>>
>>> ... (skipped) ...
>>>
>>> Creating a job graph to have one job use the results of the previous job
>>> can make sense in a lot of cases. It doesn't always save *time*
>>> however.
>>>
>>> It's worth noting that in OpenStack's Zuul, we have made an explicit
>>> choice not to have long-running integration jobs depend on shorter pep8
>>> or tox jobs, and that's because we value developer time more than CPU
>>> time. We would rather run all of the tests and return all of the
>>> results so a developer can fix all of the errors as quickly as possible,
>>> rather than forcing an iterative workflow where they have to fix all the
>>> whitespace issues before the CI system will tell them which actual tests
>>> broke.
>>>
>>> -Jim
>>>
>>
>> I proposed a few zuul dependencies [0], [1] for the tripleo CI pipelines,
>> for undercloud deployments vs. upgrades testing (and some more). Given
>> that those undercloud jobs do not have such high failure rates, though, I
>> think Emilien is right in his comments and those would buy us nothing.
>>
>> On the other hand, what do you folks think of making
>> tripleo-ci-centos-7-3nodes-multinode depend on
>> tripleo-ci-centos-7-containers-multinode [2]? The former seems quite
>> failure-prone and long running, and is non-voting. It deploys (see the
>> featureset configs [3]*) 3 nodes in HA fashion. And it seems to almost
>> never pass when containers-multinode fails - see the CI stats page [4].
>> I've found only 2 cases there of the opposite situation, where
>> containers-multinode fails but 3nodes-multinode passes. So cutting off
>> those future failures via the added dependency *would* buy us something
>> and allow other jobs to wait less to start, at the reasonable price of a
>> somewhat extended run time of the main zuul pipeline. I think it makes
>> sense, and that the extended CI time will not add so much overhead to the
>> RDO CI execution times as to become a problem.
>> WDYT?
>>
>> [0] https://review.openstack.org/#/c/568275/
>> [1] https://review.openstack.org/#/c/568278/
>> [2] https://review.openstack.org/#/c/568326/
>> [3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html
>> [4] http://tripleo.org/cistatus.html
>>
>> * ignore the column 1, it's obsolete, all CI jobs now using configs
>> download AFAICT...
>>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

--
Best regards
Sagi Shnaidman