On 12/02/2016 01:07, Tristan Cacqueray wrote:
> Switching to softwarefactory-dev.
>
> On 02/11/2016 09:45 PM, Fabien Boucher wrote:
>> On 11/02/2016 21:40, Tristan Cacqueray wrote:
>>> On 02/11/2016 04:51 PM, Fabien Boucher wrote:
>>>> Here is my explanation :D
>>>>
>>>> The situation with a post pipeline to build/publish the final artifact:
>>>> ----------------------------------------------------------------------
>>>>
>>>> 1. A change on nova-distgit has been approved and a job (e.g. packstack)
>>>> is running. The test succeeds and change "A" will be merged by Zuul on
>>>> rdo-liberty in the nova-distgit repo.
>>>>
>>>> Gate pipeline:
>>>> - nova-distgit (rdo-liberty) change A (bump version to 12.0.1)
>>>>   - packstack job
>>>>
>>>> Liberty repo:
>>>> - nova_12.0.0.rpm
>>>> - ceilometer_8.0.0.rpm
>>>>
>>>> 2. The change has been merged (in Git), and a new change on
>>>> ceilometer-distgit has been approved (entering the pipeline).
>>>> At the same time the post pipeline runs an "artifact export job" for nova.
>>>>
>>>> Gate pipeline:
>>>> - ceilometer-distgit (rdo-liberty) change A (bump version to 8.0.1)
>>>>   - packstack job
>>>>
>>>> Post pipeline:
>>>> - nova-distgit HEAD
>>>>   - artifact export job (non-scratch build on koji)
>>>>
>>>> Liberty repo:
>>>> - nova_12.0.0.rpm
>>>> - ceilometer_8.0.0.rpm
>>>>
>>>> -> The packstack job for the ceilometer-distgit change is running. The
>>>> packstack installation fetches our packages from the liberty repo, and
>>>> so fetches nova_12.0.0.rpm.
>>>
>>> It seems like the real issue is fetching from the upstream repo during the
>>> test. Why can't we request koji builds based on all projects' git master
>>> instead of the final repository? Didn't you mention a problem with Koji
>>> and scratch builds of merged commits?
>>
>> I don't think it is realistic to rebuild the whole set of packages based
>> on the rdo-liberty branches for each test; it would take more than a while :)
>>
>> Fetching from the upstream repo (the koji liberty repo we gate) during a
>> test is easy: we just configure yum to target it. The real issue is having
>> it up to date at the right moment. In fact the RPM needs to be available in
>> the repo just before the tested change leaves the gate pipeline (shared
>> queue) and just before the change is merged in the Git repo/branch.
>>
>> About the problem you mention, I don't think so. I did mention an issue,
>> but it was an unexpected close of the communication channel by Jenkins
>> when the pkg-export job started a "non-scratch" build against koji, which
>> resulted in the package being published while the related Git change was
>> not merged in the repo :/ But it's not supposed to happen often :) I hope.
>>
>>>
>>>
>>>> Packstack has fetched nova 12.0.0 and is validating ceilometer along
>>>> with it.
>>>>
>>>> 3. The post job "artifact export" succeeds in building the artifact
>>>> ("build against koji") and nova_12.0.1.rpm lands in the repository. Nice.
>>>>
>>>> 4. The packstack job of ceilometer succeeds. Nice, the post job for the
>>>> ceilometer change starts and succeeds, and then we have:
>>>>
>>>> Liberty repo:
>>>> - nova_12.0.1.rpm
>>>> - ceilometer_8.0.1.rpm
>>>>
>>>> /!\ What if nova 12.0.1 introduced a change in the packaging (a new
>>>> file, a patch, ...) that prevents the ceilometer package from being
>>>> installed, or prevents ceilometer from working well?
>>>> Then this results in a broken RPM repository /!\
>>>> -> And why did this happen? Because while the post job for nova-distgit
>>>> was running, we validated a change on ceilometer by testing it with the
>>>> previous version of nova!
>>>>
>>>>
>>>> Furthermore, what if the nova-distgit "artifact export job" failed to
>>>> build nova 12.0.1 and we didn't notice that?
>>>> Other changes entering the gate pipeline are validated with packstack
>>>> (+ nova 12.0.0) and built into the final repo via the post job.
>>>> Later, when we discover the inconsistency, we will force the post
>>>> pipeline to run for nova 12.0.1 without any tests. 12.0.1 will land in
>>>> the liberty repo. Nice!
>>>> -> But are we sure 12.0.1 will work with the other changes (new RPMs
>>>> that have landed) during the time of the inconsistency? Maybe ... Maybe
>>>> not :D
>>>> In that case, instead of running the post job again, we can bump nova
>>>> to 12.0.1-1 ...; that will force the tests to re-run with the latest
>>>> version of the liberty repo. Fine, but what do we put in the changelog
>>>> of 12.0.1-1? "Force a rebuild"?
>>>>
>>>>
>>>> Another case:
>>>> Gate pipeline:
>>>> - nova-distgit (rdo-liberty) change A (bump version to 12.0.1)
>>>>   - packstack job
>>>> - ceilometer-distgit (rdo-liberty) change A (bump version to 8.0.1)
>>>>   - packstack job
>>>>
>>>> Liberty repo:
>>>> - nova_12.0.0.rpm
>>>> - ceilometer_8.0.0.rpm
>>>>
>>>> The packstack job for ceilometer-distgit will request a (scratch) build
>>>> of ceilometer against koji, but thanks to Zuul it also knows that a
>>>> nova-distgit change is currently being tested and may land in the
>>>> liberty repo, so it will also request a build of nova 12.0.1 against
>>>> koji. Locally it can then build a repo containing nova_12.0.1.rpm and
>>>> ceilometer_8.0.1.rpm. The ceilometer change is tested with packstack in
>>>> a good test environment.
>>>> Then the nova-distgit change succeeds, and the post job starts and fails
>>>> to build the final artifact ("koji build"). The change currently at the
>>>> top of the pipeline doesn't notice that (the failure occurred in the
>>>> post pipeline) and succeeds in validating ceilometer (along with nova
>>>> 12.0.1); the post job starts and ceilometer 8.0.1 lands in the liberty
>>>> repo. Nice!
>>>> What if the change bumping ceilometer to 8.0.1 was in fact unable to
>>>> work with nova 12.0.0? Then we have a broken liberty repo!
>>>>
>>>> Furthermore, if we want a post job (for publishing) then we need an
>>>> additional node, static and usually with 1 Jenkins worker. Indeed, if
>>>> we use nodepool to spawn nodes for the post pipeline, or if we use more
>>>> than 1 executor, then we cannot be sure RPMs land in the RPM repo
>>>> serially ... and do we really want someone to check out a liberty repo
>>>> where an RPM has landed while another one, supposed to land before it,
>>>> has not landed yet ...
>>>>
>>>> ----
>>>>
>>>> So if you reached this point, cool \o/. Now let's take the previous
>>>> examples and think about starting the final build on koji (non-scratch
>>>> build) inside a gate job.
>>>>
>>>> Gate pipeline:
>>>> - nova-distgit (rdo-liberty) change A (bump version to 12.0.1)
>>>>   - packstack job (SUCCEED)
>>>>   - artifact export job (non-scratch build on koji) (RUNNING)
>>>> - ceilometer-distgit (rdo-liberty) change A (bump version to 8.0.1)
>>>>   - packstack job (RUNNING)
>>>>   - artifact export job (non-scratch build on koji) (RUNNING)
>>>>
>>>> * The ceilometer-distgit final RPM cannot land yet in the final liberty
>>>> RPM repo because the nova-distgit one has not landed there yet.
>>>>
>>>> * If the "artifact export job" for nova-distgit fails:
>>>> - the change is not merged
>>>> - the RPM won't land in the RPM repo
>>>> -> the ceilometer-distgit jobs will restart (as a change it depends on
>>>> failed; all distgit projects share the same job) and the packstack
>>>> job for ceilometer will be tested with nova 12.0.0 (which is in the
>>>> liberty RPM repo).
>>>> So it will land in the RPM repo, but it has been tested with the right
>>>> version of the nova RPM.
>>>>
>>>> ----> Remember that with the post job doing the final build, at this
>>>> point we didn't know the status of the post job (failed), so the
>>>> workflow did not figure out that the export failed: ceilometer was
>>>> tested with nova 12.0.1, but nova 12.0.1 has not landed ... while
>>>> ceilometer has landed ... :/
>>>>
>>>>
>>>> The "artifact export job" uses:
>>>> https://github.com/redhat-cip/software-factory/blob/master/tools/slaves/wait_for_other_jobs.py
>>>> and here is an example of it (for rpmfactory):
>>>> https://github.com/redhat-cip/rpmfactory/blob/master/gating/pkg-export.sh
>>>>
>>>>
>>>> -----
>>>>
>>>> So let me know if building artifacts in the gate pipeline is relevant
>>>> for you (at least for RPM Factory)?
>>>>
>>> Well definitely, most RDO packages are tightly integrated and there is a
>>> non-negligible risk of breakage.
>>>
>>>
>>>> Note that in the RPM Factory context we gate "a repository" ... so we use
>>>> that repository to test other changes ...
>>>>
>>>> Now imagine we configured the gating in Zuul to not submit Git changes
>>>> ... and did it in the post pipeline instead ...
>>>> do you think Zuul would still be valuable? I'm not so sure.
>>>>
>>>
>>>
>>> Alright, thank you Fabien for starting this discussion.
>>> If I understand correctly, the problem is really the window between a
>>> gate success and a post-job success. Any changes that enter the gate
>>> during that window won't be tested against changes that are still being
>>> post-processed... Otherwise both approaches seem to be identical.
>>
>> Yes, this is related to the delay between the moment the change leaves the
>> gate pipeline and the moment the post pipeline finishes executing the job.
>> But not only that: when the export job is executed in the gate pipeline,
>> the gate pipeline knows the job status and is able to react to a failure:
>> - The Git change is not merged in the Git repo
>> - The following changes in the gate shared queue will stop their jobs
>>   and restart, skipping the broken change.
>> If the export job is executed in the post pipeline, neither of the two
>> things above will happen ...
>>
>>>
>>> Your initial proposition works for sure, but I question the need for
>>> such a job that waits to be at the tip of the gate to actually report
>>> SUCCESS. This seems to severely limit the capability of Zuul and its
>>> speculative-merge-based design...
>>
>> I don't get that; why would it limit this capability?
>> For me nothing changes in this area, but I may be missing something, please
>> clarify.
>>
>
> I believe it limits Zuul's capability since pkg_exports is serial for
> the gate. If that pkg_exports task takes 1 hour, then we can't merge
> more than 24 changes per day.
You're right, but I doubt that building a package against koji takes that
long on average. For a change to leave the gate pipeline we need two "gate"
conditions validated by Zuul: Gerrit merges the change in the branch, and
koji builds and stores the package. So yes, it takes longer than just the
Gerrit merge.

> Here are the pros and cons I understood so far, please correct me if I'm
> wrong:
>
> Wait_for_other_jobs Pros:
> * If publishing fails, then the change isn't merged.
> * Gate tests are fast because they use upstream packages

* We reduce the situations where a package tested in the gate is NOT tested
  against the future state of the RPM repo

>
> Cons:
> * We lose the ability to merge changes in parallel

Note that I don't think Zuul merges changes in parallel. Merging a change in
a Git repo is fast, which gives an impression of parallelism, but I think it
is actually serial. IMO we lose nothing in that area.

> * This might lock the gate if all the resources are somehow allocated
>   to pkg_export tasks. (Since the job will wait for jobs stuck in the
>   queued state)
> * It can be a source of random failures that will affect the gate
>   (e.g. when an RDO mirror times out)
>
> "Publish in post" Pros:
> * Similar workflow to openstack-infra's publish-to-pypi job (which is
>   a big pro imo since it works)

The big difference here is that the artifact published on PyPI is not
re-used for testing. Future changes entering the gate don't need the
artifact on PyPI to be validated ... This is not the case for RDO in RPM
Factory.

>
> Cons:
> * Post-job needs to be monitored when it fails.
> * Test time may be longer.

* Inconsistencies on the target RPM repo are more likely
* Need an additional static node to run jobs serially in the post pipeline

>
>
> TL;DR: As already mentioned during the last sprint review,
> wait_for_other_jobs seems like the most trivial and effective way to
> ensure RDO repositories are fully validated.
> However, since it's a risky decision, I'd like us to really consider
> other solutions, at least to demonstrate that this is the superior approach.
>
> Perhaps we should discuss these solutions with the RDO folks too.
> Basically wait_for_other_jobs will guarantee that the repos are stable,
> at the cost of a special Zuul gate job that may slow down development.
> Otherwise we can publish in a post job, with the risk of having a desync
> between the Git and RPM repos.
>

Good summary :D

>
> Back to the original issue of not being able to build all RDO packages
> for each change: instead of rebuilding only what is currently in the
> gate, can't we also rebuild what is in the post pipeline? That way we
> don't rebuild everything each time and we still have something
> identical to what will be upstream. That may be another approach...
> (Assuming a failing post job also stays in the post pipeline until it
> succeeds).

Yes, I agree, I guess it can be an alternative solution, at the cost of:
- having a static node to maintain to run jobs in the post pipeline
- we won't detect that we need a rebuild if Zuul is currently passing a
  change from the gate to the post pipeline (a short delay, but still to
  consider)
- a job still needs to introspect the pipelines (like wait_for_other_jobs.py)
- a change is merged in the distgit but the package is not yet in the final
  RPM repo.
  -> I still prefer when both happen more *atomically* in the gate :D

OK, if we consider this solution as you said (assuming a failing post job
also stays in the post pipeline until it succeeds), then I think we can have
two solutions.

> Regards,
> -Tristan

_______________________________________________
Softwarefactory-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/softwarefactory-dev
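
For readers following the thread, the difference between "publish in post"
and "export in the gate" in the failing-export scenario can be sketched with
a small Python model. This is only an illustration under stated assumptions:
the Change class, the export_ok flag and both functions are hypothetical
names made up for this sketch; they are not part of Zuul, koji or
wait_for_other_jobs.py, and the model ignores timing and job restarts and
only looks at the final outcome.

```python
from dataclasses import dataclass


@dataclass
class Change:
    package: str
    version: str
    export_ok: bool = True  # hypothetical: does the non-scratch koji build succeed?


def publish_in_post(repo, changes):
    """Gate only runs tests; the export runs afterwards in the post pipeline."""
    repo = dict(repo)
    speculative = dict(repo)
    tested_with = {}
    for change in changes:
        # Each change is tested on top of the changes ahead of it in the gate.
        speculative[change.package] = change.version
        tested_with[change.package] = dict(speculative)
    for change in changes:
        # Post pipeline: a failed export is invisible to the gate.
        if change.export_ok:
            repo[change.package] = change.version
    return repo, tested_with


def export_in_gate(repo, changes):
    """The export is a gate job; a failed export removes the change."""
    repo = dict(repo)
    tested_with = {}
    for change in changes:
        if not change.export_ok:
            # The change is not merged; changes behind it restart without it.
            continue
        env = dict(repo)
        env[change.package] = change.version
        tested_with[change.package] = env
        repo[change.package] = change.version
    return repo, tested_with


liberty = {"nova": "12.0.0", "ceilometer": "8.0.0"}
queue = [
    Change("nova", "12.0.1", export_ok=False),  # the nova export fails
    Change("ceilometer", "8.0.1"),
]

post_repo, post_tested = publish_in_post(liberty, queue)
gate_repo, gate_tested = export_in_gate(liberty, queue)

# Post model: ceilometer was validated against a nova that never landed.
print("post:", post_tested["ceilometer"]["nova"], "tested vs", post_repo["nova"], "in repo")
# Gate model: the tested environment matches what is in the repo.
print("gate:", gate_tested["ceilometer"]["nova"], "tested vs", gate_repo["nova"], "in repo")
```

Both models end with the same repo content (nova 12.0.0, ceilometer 8.0.1);
what differs is the nova version ceilometer 8.0.1 was validated against,
which is the inconsistency discussed above.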

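
The "1 hour per export means at most 24 changes per day" remark from the
thread is simple arithmetic, assuming exports run strictly one after
another. A minimal sketch follows; the function name and the sample
durations are illustrative assumptions, not measurements of real koji
builds.

```python
def max_merges_per_day(export_minutes: float) -> int:
    """Upper bound on merges per day when exports run strictly one at a time."""
    minutes_per_day = 24 * 60
    return int(minutes_per_day // export_minutes)


for minutes in (60, 15, 5):  # illustrative durations only
    print(minutes, "min per export ->", max_merges_per_day(minutes), "merges/day at most")
```

With a 60 minute export this gives the 24 changes per day mentioned above;
shorter builds raise the bound proportionally.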