Re: [Softwarefactory-dev] Artifact export for RPM factory in the Gate pipeline or the POST pipeline ?

Fabien Boucher Fri, 12 Feb 2016 00:59:36 -0800


On 12/02/2016 01:07, Tristan Cacqueray wrote:
> Switching to softwarefactory-dev.
> 
> On 02/11/2016 09:45 PM, Fabien Boucher wrote:
>> On 11/02/2016 21:40, Tristan Cacqueray wrote:
>>> On 02/11/2016 04:51 PM, Fabien Boucher wrote:
>>>> Here is my explanation :D
>>>>
>>>> The situation with a post pipeline that builds/publishes the final artifact:
>>>> ----------------------------------------------------------------------------
>>>>
>>>> 1. A change on nova-distgit has been approved and a job (e.g. packstack)
>>>> is running. The test succeeds and change "A" will be merged by Zuul on
>>>> rdo-liberty in the nova-distgit repo.
>>>>
>>>> Gate pipeline:
>>>> - nova-distgit (rdo-liberty) change A (bump version to 12.0.1)
>>>>   - packstack job
>>>>
>>>> Liberty repo:
>>>> - nova_12.0.0.rpm
>>>> - ceilometer_8.0.0.rpm
>>>>
>>>> 2. The change has been merged (in git), and a new change on
>>>> ceilometer-distgit has been approved (entering the pipeline).
>>>> At the same time the post pipeline runs an "artifact export job" for nova.
>>>>
>>>> Gate pipeline:
>>>> - ceilometer-distgit (rdo-liberty) change A (bump version to 8.0.1)
>>>>   - packstack job
>>>>
>>>> Post pipeline:
>>>> - nova-distgit HEAD
>>>>   - artifact export job (build non scratch on koji)
>>>>
>>>> Liberty repo:
>>>> - nova_12.0.0.rpm
>>>> - ceilometer_8.0.0.rpm
>>>>
>>>> -> The packstack job for the ceilometer-distgit change is running. The
>>>> Packstack installation fetches our packages from the liberty repo, and so
>>>> fetches nova_12.0.0.rpm.
>>>
>>> It seems like the real issue is fetching from the upstream repo during the
>>> test. Why can't we request a koji build based on every project's git master
>>> instead of the final repository? Didn't you mention a problem with Koji and
>>> scratch builds of merged commits?
>>
>> I don't think it is really realistic to rebuild the whole set of packages
>> based on the rdo-liberty branches for a test; it will take more than a while :)
>>
>> Fetching from the upstream repo (the koji liberty repo we gate) during a test
>> is easy: we just configure yum to target it. The real issue is to have it up
>> to date at the right moment. In fact the RPM needs to be available in the repo
>> just before the tested change leaves the gate pipeline (shared queue) and just
>> before the change is merged in the Git repo/branch.
>>
>> About the problem you mention, I don't think so. I mentioned an issue, but it
>> was related to Jenkins unexpectedly closing the communication channel when the
>> pkg-export job starts a "non scratch" build against koji, which results in the
>> pkg being published while the related git change is not merged in the repo :/
>> But it's not supposed to happen often :) I hope.
>>
>>>
>>>
>>>> Packstack has fetched nova 12.0.0 and is validating ceilometer along with it.
>>>>
>>>> 3. The post job "artifact export" succeeds in building the artifact
>>>> ("build against koji") and nova_12.0.1.rpm lands in the repository. Nice.
>>>>
>>>> 4. The packstack job of ceilometer succeeds. Nice, the post job for the
>>>> ceilometer change starts and succeeds, then we have:
>>>>
>>>> Liberty repo:
>>>> - nova_12.0.1.rpm
>>>> - ceilometer_8.0.1.rpm
>>>>
>>>> /!\ What if nova 12.0.1 introduced a change in the packaging (a new file,
>>>> a patch, ...) that prevents the ceilometer package from being installed, or
>>>> prevents ceilometer from working well ...
>>>> Then this results in a broken RPM repository /!\
>>>> -> And why did this happen? Because while the post job for nova-distgit was
>>>> running we validated a change on ceilometer by testing it with the previous
>>>> version of nova!
>>>>
>>>>
>>>> Furthermore, what if the nova-distgit "artifact export job" failed to build
>>>> nova 12.0.1 and we didn't notice? Other changes entering the gate pipeline
>>>> are validated with packstack (+ nova 12.0.0) and built to the final repo
>>>> via the post job.
>>>> Then later, when we discover the inconsistency, we will force the post
>>>> pipeline to run for nova 12.0.1 without any tests. 12.0.1 will land in the
>>>> liberty repo. Nice!
>>>> -> But are we sure 12.0.1 will work with the other changes (new RPMs) that
>>>> have landed during the time of the inconsistency? Maybe ... Maybe not :D
>>>> In that case, instead of running the post job again, we can bump nova to
>>>> 12.0.1-1 ...; it will force the tests to re-run with the latest version of
>>>> the liberty repo. Well, but what to put in the changelog of 12.0.1-1:
>>>> "force a rebuild"?
>>>>
>>>>
>>>> Another case:
>>>> Gate pipeline:
>>>> - nova-distgit (rdo-liberty) change A (bump version to 12.0.1)
>>>>   - packstack job
>>>> - ceilometer-distgit (rdo-liberty) change A (bump version to 8.0.1)
>>>>   - packstack job
>>>>
>>>> Liberty repo:
>>>> - nova_12.0.0.rpm
>>>> - ceilometer_8.0.0.rpm
>>>>
>>>> The packstack job for ceilometer-distgit will request a (scratch) build of
>>>> ceilometer against koji, but thanks to Zuul it also knows that a
>>>> nova-distgit change is currently being tested and may land in the liberty
>>>> repo, so it will also request a build of nova 12.0.1 against koji. So
>>>> locally it can build a repo containing nova_12.0.1.rpm and
>>>> ceilometer_8.0.1.rpm. The ceilometer change is tested with packstack in a
>>>> good test environment.
>>>> Then the nova-distgit change succeeds, the post job starts and fails to
>>>> build the final artifact ("koji build"). The change now on top of the
>>>> pipeline doesn't notice that ... (the failure occurred in the post
>>>> pipeline) and succeeds in validating ceilometer (along with nova 12.0.1);
>>>> the post job starts and ceilometer 8.0.1 lands in the liberty repo. Nice!
>>>> What if the change bumping ceilometer to 8.0.1 was in fact unable to work
>>>> with nova 12.0.0 ... then a broken liberty repo!
>>>>
>>>> Furthermore, if we want a post job (for publishing) then we need an
>>>> additional node, static and usually with 1 Jenkins worker. Indeed, if we
>>>> use nodepool to spawn nodes for the post pipeline, or if we use more than
>>>> 1 executor, then we cannot be sure RPMs land in the RPM repo serially ...
>>>> and do we really want someone to check out the liberty repo and find an
>>>> RPM that landed out of order, while the one supposed to land before it
>>>> has not landed yet ...
>>>>
>>>> ----  
>>>>
>>>> So if you reached this point, cool \o/. Now let's take those previous
>>>> examples and think about starting the final build on koji (non scratch
>>>> build) inside a gate job.
>>>>
>>>> Gate pipeline:
>>>> - nova-distgit (rdo-liberty) change A (bump version to 12.0.1)
>>>>   - packstack job (SUCCEED)
>>>>   - artifact export job (build non scratch on koji) (RUNNING)
>>>> - ceilometer-distgit (rdo-liberty) change A (bump version to 8.0.1)
>>>>   - packstack job (RUNNING)
>>>>   - artifact export job (build non scratch on koji) (RUNNING)
>>>>
>>>> * The ceilometer-distgit final RPM cannot land yet in the final liberty
>>>>   RPM repo because the nova-distgit RPM has not landed there yet.
>>>>
>>>> * If the "artifact export job" for nova-distgit fails:
>>>>   - the change is not merged
>>>>   - the RPM won't land in the RPM repo
>>>> -> The ceilometer-distgit jobs will restart (as a change they depend on
>>>>   failed (all distgit projects share the same job))
>>>>   and the packstack job for ceilometer will be tested with nova 12.0.0
>>>>   (that is in the liberty RPM repo).
>>>>   So it will land in the RPM repo, but it has been tested with the right
>>>>   version of the nova RPM.
>>>>
>>>> ----> Remember that with the post job doing the final build, at this point
>>>>   we didn't know the status of the post job (failed), so the workflow had
>>>>   not figured out that the export failed: ceilometer was tested with nova
>>>>   12.0.1, but nova 12.0.1 has not landed ... while ceilometer landed ... :/
>>>>
>>>>
>>>> "artifact export job" uses :
>>>> https://github.com/redhat-cip/software-factory/blob/master/tools/slaves/wait_for_other_jobs.py
>>>> and an example of it (for rpmfactory):
>>>> https://github.com/redhat-cip/rpmfactory/blob/master/gating/pkg-export.sh
>>>>
>>>>
>>>> -----
>>>>
>>>> So let me know if building artifacts in the gate pipeline is relevant for
>>>> you (at least for RPM Factory)?
>>>>
>>> Well, definitely: most RDO packages are tightly integrated and there is a
>>> non-negligible risk of breakage.
>>>
>>>
>>>> Note that in the RPM factory context we gate "a repository" ... so we use
>>>> that repository to test other changes ...
>>>>
>>>> Now imagine that in Zuul we configured the gating to not submit git changes
>>>> ... and did it in the post pipeline instead ...
>>>> Do you think Zuul would still be valuable? I'm not so sure.
>>>>
>>>
>>>
>>> Alright, thank you Fabien for starting this discussion.
>>> If I understand correctly the problem is really the window between a
>>> gate success and a post-job success. Any changes that enter the gate
>>> during that window won't be tested against changes that are still being
>>> post-processed... Otherwise both approaches seem to be identical.
>>
>> Yes, this is related to the delay between the moment the change leaves the
>> gate pipeline and the moment the post pipeline finishes executing the job.
>> But not only: when the export job is executed in the gate pipeline, the gate
>> pipeline knows the job status and is then able to react to a failure:
>> - The Git change is not merged in the git repo
>> - Following changes in the gate shared queue will stop their jobs
>>   and restart, skipping this broken change.
>> If the export job is executed in the post pipeline, neither of the two
>> things above will happen ...
>>
>>>
>>> Your initial proposition works for sure, but I question the need to have
>>> such a job that will wait to be at the tip of the gate to actually
>>> report SUCCESS. This seems to severely limit the capability of Zuul and its
>>> speculative-merge-based design...
>>
>> I don't get that, why would it limit this capability?
>> For me nothing changes in this area, but I may be missing something; please
>> clarify.
>>
> 
> I believe it limits Zuul's capability since pkg_exports is serial for
> the gate. If that pkg_exports task takes 1 hour, then we can't merge
> more than 24 changes per day.


You're right, but I doubt the average time to build a package against
koji is that long. For a change to leave the gate pipeline we need two
"gate" conditions validated by Zuul: Gerrit merges the change in the branch
+ koji builds and stores the pkg. So yes, it is longer than just the Gerrit merge.
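
As a rough back-of-the-envelope check of the numbers discussed here, assuming exports are strictly serialized and each one takes a fixed time (the function name and the 10-minute figure are illustrative assumptions, not measurements from koji):

```python
def max_merges_per_day(export_minutes: float) -> int:
    """Upper bound on merged changes per day when exports run one at a time."""
    if export_minutes <= 0:
        raise ValueError("export_minutes must be positive")
    minutes_per_day = 24 * 60
    return int(minutes_per_day // export_minutes)


# The 1-hour case from Tristan's mail.
print(max_merges_per_day(60))
# Hypothetical 10-minute koji build (assumed value, for comparison only).
print(max_merges_per_day(10))
```

This only bounds the export step; the packstack jobs themselves still run speculatively and are not counted here.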

> Here are the pros and cons I understood so far, please correct me if I'm
> wrong:
> 
> Wait_for_other_jobs Pros:
> * If publish fails, then the change isn't merged.
> * Gate tests are fast because they use upstream packages

* We reduce the situations where a pkg tested in the gate is NOT tested
against the future state of the RPM repo

> 
> Cons:
> * We lose the ability to merge changes in parallel

Note that I don't think Zuul merges changes in parallel. Merging a
change in a git repo is fast, which gives an impression of parallelism, but
I think it is actually serial. IMO we lose nothing in that area.

> * This might lock the gate if all the resources are somehow allocated
>   to pkg_export tasks. (Since the job will wait for jobs stuck in the queue)
> * It can be a source of random failures that will affect the gate
>   (e.g. when the rdo mirror times out)
> 
> "Publish in post" Pros:
> * Similar workflow of openstack-infra for publish-to-pypi job (which is
>   a big pro imo since it works)
The big difference here is that the artifact published on pypi is not re-used
for the testing. Future changes entering the gate don't need the artifact on
pypi to be validated ...
This is not the case for RDO in rpm factory.
> 
> Cons:
> * Post-job needs to be monitored when it fails.
> * Test time may be longer.
* Inconsistencies on the target RPM repo are more likely
* Needs an additional static node to run jobs serially in the post pipeline
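
The inconsistency described in this thread can be modelled in a few lines. This is only a toy model of the nova/ceilometer example with plain dictionaries; it does not talk to Zuul or koji, and the variable names are made up for illustration:

```python
# Published Liberty repo when the ceilometer change enters the gate.
# nova 12.0.1 is already merged in git, but its post job has not published yet.
liberty_repo = {"nova": "12.0.0", "ceilometer": "8.0.0"}

# The gate job installs the published repo plus the package under test.
tested_env = dict(liberty_repo, ceilometer="8.0.1")

# Both post jobs then publish, in order.
liberty_repo["nova"] = "12.0.1"
liberty_repo["ceilometer"] = "8.0.1"

# The published combination was never exercised by any gate job.
never_tested = tested_env != liberty_repo
print(never_tested, tested_env, liberty_repo)
```

The point of the sketch is that the final repo holds a pair of versions that differs from the pair the gate actually validated.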
> 
> 
> TL;DR: As already mentioned during the last sprint review,
> wait_for_other_jobs seems like the most trivial and effective way to
> ensure RDO repositories are fully validated. However, since it's a risky
> decision, I'd like us to really consider other solutions, at least to
> demonstrate this is the superior approach.
> 
> Perhaps we should discuss these solutions with the RDO folks too.
> Basically wait_for_other_jobs will guarantee that the repos are stable,
> at the cost of a special Zuul gate job that may slow down development.
> Otherwise we can publish in a post job, with the risk of a desync
> between the git and rpm repos.
> 

Good summary :D

> 
> Back to the original issue of not being able to build all RDO packages
> for each change: instead of rebuilding only what is currently in the
> gate, can't we also rebuild what is in the post pipeline? That way
> we don't rebuild everything each time and we still have something
> identical to what will be upstream. That may be another approach...
> (Assuming a failing post job also stays in the post pipeline until it
> succeeds).

Yes, I agree; I guess it can be an alternative solution, at the cost of:
- having a static node to maintain to run jobs in the post pipeline
- we won't detect that we need a rebuild at the moment Zuul is passing a
  change from the gate to the post pipeline (a short delay, but still to
  consider)
- a job still needs to introspect the pipelines (like wait_for_other_jobs.py)
- a change is merged in the distgit but the package is not yet in the final
  RPM repo.
-> I still prefer when both happen more *atomically* in the Gate :D
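
To make the "atomic" behaviour concrete, here is a minimal sketch of a gate queue where a change lands only if its export succeeds. It is not the code of wait_for_other_jobs.py or pkg-export.sh; `run_gate_queue` and `flaky_export` are hypothetical names, and the retest of following changes is only described in a comment, not modelled:

```python
from typing import Callable, List


def run_gate_queue(queue: List[str], export: Callable[[str], bool]) -> List[str]:
    """Process a shared gate queue in order and return the changes that landed.

    A change lands only if its export (non scratch build) succeeds, so the
    git merge and the RPM publication succeed or fail together.
    """
    landed = []
    for change in queue:
        # Only the change at the head of the queue is allowed to export.
        if export(change):
            landed.append(change)  # merged in git and RPM published
        # On failure the change is dropped; in a real gate the following
        # changes would be retested without it before reaching this point.
    return landed


def flaky_export(change: str) -> bool:
    """Hypothetical export that fails for the nova change only."""
    return change != "nova-12.0.1"


print(run_gate_queue(["nova-12.0.1", "ceilometer-8.0.1"], flaky_export))
```

With the export in the gate, a failed nova build never leaves git and the RPM repo disagreeing, which is the property argued for here.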

OK, if we consider this solution as you said (assuming a failing post job
also stays in the post pipeline until it succeeds), then I think we have two
solutions.

> Regards,
> -Tristan

