On 11/16/23 09:51, Dumitru Ceara wrote:
> On 11/16/23 09:44, Ales Musil wrote:
>> On Thu, Nov 16, 2023 at 9:34 AM Dumitru Ceara <[email protected]> wrote:
>>
>>> On 11/16/23 00:12, Ilya Maximets wrote:
>>>> On 11/15/23 22:20, Mark Michelson wrote:
>>>>> On 11/15/23 12:02, Ilya Maximets wrote:
>>>>>> On 11/15/23 17:00, Dumitru Ceara wrote:
>>>>>>> All of the above were changed to track the latest available
>>>>>>> releases.  Initially that seemed like a good idea but, in
>>>>>>> practice, a new release would potentially (silently) cause CI
>>>>>>> runs that used to pass on given stable versions to unexpectedly
>>>>>>> start failing.
>>>>>>>
>>>>>>> To address that, this series pins all versions we can control,
>>>>>>> allowing us to use different values for different branches.
>>>>>>>
>>>>>>> NOTE: the last commit of this series will look slightly different
>>>>>>> on stable branches, e.g., on branches <= 23.06 we need
>>>>>>> Python <= 3.11 and Fedora 37.
>>>>>>>
>>>>>>> The first commit of the series bumps the OVS submodule to include
>>>>>>> the latest fixes to errors reported by recent versions of flake8.
>>>>>>>
>>>>>>> Changes in v2:
>>>>>>> - added a first patch to bump the OVS submodule
>>>>>>> - addressed Ales' review comments:
>>>>>>>   - moved the logic to determine which image to use out of ci.sh;
>>>>>>>     it's now in the workflow itself.
>>>>>>>   - moved the setting of OVN_IMAGE into the same block as all
>>>>>>>     related env vars
>>>>>>> - added a note for maintainers in the release process
>>>>>>>   documentation to mention that the new workflow will likely have
>>>>>>>   to be triggered manually (at least once) when branching for a
>>>>>>>   new release.  GitHub doesn't allow periodic jobs on non-default
>>>>>>>   branches.
>>>>>>
>>>>>> IMHO, that sounds horrible to maintain.  Can we just build things
>>>>>> per run?
>>>>>
>>>>> What if we set up the "Containers" action to run on branch
>>>>> creation?  Would that adequately substitute for the manual run that
>>>>> creates the image?
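Side note: GitHub Actions does have a `create` trigger that fires when a
branch or tag is created, so this could look roughly like the sketch
below.  This is only a sketch under assumptions: the Dockerfile location
and the `branch-*` filter are illustrative, and the workflow file has to
already exist on the newly created branch for the trigger to fire.

```yaml
# Hypothetical "Containers" workflow: build the image when a release
# branch is created, keeping the manual trigger as a fallback.
name: Containers
on:
  create:             # fires on branch/tag creation
  workflow_dispatch:  # manual fallback

jobs:
  build-image:
    # 'create' also fires for tags; only react to branch-* branches.
    if: github.event_name != 'create' || startsWith(github.ref, 'refs/heads/branch-')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build the test container
        run: docker build -t ovn-tests .   # Dockerfile path is illustrative
```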
>>>>>
>>>>> If we only have to run the action once each time a branch is
>>>>> created, I don't think that qualifies as "horrible" to maintain.
>>>>> This could be lumped into our branch creation scripts.  However, if
>>>>> it can be automated as I suggested above, that would be better.
>>>>
>>>> The job will never be re-run, meaning we'll be using the same
>>>> container for many years.  If we need an update for this container,
>>>> it will have to be done manually for each branch, except for main.
>>>>
>>>> Also, in the patch, Cirrus CI is set to use the container from main,
>>>> so this will need to be adjusted for each branch and will need to
>>>> become part of the release patches (?).
>>>>
>>>> Another issue with the current implementation of containers is that
>>>> it's a pain for an external contributor to change dependencies,
>>>> because they'll have to modify the CI in multiple places in order to
>>>> build and use their own containers.  And with these new changes it
>>>> will become worse.  For example, I never use 'main' or 'branch-*'
>>>> branches in my own forks.  CI depends on the branch name now, and it
>>>> will fall back to an unmaintained old-style-named image.  It might
>>>> not be easy to figure out why all your builds are failing,
>>>> especially if you're new to the project.
>>>>
>>>
>>> We could try to make it build the container image every time on
>>> stable branches and avoid any manual steps.
>>>
>>>>>
>>>>>> How much value do these containers actually have?
>>>
>>> I think you already mentioned the benefits below:
>>> - we can share the artifact (container image) with Cirrus CI; the
>>>   aarch64 container build takes significantly longer
>>> - we can share the artifact with downstream CIs
>>>
>>>>>
>>>>> As patch 2 mentions, it allows us to use different distro bases for
>>>>> each version.
>>>>
>>>> This can be done by simply specifying 'container: name' in the job;
>>>> GHA will fetch and use that container.
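To make that point concrete: pinning a prebuilt image per branch can be
done right in the job definition, roughly as below.  The image name and
tag are made up for the example; only ci.sh comes from the series.

```yaml
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    # Each branch's workflow file pins its own image and tag, so a new
    # image push cannot silently change what a stable branch runs on.
    container: ghcr.io/ovn-org/ovn-tests:fedora-37   # hypothetical name/tag
    steps:
      - uses: actions/checkout@v3
      - run: ./ci.sh
```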
>>>>
>>>>> But I also think that having the container image cached allows for
>>>>> quicker test runs, since we don't have to reconstruct the
>>>>> environment each time we perform test runs.
>>>>
>>>> It takes less than 2 minutes to create a full x86 environment.  We
>>>> spend more time re-building OVS and OVN for every variant of tests,
>>>> even though most of them use the same compiler flags.
>>>>
>>>> Anyways, there is no need to push the container into a registry; we
>>>> can build it and pass it to the next job via artifacts.  I don't
>>>> think spending an extra 1-2 minutes per workflow (not job, workflow)
>>>> is a significant problem, if it allows us to just build whatever the
>>>> current dependencies are and not think about what kind of outdated
>>>> build from which branch is going to be used.
>>>>
>>>> The only downside of this is that we can't share the artifact with
>>>> Cirrus CI or some downstream CIs, but that also doesn't seem to be a
>>>> significant issue.  1-2 minutes per workflow is not that much.
>>>> Building the aarch64 container takes a lot more time, but switching
>>>> it from Fedora to Ubuntu brings the build down to only 10 minutes,
>>>> so it might be reasonable to just rebuild that one on-demand as
>>>> well.
>>>>
>>>
>>> Ales, why do we need the Cirrus CI image to be Fedora based?
>>>
>>
>> We don't, but it's way easier for me to navigate.  I'm not even sure
>> how/if the Ubuntu one works for aarch64.  We can switch that one,
>> however I don't see any clear benefit in doing so.  Also, one thing to
>> note: even if it's 1-2
>
> Well, it takes way less time to build, that might be an argument for
> it.
>
>> minutes (it's probably slightly more), we are losing the ability to
>> have reproducible builds.  It's very easy today to take the image and
>> reproduce the same issue IMO.
>>
>
> That's a good point, reproducible builds are important.
> Storing the image as a GitHub CI artifact would still allow us to
> download it and load it locally, but it's way easier if it's just
> stored in the ghcr.io registry.
>
>>
>>>
>>>> Not a strong opinion, but the branch name thing might be the most
>>>> annoying, if I understand correctly.
>>>>
>>>
>>> On the (very) short term, however, my problem is that I can't merge
>>> any bug fix in OVN because I can't run any CI at the moment.  Mostly
>>> that's because different versions of OVN (and OVS via submodule)
>>> python code trigger different flake8 warnings with different
>>> flake8/python/distro versions.
>>>
>>> A quick fix for now could be to bump the OVS submodule again to
>>> fdbf0bb2aed5 ("flake8: Fix E721 check failures.") (which is what
>>> patch 1/3 is doing), but on _all_ stable branches.  That's not ideal
>>> because that commit (and its backports) are not part of any release
>>> yet.
>>>
>>> I wanted to avoid doing that, but it seems we don't have an easy
>>> alternative, so I'm starting to think we should just go for the OVS
>>> submodule bump and try to find a better solution for the container
>>> workflow on the longer term.
>>>
Just in case I wasn't clear, what I meant here is applying patch 1/3 to
main, and its corresponding stable branch backports, bumping the
submodule to the tips of OVS branch-3.2, branch-3.1 and branch-3.0.

>>> What do you guys think?
>>>
>>
>> I'm afraid that we will constantly chase those ghosts of different
>> dependency problems for various branches.  And I'm not sure what the
>> proper way to handle that would be.  The unfortunate thing is that we
>> don't directly control OVS, so some of those "hacks" might always be
>> necessary to have it working.
>>
>
> Isn't that a different discussion?  Partially, because OVN uses
> "internal" OVS structures directly.

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
