On 11/16/23 10:02, Dumitru Ceara wrote:
> On 11/16/23 09:51, Dumitru Ceara wrote:
>> On 11/16/23 09:44, Ales Musil wrote:
>>> On Thu, Nov 16, 2023 at 9:34 AM Dumitru Ceara <[email protected]> wrote:
>>>
>>>> On 11/16/23 00:12, Ilya Maximets wrote:
>>>>> On 11/15/23 22:20, Mark Michelson wrote:
>>>>>> On 11/15/23 12:02, Ilya Maximets wrote:
>>>>>>> On 11/15/23 17:00, Dumitru Ceara wrote:
>>>>>>>> All of the above were changed to track the latest available releases.
>>>>>>>> Initially that seemed like a good idea, but in practice a new release
>>>>>>>> would potentially (silently) cause CI runs that used to pass on given
>>>>>>>> stable versions to unexpectedly start failing.
>>>>>>>>
>>>>>>>> To address that, this series pins all versions we can control, allowing
>>>>>>>> us to use different values for different branches.
>>>>>>>>
>>>>>>>> NOTE: the last commit of this series will look slightly different on
>>>>>>>> stable branches, e.g., on branches <= 23.06 we need Python <= 3.11 and
>>>>>>>> Fedora 37.
>>>>>>>>
>>>>>>>> The first commit of the series bumps the OVS submodule to include the
>>>>>>>> latest fixes to errors reported by recent versions of flake8.
>>>>>>>>
>>>>>>>> Changes in v2:
>>>>>>>> - added first patch to bump OVS submodule
>>>>>>>> - addressed Ales' review comments:
>>>>>>>>   - moved the logic to determine which image to use out of ci.sh;
>>>>>>>>     it's now in the workflow itself.
>>>>>>>>   - moved setting of OVN_IMAGE in the same block with all related
>>>>>>>>     env vars
>>>>>>>> - added a note for maintainers in the release process documentation
>>>>>>>>   to mention that the new workflow will likely have to be triggered
>>>>>>>>   manually (at least once) when branching for a new release.  GitHub
>>>>>>>>   doesn't allow periodic jobs on non-default branches.
>>>>>>>
>>>>>>> IMHO, that sounds horrible to maintain.  Can we just build things per
>>>>>>> run?
>>>>>>
>>>>>> What if we set up the "Containers" action to run on branch creation?
>>>>>> Would that adequately substitute for the manual run that creates the
>>>>>> image?
>>>>>>
>>>>>> If we only have to run the action once each time a branch is created, I
>>>>>> don't think that qualifies as "horrible" to maintain.  This could be
>>>>>> lumped into our branch creation scripts.  However, if it can be
>>>>>> automated as I suggested above, that would be better.
>>>>>
>>>>> The job will never be re-run, meaning we'll be using the same container
>>>>> for many years.  If we need an update for this container, it will have
>>>>> to be done manually for each branch, except for main.
>>>>>
>>>>> Also, in the patch, Cirrus CI is set to use the container from main, so
>>>>> this will need to be adjusted for each branch and will need to become
>>>>> part of the release patches (?).
>>>>>
>>>>> Another issue with the current implementation of containers is that it's
>>>>> a pain for an external contributor to change dependencies, because
>>>>> they'll have to modify the CI in multiple places in order to build and
>>>>> use their own containers.
>>>>> And with these new changes it will become worse.  For example, I never
>>>>> use 'main' or 'branch-*' branches in my own forks.  CI depends on the
>>>>> branch name now, and it will fall back to an unmaintained
>>>>> old-style-named image.  It might not be easy to figure out why all your
>>>>> builds are failing, especially if you're new to the project.
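For reference, the "run on branch creation" idea maps to GitHub Actions'
standard 'create' trigger; a minimal sketch of how the Containers workflow
could be triggered (the file name and exact trigger set below are
illustrative, not what the series implements):

    # .github/workflows/containers.yml (hypothetical excerpt)
    on:
      create:               # fires when a branch or tag is created
      workflow_dispatch:    # keep a manual trigger as a fallback
      schedule:
        - cron: '0 0 * * 6' # periodic rebuild; only runs on the default branch

As noted in the cover letter above, scheduled workflows only run on the
default branch, so stable branches would still depend on the 'create' or
manual triggers.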
>>>>>
>>>>
>>>> We could try to make it build the container image every time on stable
>>>> branches and avoid any manual steps.
>>>>
>>>>>>
>>>>>>> How much value do these containers actually have?
>>>>
>>>> I think you mentioned the benefits below already:
>>>> - we can share the artifact (container image) with Cirrus CI; the
>>>>   aarch64 container build takes significantly longer
>>>> - we can share the artifact with downstream CIs
>>>>
>>>>>>
>>>>>> As patch 2 mentions, it allows us to use different distro bases for
>>>>>> each version.
>>>>>
>>>>> This can be done by simply specifying 'container: name' in the job;
>>>>> GHA will fetch and use that container.
>>>>>
>>>>>> But I also think that having the container image cached allows for
>>>>>> quicker test runs since we don't have to reconstruct the environment
>>>>>> each time we perform test runs.
>>>>>
>>>>> It takes less than 2 minutes to create a full x86 environment.  We spend
>>>>> more time re-building OVS and OVN for every variant of tests, even
>>>>> though most of them are using the same compiler flags.
>>>>>
>>>>> Anyway, there is no need to push the container into a registry; we can
>>>>> build it and pass it to the next job via artifacts.  I don't think
>>>>> spending an extra 1-2 minutes per workflow (not job, workflow) is a
>>>>> significant problem, if it allows us to just build whatever the current
>>>>> dependencies are and not think about what kind of outdated build from
>>>>> which branch is going to be used.
>>>>>
>>>>> The only downside of this is that we can't share the artifact with
>>>>> Cirrus CI or some downstream CIs, but that also doesn't seem to be a
>>>>> significant issue.  1-2 minutes per workflow is not that much.  Building
>>>>> the aarch64 container takes a lot more time, but after switching it from
>>>>> Fedora to Ubuntu it takes only 10 minutes to build, so it might be
>>>>> reasonable to just rebuild it on-demand as well.
>>>>>
>>>>
>>>> Ales, why do we need the Cirrus CI image to be Fedora based?
>>>>
>>>
>>> We don't, but it's way easier for me to navigate.  I'm not even sure
>>> how/if the Ubuntu one works for aarch64.  We can switch that one, however
>>> I don't see any clear benefit in doing so.  Also, one thing to note: even
>>> if it's 1-2
>>
>> Well, it takes way less time to build, that might be an argument for it.
>>
>>> minutes (it's probably slightly more) we are losing the ability to have
>>> reproducible builds.  It's very easy today to take the image and
>>> reproduce the same issue IMO.
>>>
>>
>> That's a good point, reproducible builds are important.  Storing the
>> image as a GitHub CI artifact would still allow us to download it and
>> load it locally, but it's way easier if it's just stored in the ghcr.io
>> registry.
>>
>>>
>>>>
>>>>> Not a strong opinion, but the branch name thing might be the most
>>>>> annoying, if I understand correctly.
>>>>>
>>>>
>>>> In the (very) short term, however, my problem is that I can't merge any
>>>> bug fix in OVN because I can't run any CI at the moment.  Mostly that's
>>>> because different versions of the OVN (and OVS, via the submodule)
>>>> Python code trigger different flake8 warnings with different
>>>> flake8/python/distro versions.
>>>>
>>>> A quick fix for now could be to bump the OVS submodule again to
>>>> fdbf0bb2aed5 ("flake8: Fix E721 check failures.") (which is what patch
>>>> 1/3 is doing) but on _all_ stable branches.  That's not ideal because
>>>> that commit (and its backports) are not part of any release yet.
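For reference, the 'container:' setting Ilya mentions is a per-job option in
GitHub Actions; a rough sketch (the image name and script path are made up
for illustration) would be:

    jobs:
      build-and-test:
        runs-on: ubuntu-latest
        # Hypothetical image reference; GHA pulls it and runs every step inside it.
        container: ghcr.io/ovn-org/ovn-tests:main
        steps:
          - uses: actions/checkout@v4
          - run: ./ci.sh

The "build it and pass it via artifacts" alternative would instead
'docker build' and 'docker save' the image in a first job, upload the
tarball with actions/upload-artifact, then download and 'docker load' it in
the jobs that need it, so nothing has to be pushed to a registry.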
>>>>
>>>> I wanted to avoid doing that but it seems we don't have an easy
>>>> alternative so I'm starting to think we should just go for the OVS
>>>> submodule bump and try to find a better solution for the container
>>>> workflow on the longer term.
>>>>
>
> Just in case I wasn't clear, what I meant here is applying patch 1/3 to
> main and its corresponding stable branch backports bumping the submodule
> to the tips of OVS branch-3.2, branch-3.1 and branch-3.0.
>
.. and we need to pin the Python version to 3.11 on branches <=23.03.

>>>> What do you guys think?
>>>>
>>>
>>> I'm afraid that we will constantly chase those ghosts of different
>>> dependency problems for various branches, and I'm not sure what would be
>>> the proper way to handle that.  The unfortunate thing is that we don't
>>> directly control OVS, so some of those "hacks" might always be necessary
>>> to have it working.
>>>
>>
>> Isn't that a different discussion?  Partially because OVN uses "internal"
>> OVS structures directly.
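For reference, one common way to pin the interpreter version in a GitHub
Actions job is the setup-python action; whether the series pins it this way
or through the container's base image is not shown in this thread, so the
step below is only illustrative:

    - uses: actions/setup-python@v4
      with:
        python-version: '3.11'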
