On 11/16/23 10:02, Dumitru Ceara wrote:
> On 11/16/23 09:51, Dumitru Ceara wrote:
>> On 11/16/23 09:44, Ales Musil wrote:
>>> On Thu, Nov 16, 2023 at 9:34 AM Dumitru Ceara <[email protected]> wrote:
>>>
>>>> On 11/16/23 00:12, Ilya Maximets wrote:
>>>>> On 11/15/23 22:20, Mark Michelson wrote:
>>>>>> On 11/15/23 12:02, Ilya Maximets wrote:
>>>>>>> On 11/15/23 17:00, Dumitru Ceara wrote:
>>>>>>>> All of the above were changed to track the latest available releases.
>>>>>>>> Initially that seemed like a good idea, but in practice a new release
>>>>>>>> would potentially (silently) cause CI runs that used to pass on given
>>>>>>>> stable versions to unexpectedly start failing.
>>>>>>>>
>>>>>>>> To address that, this series pins all versions we can control, allowing
>>>>>>>> us to use different values for different branches.
>>>>>>>>
>>>>>>>> NOTE: the last commit of this series will look slightly different on
>>>>>>>> stable branches, e.g., on branches <= 23.06 we need Python <= 3.11 and
>>>>>>>> Fedora 37.
>>>>>>>>
>>>>>>>> The first commit of the series bumps the OVS submodule to include the
>>>>>>>> latest fixes to errors reported by recent versions of flake8.
>>>>>>>>
>>>>>>>> Changes in v2:
>>>>>>>> - added first patch to bump OVS submodule
>>>>>>>> - addressed Ales' review comments:
>>>>>>>>   - moved the logic to determine which image to use out of ci.sh;
>>>>>>>>     it's now in the workflow itself.
>>>>>>>>   - moved setting of OVN_IMAGE in the same block with all related
>>>>>>>>     env vars
>>>>>>>> - added a note for maintainers in the release process documentation
>>>>>>>>   to mention that the new workflow will likely have to be triggered
>>>>>>>>   manually (at least once) when branching for a new release.  GitHub
>>>>>>>>   doesn't allow periodic jobs on non-default branches.
>>>>>>>
>>>>>>> IMHO, that sounds horrible to maintain.  Can we just build things per
>>>>>>> run?
>>>>>>
>>>>>> What if we set up the "Containers" action to run on branch creation?
>>>>>> Would that adequately substitute for the manual run that creates the
>>>>>> image?
>>>>>>
>>>>>> If we only have to run the action once each time a branch is created, I
>>>>>> don't think that qualifies as "horrible" to maintain.  This could be
>>>>>> lumped into our branch creation scripts.  However, if it can be
>>>>>> automated as I suggested above, that would be better.
>>>>>
>>>>> The job will never be re-run, meaning we'll be using the same container
>>>>> for many years.  If we need an update for this container, it will have
>>>>> to be done manually for each branch, except for main.
>>>>>
>>>>> Also, in the patch, Cirrus CI is set to use the container from main, so
>>>>> this will need to be adjusted for each branch and will need to become
>>>>> part of the release patches (?).
>>>>>
>>>>> Another issue with the current implementation of containers is that it's
>>>>> a pain for an external contributor to change dependencies, because
>>>>> they'll have to modify the CI in multiple places in order to build and
>>>>> use their own containers.
>>>>> And with these new changes it will become worse.  For example, I never
>>>>> use 'main' or 'branch-*' branches in my own forks.  CI depends on the
>>>>> branch name now, and it will fall back to an unmaintained
>>>>> old-style-named image.  It might not be easy to figure out why all your
>>>>> builds are failing, especially if you're new to the project.
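For reference, the "run on branch creation" idea maps to GitHub Actions'
standard 'create' trigger; a minimal sketch of how the Containers workflow
could be triggered (the file name and exact trigger set below are
illustrative, not what the series implements):

    # .github/workflows/containers.yml (hypothetical excerpt)
    on:
      create:               # fires when a branch or tag is created
      workflow_dispatch:    # keep a manual trigger as a fallback
      schedule:
        - cron: '0 0 * * 6' # periodic rebuild; only runs on the default branch

As noted in the cover letter above, scheduled workflows only run on the
default branch, so stable branches would still depend on the 'create' or
manual triggers.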
>>>>>
>>>>
>>>> We could try to make it build the container image every time on stable
>>>> branches and avoid any manual steps.
>>>>
>>>>>>
>>>>>>> How much value do these containers actually have?
>>>>
>>>> I think you mentioned the benefits below already:
>>>> - we can share the artifact (container image) with Cirrus CI; the
>>>>   aarch64 container build takes significantly longer
>>>> - we can share the artifact with downstream CIs
>>>>
>>>>>>
>>>>>> As patch 2 mentions, it allows us to use different distro bases for
>>>>>> each version.
>>>>>
>>>>> This can be done by simply specifying 'container: name' in the job;
>>>>> GHA will fetch and use that container.
>>>>>
>>>>>> But I also think that having the container image cached allows for
>>>>>> quicker test runs since we don't have to reconstruct the environment
>>>>>> each time we perform test runs.
>>>>>
>>>>> It takes less than 2 minutes to create a full x86 environment.  We spend
>>>>> more time re-building OVS and OVN for every variant of tests, even
>>>>> though most of them are using the same compiler flags.
>>>>>
>>>>> Anyway, there is no need to push the container into a registry; we can
>>>>> build it and pass it to the next job via artifacts.  I don't think
>>>>> spending an extra 1-2 minutes per workflow (not job, workflow) is a
>>>>> significant problem, if it allows us to just build whatever the current
>>>>> dependencies are and not think about what kind of outdated build from
>>>>> which branch is going to be used.
>>>>>
>>>>> The only downside of this is that we can't share the artifact with
>>>>> Cirrus CI or some downstream CIs, but that also doesn't seem to be a
>>>>> significant issue.  1-2 minutes per workflow is not that much.  Building
>>>>> the aarch64 container takes a lot more time, but after switching it from
>>>>> Fedora to Ubuntu it takes only 10 minutes to build, so it might be
>>>>> reasonable to just rebuild it on-demand as well.
>>>>>
>>>>
>>>> Ales, why do we need the Cirrus CI image to be Fedora based?
>>>>
>>>
>>> We don't, but it's way easier for me to navigate.  I'm not even sure
>>> how/if the Ubuntu one works for aarch64.  We can switch that one, however
>>> I don't see any clear benefit in doing so.  Also, one thing to note: even
>>> if it's 1-2
>>
>> Well, it takes way less time to build, that might be an argument for it.
>>
>>> minutes (it's probably slightly more) we are losing the ability to have
>>> reproducible builds.  It's very easy today to take the image and
>>> reproduce the same issue IMO.
>>>
>>
>> That's a good point, reproducible builds are important.  Storing the
>> image as a GitHub CI artifact would still allow us to download it and
>> load it locally, but it's way easier if it's just stored in the ghcr.io
>> registry.
>>
>>>
>>>>
>>>>> Not a strong opinion, but the branch name thing might be the most
>>>>> annoying, if I understand correctly.
>>>>>
>>>>
>>>> In the (very) short term, however, my problem is that I can't merge any
>>>> bug fix in OVN because I can't run any CI at the moment.  Mostly that's
>>>> because different versions of the OVN (and OVS, via the submodule)
>>>> Python code trigger different flake8 warnings with different
>>>> flake8/python/distro versions.
>>>>
>>>> A quick fix for now could be to bump the OVS submodule again to
>>>> fdbf0bb2aed5 ("flake8: Fix E721 check failures.") (which is what patch
>>>> 1/3 is doing) but on _all_ stable branches.  That's not ideal because
>>>> that commit (and its backports) are not part of any release yet.
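For reference, the 'container:' setting Ilya mentions is a per-job option in
GitHub Actions; a rough sketch (the image name and script path are made up
for illustration) would be:

    jobs:
      build-and-test:
        runs-on: ubuntu-latest
        # Hypothetical image reference; GHA pulls it and runs every step inside it.
        container: ghcr.io/ovn-org/ovn-tests:main
        steps:
          - uses: actions/checkout@v4
          - run: ./ci.sh

The "build it and pass it via artifacts" alternative would instead
'docker build' and 'docker save' the image in a first job, upload the
tarball with actions/upload-artifact, then download and 'docker load' it in
the jobs that need it, so nothing has to be pushed to a registry.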
>>>>
>>>> I wanted to avoid doing that but it seems we don't have an easy
>>>> alternative so I'm starting to think we should just go for the OVS
>>>> submodule bump and try to find a better solution for the container
>>>> workflow on the longer term.
>>>>
>
> Just in case I wasn't clear, what I meant here is applying patch 1/3 to
> main and its corresponding stable branch backports bumping the submodule
> to the tips of OVS branch-3.2, branch-3.1 and branch-3.0.
>
.. and we need to pin the Python version to 3.11 on branches <=23.03.

>>>> What do you guys think?
>>>>
>>>
>>> I'm afraid that we will constantly chase those ghosts of different
>>> dependency problems for various branches, and I'm not sure what would be
>>> the proper way to handle that.  The unfortunate thing is that we don't
>>> directly control OVS, so some of those "hacks" might always be necessary
>>> to have it working.
>>>
>>
>> Isn't that a different discussion?  Partially because OVN uses "internal"
>> OVS structures directly.
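For reference, one common way to pin the interpreter version in a GitHub
Actions job is the setup-python action; whether the series pins it this way
or through the container's base image is not shown in this thread, so the
step below is only illustrative:

    - uses: actions/setup-python@v4
      with:
        python-version: '3.11'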
